Recorded live at Data Center World 2026, Data Center Frontier Editor in Chief Matt Vincent sits down with Phillip Koblence, COO of NYI and co-founder of Nomad Futurist, for the latest installment of Nomads at the Frontier. The conversation explores the accelerating realities of AI infrastructure buildouts, the industry's growing focus on community engagement, workforce shortages, and the shift toward inference-driven deployments following NVIDIA GTC 2026. Koblence discusses why major interconnection hubs and edge-adjacent urban facilities may become increasingly important in the inference era, the operational realities of deploying AI infrastructure in legacy carrier hotels like 60 Hudson Street, and why the industry can no longer remain invisible to the communities where it builds.
Additional topics include:
- The continuing surge in digital infrastructure demand
- Why conference attendance reflects sustained industry expansion
- Power constraints and energy storage discussions emerging at Data Center World
- AI factories and the evolving economic role of data centers
- Workforce shortages across engineering and skilled trades
- Nomad Futurist's workforce development initiatives with Infrastructure Masons and I Am The Armed Forces
- The growing complexity and diversity of the data center ecosystem
“Every element of everything within the data center has a full sub-vertical industry associated with it,” Koblence says during the discussion. “People would be surprised how large of an ecosystem is involved in creating the digital economy that exists today.”
Listen now for a candid, fast-moving conversation on the state of AI infrastructure and the future of digital infrastructure development.
Join KJ Burke as he dives into the key highlights from NVIDIA's GTC 2026 conference. Discover the latest in AI advancements, robotics, space computing, and enterprise strategies shaping the future of technology. To learn more, visit cdw.ca. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
The Pure Report welcomes Andrea Moccia, VP of AI and Data at Options Technology, and Robert Alvarez, AI Solution Architect at Everpure, to discuss the cutting edge of AI deployment right after the energy of NVIDIA GTC. We dive into a sobering statistic, that 95% of generative AI pilots fail to scale to production, and discuss how the root cause is a fundamental data strategy problem.
The discussion then shifts to the unique, high-stakes challenges faced by the financial services industry (FSI), which contends daily with massive data volumes (tens of petabytes of market data), strict global compliance and regulatory requirements, and the need for near real-time, low-latency answers from AI models. Andrea explains how the power of simplicity is an operational advantage, following the mantra: "Simplicity is what lets you be brave." He details how Options is addressing issues like data leakage and data sovereignty with its Private Mind offering, a private, sovereign AI platform where the company controls the entire stack, "from model to metal." Robert and Andrea connect this innovation to the Everpure partnership, specifically how solutions like Data Stream and Everpure KVA (which Robert co-developed) are vital in reducing implementation complexity and accelerating real-world use cases, such as efficiently building a knowledge graph over hundreds of thousands of SEC filings.
We close with our Hot Takes segment to dispel common AI misconceptions, including why companies should stop obsessively chasing the latest frontier models or GPUs for every task, since open-source alternatives and smaller, distilled models are perfectly capable for a majority of use cases. In conclusion, hear how the true key to AI maturity and growth lies not in chasing technological hype, but in removing data silos, fixing the foundational data strategy, and using the rapidly maturing AI ecosystem to streamline business processes.
To learn more, visit: https://www.purestorage.com/customers/options.html
Check out the new Everpure digital customer community to join the conversation with peers and Pure experts: https://purecommunity.purestorage.com/
00:00 Intro and Welcome
01:50 Recap of NVIDIA GTC
02:40 Overview on Options Technology
03:55 Andrea's Career Journey
07:14 Robert Alvarez Intro
09:13 Stat of the Episode on AI Pilots
12:20 AI Challenges for FSIs
15:05 Simplicity Lets You Be Brave
21:05 AI and KVA in Action at Options
23:45 Data Sovereignty and Compliance
30:35 Hot Takes Segment
35:37 Summary and a Look Forward
What we saw at the latest NVIDIA GTC wasn't just a technology launch; it was a declaration of ambition. NVIDIA is building the “factories” where the artificial intelligence of the future will be produced. From Blackwell to alliances with giants like Microsoft, Amazon, and Google, we explore whether this bet marks the beginning of a new era… or a risk the market has yet to fully grasp.
Modern data platforms are evolving, and speed, scale, and efficiency are becoming non‑negotiable.
In this episode of Exchanges with Hitachi Solutions, host Matt Volke sits down with Evan Sotos, Engineering Manager for the Empower Data Platform, fresh off his return from NVIDIA GTC. Together, they explore how GPU acceleration is moving beyond AI and machine learning and into the core of data engineering.
The conversation dives into what Evan heard from engineers, partners, and vendors at GTC, why NVIDIA is positioning itself as an algorithms company, and how technologies like NVIDIA RAPIDS are being used to dramatically accelerate analytics and data pipelines without rewriting existing code.
What You'll Learn
· Why GPU acceleration is becoming a core capability for modern data platforms, not just AI workloads
· What NVIDIA RAPIDS is and how it enables existing CPU‑based workloads to run on GPUs
· How GPU acceleration can significantly reduce processing time and overall compute costs
· Why “zero code changes” is such a critical advantage for real‑world data teams
· Which types of data workloads benefit most from GPU‑accelerated pipelines
From AI Buzz to Real‑World Data Engineering Impact
While NVIDIA GTC is often associated with AI and large language models, this conversation highlights a broader shift: GPUs are increasingly being applied to traditional data engineering and analytics workloads.
Evan shares how NVIDIA RAPIDS acts as a mapping layer that allows existing Spark and Databricks workloads to take advantage of GPU compute. Rather than forcing teams to refactor complex, production‑grade code, GPU acceleration can be enabled through configuration, making it practical for teams to test, validate, and adopt without disruption. The result? Faster pipelines, improved cost efficiency, and a shorter path from raw data to actionable insight, especially for large, time‑sensitive workloads.
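The episode describes GPU acceleration for existing Spark jobs as something switched on through configuration rather than code changes. A minimal sketch of what that configuration can look like, assuming the open-source RAPIDS Accelerator for Apache Spark: `rapids_spark_conf` is a hypothetical helper, the property names are real Spark and RAPIDS Accelerator settings, and the GPU and task counts are illustrative assumptions rather than recommendations.

```python
# A minimal sketch, assuming the open-source RAPIDS Accelerator for Apache Spark.
# rapids_spark_conf is a hypothetical helper; the property names are real Spark /
# RAPIDS Accelerator settings, but the default values here are illustrative only.


def rapids_spark_conf(gpus_per_executor: int = 1, tasks_per_gpu: int = 4) -> dict[str, str]:
    """Return Spark properties that let supported SQL/DataFrame operators run on GPUs."""
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("need at least one GPU per executor and one task per GPU")
    return {
        # Load the RAPIDS SQL plugin, which replaces supported CPU operators
        # in the physical plan with GPU implementations.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Toggle for GPU acceleration; "false" keeps the same job on the CPU.
        "spark.rapids.sql.enabled": "true",
        # Standard Spark 3.x GPU resource scheduling.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fractional GPU per task so several tasks can share one GPU.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


if __name__ == "__main__":
    for key, value in rapids_spark_conf().items():
        print(f"{key}={value}")
```

In PySpark, each pair could be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`, leaving the job's DataFrame code untouched. A real cluster also needs the RAPIDS Accelerator jar on the classpath and GPU discovery configured for its cluster manager, and managed platforms such as Databricks set this up differently, so treat the values as a starting point.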
What This Means for Data Teams
For organizations running large‑scale analytics, predictive models, or operational reporting, time truly is money. Evan explains how accelerating data pipelines can directly impact downstream use cases, from predictive maintenance to real‑time decision‑making, by reducing the lag between data ingestion and insight.
Most importantly, this episode emphasizes practicality: GPU acceleration isn't about chasing hype. It's about giving data teams another tool they can turn on, test, and adopt when it makes sense, without introducing risk, rework, or operational complexity.
global.hitachi-solutions.com
Picking a use case, proving value, and expanding has been the standard starting point for enterprise AI. For organizations early in their AI journey, that advice still holds. But for large enterprises that are past the pilot stage and trying to scale across business units, geographies, and brands, it isn't enough.
At NVIDIA GTC, Cameron Davies, Chief Data Officer of Yum Brands, shared how his team is thinking about AI differently, and why they had to. With 63,000 restaurant locations, 100 million daily transactions, and 1,500 franchisees across 155 countries, Yum operates at a scale where a single bad AI decision can fail loudly, repeatedly, and fast.
In this episode, Maribel breaks down Davies' framework and what it means for how enterprise leaders should be thinking about AI in 2026 and beyond.
---
**What you'll learn**
- Why the use case as a unit of AI planning has a structural limitation at enterprise scale
- What "scalable AI skills" means and why it's different from building agents for specific use cases
- Why governance has to come before deployment, not after, and what happens when it doesn't
- How measurement functions as operational discipline, not just a reporting obligation
- What Yum's AI flywheel looks like and why it only works if measurement is continuous
- What this framework means for organizations that aren't Yum-sized
About Cameron Davies
Cameron Davies is the Chief Data Officer at Yum Brands, the parent company of KFC, Taco Bell, Pizza Hut, and The Habit Burger Grill. He leads the company's corporate data and analytics strategy and oversees the development and adoption of advanced data capabilities.
He previously spent seven years as SVP at NBCUniversal and over 18 years at The Walt Disney Company, where he led the Corporate Center of Excellence for AI and machine learning.
---
**Resources and references mentioned**
- NVIDIA GTC session: "Scaling AI Agents Globally Across Brands, Use Cases, and Restaurants" (S81755): Cameron Davies, Yum Brands
- Responsible AI Institute: chaired by Manoj Saxena
- Trustwise: AI trust startup founded by Manoj Saxena
- Byte: Yum Brands' proprietary e-commerce, point-of-sale, and menu platform
- Lopez Research blog: The Rules for Scaling AI Have Changed. Yum Brands Proved It. [LINK]
---
Physical AI is arriving on factory floors ahead of schedule, and Vention is already deploying it on applications that four automation integrators had failed to crack.
François Giguère, CTO of Vention, draws a precise line between agentic AI and physical AI. Agentic systems process data and return data. Physical AI controls motion and actuation that produce real-world consequences on a factory floor, where a hundred percent uptime is the only acceptable standard. Giguère has spent a decade helping build Vention, a platform that lets manufacturers design robotic cells in 3D, program them through natural language, simulate them in a browser, and receive the physical machine shipped in modular components like an industrial kit. With a team of 95 engineers and three years as CTO, he brings a grounded perspective on where AI delivers real value in industrial automation and where it still falls short.
The design, automate, simulate workflow at Vention represents one of the most complete implementations of AI-powered machine engineering currently in production. In the design phase, customers build systems from a modular component library. In the automate phase, an AI agent converts natural language prompts into Python control code for the entire cell, including robot arms, conveyors, vision systems, and grippers. The program is validated in simulation before a single component ships. This is made possible by Vention's motion streaming architecture: instead of treating the robot as the master controller, the way KUKA KRL does, Vention brings all motion planning, inverse kinematics, forward kinematics, blending, and trajectory optimization into its own software stack. The robot becomes a passive component consuming a motion stream, and the entire machine becomes programmable from a single unified codebase that AI tools excel at generating.
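To make the idea of a robot as a passive component consuming a motion stream more concrete, here is a hypothetical Python sketch. None of it is Vention's API: `PassiveAxis`, `CentralController`, and `linear_stream` are invented for illustration, and straight-line interpolation between positions stands in for the inverse kinematics, blending, and trajectory optimization the episode describes. The point it shows is architectural: all planning lives in one software controller, and every device, whatever its brand, only applies the setpoints it receives.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class PassiveAxis:
    """Hypothetical device (robot joint, conveyor, gripper) with no onboard planner:
    it only applies the setpoints streamed to it."""
    name: str
    position: float = 0.0
    history: List[float] = field(default_factory=list)

    def apply(self, setpoint: float) -> None:
        self.position = setpoint
        self.history.append(setpoint)


def linear_stream(start: Dict[str, float], goal: Dict[str, float],
                  steps: int) -> Iterator[Dict[str, float]]:
    """Yield `steps` evenly spaced setpoint frames ending at goal (up to float rounding).
    Linear interpolation is a stand-in for real trajectory optimization and blending."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    for i in range(1, steps + 1):
        t = i / steps
        yield {axis: start[axis] + t * (goal[axis] - start[axis]) for axis in goal}


class CentralController:
    """Single software controller: all planning happens here; devices just consume frames."""

    def __init__(self, axes: List[PassiveAxis]) -> None:
        self.axes = {axis.name: axis for axis in axes}

    def move(self, goal: Dict[str, float], steps: int = 10) -> None:
        start = {name: axis.position for name, axis in self.axes.items()}
        # Stream each frame to every participating device in lockstep.
        for frame in linear_stream(start, goal, steps):
            for name, setpoint in frame.items():
                self.axes[name].apply(setpoint)


if __name__ == "__main__":
    arm = PassiveAxis("arm_joint_1")
    conveyor = PassiveAxis("conveyor_mm")
    cell = CentralController([arm, conveyor])
    cell.move({"arm_joint_1": 90.0, "conveyor_mm": 500.0}, steps=4)
    print(arm.history)       # [22.5, 45.0, 67.5, 90.0]
    print(conveyor.history)  # [125.0, 250.0, 375.0, 500.0]
```

Because the arm and the conveyor are driven from the same stream, one program coordinates the whole cell, which is the property the episode credits with making the machine easy for AI tools to program.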
Giguère notes that Vention's choice to use Python as the programming language for automation control gives its AI tools a measurable edge over environments built on structured text or ladder logic.
Vention's two physical AI products are GRIP (Generalized Robotics Intelligence Pipeline) and Rapid AI Operator, a modular bin picking application built on top of GRIP. The technology relies on transformer-based foundation models.
About François Giguère
François Giguère is the CTO of Vention, an industrial automation platform where manufacturers design, program, simulate, and deploy robotic systems entirely online. Employee number four at the company, he has contributed to Vention's growth for over 10 years and leads a team of 95 engineers. He has a background in electrical engineering and real-time embedded software development.
Learn more: https://vention.io
Timestamps
0:00 Introduction and welcome
1:00 François Giguère's background and Vention overview
2:20 How AI spans Vention's internal tools and customer products
4:00 Why embedded and robotics code is harder for AI to generate
7:00 Design, automate, simulate: Vention's three-stage AI workflow
13:50 Motion streaming: one unified controller for all robot brands
18:20 Defining physical AI versus agentic AI
20:10 GRIP pipeline and Rapid AI Operator
22:40 Case study: MacAlpine Plumbing bin picking with foundation models
39:40 Nvidia GTC impressions: agentic AI eclipsing physical AI
46:20 Edge versus cloud: why real-time inference stays on-prem
56:10 Predictions: physical AI roadmap and the VLA timeline
This episode is sponsored by:
MaintainX helps maintenance and operations teams work smarter by putting critical information directly in the hands of technicians.
According to MaintainX, technicians spend up to 40 percent of their time searching for answers and responding to radio calls rather than fixing assets.
https://www.maintainx.com
About Your Hosts
Vladimir Romanov is a co-host of The Manufacturing Hub Podcast and the founder of Joltek, an independent manufacturing and industrial automation consulting firm specializing in modernization strategy, digital transformation, and workforce development.
Connect with Vlad: https://www.linkedin.com/in/vladromanov/
Want to go deeper? Vlad and the team at Joltek have covered related topics here:
Industrial Robotics: https://www.joltek.com/blog/industrial-robotics
Edge Computing and AI Value in Manufacturing Data: https://www.joltek.com/blog/edge-computing-ai-value-manufacturing-data
Dave Griffith is a co-host of The Manufacturing Hub Podcast and founder of Capelin Solutions, an industrial automation firm helping manufacturers adopt smart manufacturing technology. He brings 15 years of experience in industrial automation and digital transformation.
Connect with Dave: https://www.linkedin.com/in/davegriffith23/
Subscribe to Manufacturing Hub: https://www.manufacturinghub.live
LinkedIn: https://www.linkedin.com/company/manufacturing-hub-network
YouTube: https://www.youtube.com/@ManufacturingHub
Thank you to Cisco for sponsoring my trip to the Cisco AI Lab in San Jose. In this deep dive into the future of data center networking, we sit down to explore the massive shifts happening in AI infrastructure. We discuss the rollout of new 100-terabit smart switches and firewalls powered by the Cisco Silicon One G300 chip, alongside the highly anticipated NVIDIA Spectrum 6. Discover the critical debate between Ethernet and InfiniBand for scaling AI clusters, the complexities of co-packaged optics (CPO) versus linear pluggable optics (LPO), and how agentic AI and tools like Claude are revolutionizing legacy C code refactoring. From managing data center power constraints to enforcing security policies directly on DPUs, this conversation covers the hardware and software transformations you need to know to stay ahead in network engineering.
// Will Eatherton SOCIAL //
LinkedIn: / willeatherton
Newsroom: https://newsroom.cisco.com/c/r/newsro...
// YouTube video REFERENCE //
• The 100Tbps AI Switch: Inside the Beast
• Did Ethernet Just Win? Cisco's 100Tbps AI ...
// David's SOCIAL //
Discord: discord.com/invite/usKSyzb
Twitter: www.twitter.com/davidbombal
Instagram: www.instagram.com/davidbombal
LinkedIn: www.linkedin.com/in/davidbombal
Facebook: www.facebook.com/davidbombal.co
TikTok: tiktok.com/@davidbombal
YouTube: / @davidbombal
Spotify: open.spotify.com/show/3f6k6gE...
SoundCloud: / davidbombal
Apple Podcast: podcasts.apple.com/us/podcast...
// MY STUFF //
https://www.amazon.com/shop/davidbombal
// SPONSORS //
Interested in sponsoring my videos?
Reach out to my team here: sponsors@davidbombal.com
// MENU //
0:00 - Coming Up
0:42 - Introduction
01:05 - Recap of Announcements from Cisco Live
03:19 - 1.6 Terabit Client Optics
04:27 - Hyperscalers and Neo-Clouds
05:13 - Cisco and Nvidia working together
05:39 - Scale Across
06:43 - Announcements from Nvidia GTC 2026
09:15 - Firewalls and AI Clusters
10:36 - The Future, Growth and Innovation
11:53 - Why have a Cisco Switch and an Nvidia Switch?
14:33 - Operating Systems on the Switches
16:42 - InfiniBand vs Ethernet in the Data Centre
17:52 - Other Announcements from GTC
19:35 - Concerns around Data Centres
21:22 - Agentic AI in Data Centres
22:44 - Evolution of Software in Data Centres
25:07 - The Future of Vibe Coding
29:13 - Updates in the Routing Circles
30:43 - Open Source AI
32:11 - A view into the Future
35:14 - Outro
Please note that links listed may be affiliate links and provide me with a small percentage/kickback should you use them to purchase any of the items listed or recommended. Thank you for supporting me and this channel!
Disclaimer: This video is for educational purposes only.
#cisco #nvidia #agenticai
Geopolitical dislocations are ripping through the stock market and filtering down to IT budgets in the form of increased uncertainty. It seems that every quarter of budget optimism is followed by some external event that causes organizations to tighten their belts. Specifically, the momentum we saw in January CIO spending sentiment has pulled back as war, oil prices, the threat of inflation, and even the prospect of Fed tightening loom larger. While big tech players continue to spend massively on capex, and the genuine enthusiasm from this month's Nvidia GTC and RSAC events is still being felt, mainstream enterprises are once again expressing caution in their spending intentions. Beyond economic and world affairs, AI success still eludes most mainstream organizations. Our observation is that the tech industry is in the third inning of the AI wave, which started in earnest in the middle of the last decade with DeepMind and other significant research milestones that led to the ChatGPT moment and subsequent moments like Claude Code and OpenClaw. Yet mainstream organizations are still in the first inning. The data suggests that while virtually all firms are leaning into AI, those realizing ROI at scale remain the minority. While leading thinkers like Jensen Huang advise not focusing on ROI and letting innovation flourish irrespective of hard-dollar returns, the reality in the land of enterprise customers is that tangible returns and risk management remain the key governors of spending.
Your construction back office admin hasn't taken a vacation in ten years. And it's your software's fault.
In this episode of Bricks, Bucks & Bytes, Owen, Patric, Martin, and Dustin break down what four AI CEOs said at NVIDIA GTC and what it means for construction. Then Anna Berger joins, fresh off raising $10M in three weeks, to expose the chaos inside every specialty contractor's back office.
- AI token costs collapsed 99%: here's what that unlocks
- The data center boom is creating a trades crisis; electricians are now the hottest hire in America
- Why AI will never run construction payroll
- How Anna closed $10M with 40+ investor meetings in her first week
"I just took my first vacation in ten years. Thank you." That's the kind of message Anna Berger is getting from her customers.
Watch the full episode now!
Our Sponsors:
Aphex is the multiplayer planning platform where construction teams plan together, stay aligned, and deliver projects faster. Check out aphex.co
Archdesk: “The #1 Construction Management Software for Growing Companies - Manage your projects from Tender to Handover.” Check out archdesk.com
BuildVision: streamlining the construction supply chain with a unified platform. www.buildvision.io
Chapters
00:00 Intro
01:00 Introduction to NVIDIA GTC and AI CEOs
03:38 The Impact of AI on Cost and Accessibility
05:34 Specialization vs. Commoditization in AI Models
07:44 The Role of AI in Engineering and Construction
10:33 Deterministic Outcomes and Governance in AI
13:32 The Future of AI in Enterprises and Job Market Dynamics
23:59 The Role of Accuracy in Construction Projects
28:06 AI vs Human Judgment in Project Estimation
30:52 Evaluating AI Accuracy in Professional Contexts
33:45 The Future of Skilled Trades and Workforce Training
40:24 Economic Predictions and Market Interests
44:59 Quarterly Recap: Trends and Insights in Construction Tech
47:01 Real-Life Lessons from the Industry
51:43 Celebrating Success: Anna's Fundraising Journey
53:53 Understanding Trade: A Deep Dive into Construction Back Office Solutions
56:46 Future Plans: Scaling and Product Development
1:00:46 The Importance of Compliance in Payroll Management
1:03:28 Y Combinator's Role in Construction Tech
Our 238th episode with a summary and discussion of last week's big AI news!
Recorded on 03/18/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/
In this episode:
* OpenAI released GPT-5.4 mini and nano with 400k-token context windows, higher per-token prices but claimed token-efficiency gains in Codex; nano is API-only and pitched for high-volume classification/data extraction despite a major price increase.
* Mistral open-sourced the Small 4 model family (MoE, 119B total/6B active) combining reasoning, multimodal, and coding-agent capabilities, and announced Forge to help businesses train or post-train custom models.
* Agent “operating system” competition intensified, with the Meta-acquired Manus launching a local Mac agent, Nvidia announcing a NeMo/“Open Shell” sandboxed agent runtime, and Nvidia also unveiling DLSS 5 plus major hardware forecasts including Groq LPU integration.
* Business and safety updates included OpenAI shifting focus toward productivity/enterprise amid competition, Microsoft reorganizing Copilot and frontier-model efforts, Meta delaying its next model, China-linked ByteDance deploying large Nvidia clusters abroad, and new safety work on steganography, chain-of-thought faithfulness, fine-tuning defenses, cyber-attack evals, and constitution/spec compliance.
A thank you to our current sponsors:
Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year
Timestamps:
(00:00:10) Intro / Banter
(00:01:56) News Preview
Tools & Apps
(00:02:39) OpenAI ships GPT-5.4 mini and nano, faster and more capable but up to 4x pricier
(00:08:04) Mistral's new Small 4 model punches above its weight with 128 expert modules
(00:14:03) Meta's Manus launches 'My Computer' to turn your Mac into an AI agent - 9to5Mac
(00:17:57) NVIDIA Announces NemoClaw for the OpenClaw Community | NVIDIA Newsroom + Nvidia boosts knowledge work with Open Agent Development Platform
(00:24:09) DLSS 5 looks like a real-time generative AI filter for video games | The Verge
(00:26:36) OpenAI to Launch ChatGPT 'Adult Mode' Despite Warnings From Its Own Advisers - CNET
Applications & Business
(00:33:46) OpenAI Reportedly Pivoting to a Focus on Business and Productivity Only
(00:41:25) Nvidia GTC 2026: CEO Jensen Huang sees $1 trillion in orders for Blackwell and Vera Rubin through '27
(00:45:44) Mistral launches Forge to help enterprises build their own AI models
(00:54:17) China's ByteDance gets access to top Nvidia AI chips, WSJ reports
(00:57:57) Meta Delays Rollout of New A.I. Model After Performance Concerns
(01:02:50) Microsoft Shakes Up AI Division As Copilot Falls Behind Google and OpenAI
Policy & Safety
(01:07:26) A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
(01:13:09) Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
(01:18:29) In-Training Defenses against Emergent Misalignment in Language Models
(01:23:07) How do frontier AI agents perform in multi-step cyber-attack scenarios?
(01:25:20) Eval awareness in Claude Opus 4.6's BrowseComp performance
(01:29:49) Introducing Bloom: an open source tool for automated behavioral evaluations
(01:32:26) How well do models follow their constitutions?
(01:37:11) Nvidia's H200 License Stirs Security Concern Among Top Democrats
Research & Advancements
(01:40:050) [2603.15031] Attention Residuals
(01:47:11) Mamba-3: Improved Sequence Modeling using State Space Principles
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
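As a rough worked example of what the Mistral Small 4 figures in the episode summary (MoE, 119B total and 6B active parameters) imply, the arithmetic below is our own, not from the episode. It uses the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token, an approximation that ignores the context-length-dependent cost of attention.

```latex
\frac{N_{\text{active}}}{N_{\text{total}}} = \frac{6 \times 10^{9}}{119 \times 10^{9}} \approx 0.050
\qquad
C_{\text{fwd}} \approx 2\,N_{\text{active}} = 2 \times (6 \times 10^{9}) = 1.2 \times 10^{10}\ \text{FLOPs per token}
```

By the same approximation, a dense 119B model would need about 2.4 × 10^11 FLOPs per token, roughly 20× more, although all 119B weights still have to be held in memory to serve the MoE model.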
In this episode of Sidecar Sync, co-hosts Amith Nagarajan and Mallory Mejias dive into the explosive rise of OpenClaw, an open-source AI agent that has taken the tech world by storm. From being dubbed “the next ChatGPT” at NVIDIA GTC to triggering widespread security concerns, OpenClaw represents both the promise and the peril of the agent era. Amith breaks down what OpenClaw actually does, why its viral adoption matters more than its underlying tech, and how its vulnerabilities highlight the risks of giving AI broad system access. The conversation then expands to the bigger picture: the shift from models to “harnesses,” why AI agents are already reshaping workflows, and what association leaders should do right now to stay ahead without putting their organizations at risk.
The NIA boys discuss Nvidia GTC, AI inference explained, Jensen as the new Steve Jobs, and Super Micro's $2.5B smuggling scheme.
Timestamps
(00:00:00) - Intro
(00:04:21) - Meme of the Week
(00:05:37) - Jensen, the New Steve Jobs
(00:09:29) - Nvidia GTC
(00:13:46) - AI Agents and OpenClaw
(00:22:11) - AI Inference Explained
(00:29:31) - Super Micro's $2.5B Smuggling Scheme
What Is Not Investment Advice?
Every week, Jack Butcher, Bilal Zaidi & Trung Phan discuss what they're finding on the edges of the internet + the latest in business, technology and memes.
Subscribe + listen on your fav podcast app:
Apple: https://pod.link/notadvicepod.apple
Spotify: https://pod.link/notadvicepod.spotify
Others: https://pod.link/notadvicepod
Listen in on our group chat on Telegram: https://t.me/notinvestmentadvice
Let us know what you think on Twitter:
http://twitter.com/bzaidi
http://twitter.com/trungtphan
http://twitter.com/jackbutcher
http://twitter.com/niapodcast
Hosted on Acast. See acast.com/privacy for more information.
Data Hackers News is on the air! The hottest topics of the week, with the top news in Data, AI, and Technology that you can also find in our weekly Newsletter, now on the Data Hackers Podcast!
Press play and listen to this week's Data Hackers News now!
To keep up with everything happening in the data field, subscribe to the weekly Newsletter: https://www.datahackers.news/
Meet our Data Hackers News commentators: Monique Femme
Other Data Hackers channels: Website, LinkedIn, Instagram, TikTok, YouTube
With a spiraling conflict in the Middle East and ominous warning signs sounding from major players in the private credit industry, not even the exciting developments from AI juggernaut NVIDIA at its recent GTC conference could rise above the deluge of stories this month. Strap in as GK Managing Partners Ayal Shmilovich and Hatem Dhiab break down this whirlwind news cycle and explain why the US economy holds steady when the going gets tough. The opinions voiced in this material are for general information only and are not intended to provide specific advice or recommendations for any individual. You should consult a financial advisor before making any investment decisions.
Tony Foster attended NVIDIA GTC, and Eric and Tony talk about what's happening with the 7 new chipsets and 5 rack-based solutions.
(0:00) Intro live from Nvidia GTC
(0:37) CoreWeave CEO, Michael Intrator
(32:58) Perplexity CEO, Aravind Srinivas
(1:07:11) Mistral CEO, Arthur Mensch
(1:18:57) IREN CEO, Daniel Roberts
Our episode is sponsored by the New York Stock Exchange, a modern marketplace and exchange for building the future. It all happens at the NYSE: https://nyse.com
Follow the besties:
https://x.com/chamath
https://x.com/Jason
https://x.com/DavidSacks
https://x.com/friedberg
Follow on X: https://x.com/theallinpod
Follow on Instagram: https://www.instagram.com/theallinpod
Follow on TikTok: https://www.tiktok.com/@theallinpod
Follow on LinkedIn: https://www.linkedin.com/company/allinpod
Intro Music Credit: https://rb.gy/tppkzl https://x.com/yung_spielburg
Intro Video Credit: https://x.com/TheZachEffect
Ben Bajarin and Jay Goldberg discuss Nvidia GTC, AI infrastructure, optical networking, and the future of semiconductor technology. They analyze Nvidia's strategic positioning, product innovations, and industry trends shaping the AI and data center landscape. They also discuss OFC and optical trends, as well as Micron's earnings.
Cory Johnson with Epistrophy Capital Research and Daniel Newman with Futurum recap the tech-filled week during Nvidia's (NVDA) GTC 2026 conference. Among the topics they discuss: how Nvidia will grow following the slew of tech innovations and partnerships, who will benefit from growing CapEx spending, and ways AI adoption shapes how businesses are built.
======== Schwab Network ========
Empowering every investor and trader, every market day.
Subscribe to the Market Minute newsletter - https://schwabnetwork.com/subscribe
Download the iOS app - https://apps.apple.com/us/app/schwab-network/id1460719185
Download the Amazon Fire TV App - https://www.amazon.com/TD-Ameritrade-Network/dp/B07KRD76C7
Watch on Sling - https://watch.sling.com/1/asset/191928615bd8d47686f94682aefaa007/watch
Watch on Vizio - https://www.vizio.com/en/watchfreeplus-explore
Watch on DistroTV - https://www.distro.tv/live/schwab-network/
Follow us on X - https://twitter.com/schwabnetwork
Follow us on Facebook - https://www.facebook.com/schwabnetwork
Follow us on LinkedIn - https://www.linkedin.com/company/schwab-network/
About Schwab Network - https://schwabnetwork.com/about
This week on Upside, Dan Bowyer and Mads Jensen of SuperSeed and Lomax Ward of Outsized Ventures unpack a moment where AI infrastructure, enterprise adoption and market risk are all moving at once.
Nvidia is laying out a path toward a $1 trillion AI market, driven by major advances in inference performance. At the same time, hyperscalers are investing at unprecedented levels, with AI capex increasingly supported by debt rather than free cash flow. But the real shift is happening higher up the stack.
The AI race is moving away from pure model performance and toward distribution, enterprise control and monetisation.
This episode explores:
• Nvidia's roadmap and the scaling of AI infrastructure
• Hyperscaler capex and the return of balance sheet risk
• Why the AI battleground is shifting to enterprise
• OpenAI's monetisation challenge and strategic positioning
• The growing gap between AI capability and adoption
• Where value actually accrues in the AI stack
• And how hyperscalers are reshaping startup opportunities
This isn't just another AI cycle.
It's infrastructure, capital and business models being rewritten at the same time.
Chapters
00:00 Intro
02:00 Nvidia GTC and inference leap
07:00 The trillion-dollar AI question
12:00 Hyperscaler capex and leverage
18:00 Models vs distribution
23:00 OpenAI's strategy
28:00 Enterprise AI battleground
34:00 Market risk and concentration
39:00 Capability vs adoption
45:00 Where value accrues
50:00 Startups vs hyperscalers
55:00 Europe policy signals
Upside is a weekly deep dive into the forces shaping European venture, AI, defence and deep tech.
Hosted by:
Dan Bowyer (SuperSeed)
Mads Jensen (SuperSeed)
Lomax Ward (Outsized Ventures)
About Upside
AI is becoming a scale and control business. On Episode 297 of The Six Five Pod, Patrick Moorhead and Daniel Newman examine the companies building the infrastructure, forming the alliances, and making the moves that will define who wins and who gets squeezed out. Control is shifting across compute, models, infrastructure, and enterprise distribution as NVIDIA, Microsoft, OpenAI, Meta, and others compete to shape the next phase of the AI market. The handpicked topics for this week are:
NVIDIA's Full-Stack Push Gets Bigger: Following the GTC conference in San Jose, Pat and Dan break down how NVIDIA continues expanding beyond GPUs with Vera CPU, Dynamo, and a broader agentic AI stack designed to unify training, inference, orchestration, and enterprise-grade security.
Microsoft, OpenAI, and Amazon Enter a New Phase of Tension: With Microsoft reportedly weighing legal action over OpenAI's growing AWS relationship, the discussion turns to exclusivity, multi-cloud strategy, and what happens when one of AI's most important alliances starts to crack.
China, Compute, and the Geopolitics of AI Access: The hosts examine NVIDIA's reported H200 restart for China and what it says about export controls, policy pressure, and the global fight over advanced AI compute.
Meta's $27B Infrastructure Agreement Signals the Real Race: Meta's latest infrastructure deal reinforces a central point of this episode: demand for AI capacity is still outrunning supply, and hyperscalers are moving aggressively to lock in long-term compute.
OpenAI's Enterprise Push Raises Bigger Business Model Questions: As OpenAI leans harder into enterprise and eyes an eventual IPO, Pat and Dan unpack what this pivot says about monetization pressure, competitive positioning, and the need to prove a durable AI business model.
The GPU Smuggling Story Shows How Valuable AI Hardware Has Become: A major smuggling case involving NVIDIA hardware spotlights the black market for AI chips and the growing intersection of compute, national security, and enforcement.
The Flip: Did NVIDIA Just Change the Inference Market Again? This week's debate centers on whether NVIDIA's $20B Groq technology deal kills the standalone inference chip market, or whether it actually validates that market by proving just how strategically important specialized inference has become.
The Fed, Micron, and Accenture Reflect a More Complicated Market: In Bulls and Bears, the hosts cover the Fed's latest decision, Micron's AI-driven momentum, and why Accenture's results still ran into skepticism despite strong execution.
Meta's Workforce Cuts and AI Spend Reflect the New Corporate Tradeoff: The episode closes on the growing tension between rising AI investment and labor efficiency, as companies look for ways to fund massive infrastructure and token budgets while restructuring headcount.
For a deeper dive into each topic, please click on the provided links. Subscribe to our YouTube Channel so you never miss an episode.
The Decode
NVIDIA GTC 2026: Vera Rubin Platform, Groq LPU Integration & $1T Demand Vision
https://www.cnbc.com/2026/03/16/nvidia-gtc-2026-ceo-jensen-huang-keynote-blackwell-vera-rubin.html
https://investor.nvidia.com/news/press-release-details/2026/NVIDIA-Vera-Rubin-Opens-Agentic-AI-Frontier/default.aspx
https://x.com/PatrickMoorhead/status/2033662536227393952
https://x.com/danielnewmanUV/status/2033649511592284352
The Groq 3 LPU: NVIDIA's $20B Bet on Inference Economics
https://www.cnbc.com/2026/03/13/a-closer-look-at-nvidias-20-billion-bet-on-tech-for-a-new-ai-chip.html
https://www.tomshardware.com/tech-industry/semiconductors/nvidias-20-billion-groq-deal-produces-its-first-chip
https://www.servethehome.com/decoding-the-future-of-inference-at-nvidia-groq-lpus-join-vera-rubin-platform-for-low-latency-inference/
https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-the-low-latency-inference-accelerator-for-the-nvidia-vera-rubin-platform/
https://www.jonpeddie.com/news/nvidias-groq-tie-in/
Microsoft Threatens to Sue OpenAI Over $50B Amazon AWS Frontier Deal
https://www.reuters.com/technology/microsoft-weighs-legal-action-over-50-billion-amazon-openai-cloud-deal-ft-2026-03-18/
NVIDIA Restarting H200 Chip Production for China
https://www.axios.com/2026/03/17/nvidia-huang-china-h200
https://x.com/danielnewmanUV/status/1999974968143257945
Meta & Nebius Sign $27B AI Infrastructure Agreement: Largest AI Compute Deal
https://nebius.com/newsroom/nebius-signs-new-ai-infrastructure-agreement-with-meta
https://x.com/danielnewmanUV/status/2033531056784347240
https://x.com/PatrickMoorhead/status/2033543939526193491
OpenAI Enterprise Pivot + Q4 2026 IPO Target
https://www.reuters.com/business/openai-lays-groundwork-juggernaut-ipo-up-1-trillion-valuation-2025-10-29/
https://www.forbes.com/sites/josipamajic/2026/03/19/openais-pivot-to-enterprise-is-likely-a-race-against-anthropic-and-the-ipo-clock/
Supermicro's Legal Troubles
https://fortune.com/2026/03/19/supermicro-arrested-founder-smuggling-gpu-china/
https://www.bbc.com/news/articles/cy41ly2d9wko
The Flip: Did NVIDIA Just Kill the Inference Chip Startup Market with the Groq Acquisition?
FOR: NVIDIA Killed It – The Inference Startup Market Is Over
https://www.cnbc.com/2026/03/13/a-closer-look-at-nvidias-20-billion-bet-on-tech-for-a-new-ai-chip.html
https://www.jonpeddie.com/news/nvidias-groq-tie-in/
AGAINST: Startups Survive – Hyperscalers Won't Deepen NVIDIA Dependency
https://www.reuters.com/business/retail-consumer/cerebras-systems-amazon-strike-deal-offer-cerebras-ai-chips-amazons-cloud-2026-03-13/
https://www.tomshardware.com/pc-components/gpus/nvidia-removes-rubin-cpx-accelerators-from-its-roadmap-groq-3-lpus-take-center-stage-as-cpx-is-removed
Bulls & Bears
Market Reactions to Economic News
https://uk.finance.yahoo.com/news/stock-market-today-dow-sinks-750-points-sp-500-nasdaq-slide-after-fed-decision-as-powell-touts-inflation-worries-200050703.html
https://www.kiplinger.com/investing/live/march-fed-meeting-2026-live-updates-and-commentary
https://www.investopedia.com/stock-market-today-dow-jones-s-and-p-500-03182026-11928689
$MU Micron Technology – Revenue Almost Triples, Tops Estimates
https://www.cnbc.com/2026/03/18/micron-mu-q2-earnings-report-2026.html
https://x.com/PatrickMoorhead/status/2034390648519024820
https://x.com/danielnewmanUV/status/2034378642613235921
$ACN Accenture – Q2 FY2026 Earnings Beat, Stock Drops ~5% on Guidance
https://www.investing.com/news/earnings/accenture-falls-despite-q2-beat-as-earnings-guidance-disappoints-4570221
https://www.zacks.com/stock/news/2886706/accenture-earnings-beat-estimates-in-q2-revenues-increase-yy
https://www.investing.com/news/transcripts/earnings-call-transcript-accenture-q2-2026-beats-forecasts-but-stock-dips-93CH-4570789
https://x.com/PatrickMoorhead/status/2033348794348142595
This podcast dives into the groundbreaking announcements from NVIDIA GTC 2026, marking the official dawn of the agentic and physical AI era. We explore CEO Jensen Huang's vision of transitioning from traditional data centers to "AI factories" that manufacture intelligence, converting power into "tokens" as the new economic unit. Key highlights include the staggering projection of $1 trillion in AI infrastructure demand and the unveiling of the vertically integrated Vera Rubin computing platform. Furthermore, we discuss the revolutionary OpenClaw, an open-source operating system likened to "Android for agents", and the leap into physical AI with Cosmos for robotics. Tune in to understand how NVIDIA is architecting the foundational infrastructure of the next digital economy.
Powered by Firstory Hosting
SUMMARY: We dig into the NVIDIA GTC keynote and highlight three things: accelerated computing for everything, the complexity of the new inference stack, and NVIDIA's “open” software stack including NemoClaw.
SHOW: 1012
SHOW TRANSCRIPT: The Reasoning Show #1012 Transcript
SHOW VIDEO: https://youtu.be/aXOr91q76yM
SHOW SPONSORS:
VENTION - Ready for expert developers who actually deliver? Visit ventionteams.com
SHOW NOTES:
NVIDIA GTC 2026 (Keynote)
NVIDIA NemoClaw - OpenClaw + OpenShell + NVIDIA Agent Toolkit
NVIDIA adds Groq LPU to their rack systems
NVIDIA to invest $26B in Open Weight Models
Interview with Jensen about Accelerated Computing (Stratechery)
Topic 1 - Jensen's trying to paint the bigger picture of accelerated computing everywhere (robotics, autonomous driving, gen AI, physical AI, but also just everyday enterprise apps). Everything is about keeping the stock price up and margins high. The stock price provides the war chest to fight off all foes.
Topic 2 - The inference architecture is a complex mix of GPUs, CPUs, ASICs/LPUs, and high-speed networking, and it seems very different from the training architecture. How big is the burden on data center providers? What inference alternatives are emerging?
Topic 3 - Jensen talked a lot about OpenClaw and eventually about NVIDIA's NemoClaw. How does his interest in agentic AI tie into his interest in building NVIDIA's own frontier model?
FEEDBACK?
Email: show @ reasoning dot show
Bluesky: @reasoningshow.bsky.social
Twitter/X: @ReasoningShow
Instagram: @reasoningshow
TikTok: @reasoningshow
Mates recap Nvidia GTC madness (Jensen's $1T revenue blitz fueling robots, robocabs, orbital fabs, and NemoClaw) while unpacking OpenClaw's GitHub supernova, Anthropic's enterprise crush on OpenAI, Elon's TerraFab TSMC-killer, and inference deflation driving an explosion of abundance.
Get access to metatrends 10+ years before anyone else - https://qr.diamandis.com/metatrends
Peter H. Diamandis, MD, is the Founder of XPRIZE, Singularity University, ZeroG, and A360
Salim Ismail is the founder of OpenExO
Dave Blundin is the founder & GP of Link Ventures
Dr. Alexander Wissner-Gross is a computer scientist and founder of Reified
My companies:
Apply to Dave's and my new fund: https://qr.diamandis.com/linkventureslanding
Go to Blitzy to book a free demo and start building today: https://qr.diamandis.com/blitzy
Your body is incredibly good at hiding disease. Schedule a call with Fountain Life to add healthy decades to your life, and to learn more about their Memberships: https://www.fountainlife.com/peter
Connect with Peter: X, Instagram
Connect with Dave: X, LinkedIn
Connect with Salim: X
Join Salim's Workshop to build your ExO
Connect with Alex: Website, LinkedIn, X, Email, Substack, Spotify, Threads
Listen to MOONSHOTS: Apple, YouTube
*Recorded on March 19th, 2026
*The views expressed by me and all guests are personal opinions and do not constitute Financial, Medical, or Legal advice.
Learn more about your ad choices. Visit megaphone.fm/adchoices
Anshel Sag and Mike Dano discuss highlights from Nvidia GTC 2026, focusing on Nvidia's “AI Grid” concept for distributed GPU compute across telecom and cloud-edge partners including AT&T, T-Mobile, Comcast, Spectrum, Cisco, and Akamai, positioned as a broader revival of edge computing beyond AI RAN. They cover Bell's plan to invest $1.7B in a Saskatchewan AI factory/data center emphasizing Canadian data sovereignty, with hardware funding from partners and expectations of improved EBITDA after buildout. Sag reviews OPPO's Find N6 foldable with a crease-free display, flagship specs, and fast charging, while noting Samsung discontinued its $3,000 Galaxy Z trifold due to economics. Dano analyzes the Google Fiber–Astound Broadband merger as part of a U.S. fiber land grab, and they close on U.S. government interest in 6G for the 2028 LA Olympics, WRC 2027 spectrum politics in Shanghai, and 3GPP work around 7 GHz reuse of C-band cell grids.
The Week in Tech is now a roundtable! Every Friday, Oz and three of the best writers covering Silicon Valley will discuss the latest news, decode emerging trends and debate what actually matters for the future of technology and for us. This week, guests Reed Albergotti (Semafor), Kyle Chayka (The New Yorker) and returning panelist Taylor Lorenz (User Mag) each share a story. Reed fills us in on what he saw at the Nvidia GTC conference in San Jose, and why we shouldn’t ignore OpenClaw. Taylor gives a primer on Section 230, the 30-year-old foundational internet law, and why there’s a campaign to repeal it. And finally, Kyle tells us what ‘taste’ means to Silicon Valley’s tech bros and why it may annoy you. Additional Reading: We’re all living inside Jensen Huang’s ‘triangle’ | Semafor How Powerful People Became Obsessed w/ Section 230 | User Mag Why Tech Bros Are Now Obsessed with Taste | The New Yorker See omnystudio.com/listener for privacy information.
This week, we discuss NVIDIA GTC, token machines, token budgets, and an AWS outage that may or may not involve AI. Plus, Matt reviews The Wizard of Oz at The Sphere.
Watch the YouTube Live Recording of Episode 564
Runner-up Titles
Let's T this up.
One Trillion Dollars
Leader to laggard
My terms of service
Someone should come up with a term
Networking FOMO
Slide crimes
It's token machines all the way down.
The Claude-ning
So why Nemo Cloud?
You're just selling token machines
Billionaire version of Gallagher
I've seen too much
Upward Replicability
Rundown
NVIDIA GTC
Nvidia bets on AI inference as chip revenue opportunity hits $1 trillion
Nvidia CEO Jensen Huang: $1 trillion in chip sales coming
Keynote at NVIDIA GTC San Jose 2026
AI Roundup
Amazon just called an emergency engineering meeting after AI coding tools caused multiple outages.
2 Ways to Correct the Financial Times at AWS (So Far)
Amazon orders 90-day reset after code mishaps cause millions of lost orders
Microsoft shakes up Copilot AI leadership team, freeing up Suleyman to build new models
MCP vs. CLI for AI-native development - CircleCI
Relevant to your Interests
YouTube Lays Claim to Another Crown: The World's Largest Media Company
The Lobster That Moved $50 Billion
Amazon is determined to use AI for everything – even when it slows down work
Google Fiber will be sold to private equity firm and merge with cable company
How to Do Code Reviews in the Agentic Era
Elon Musk Says He's Epically Screwed Up at xAI, Is Rebuilding "From the Foundations"
The ‘AI-Washing’ of Job Cuts Is Corrosive and Confusing
Introducing Chainguard Commercial Builds: Secure-by-default containers
Exclusive: Small publishers hit hardest by search traffic declines
The State of AI in the Enterprise - 2026 AI report
Agents Over Bubbles
"Yes, AI Is a Bubble. There Is No Question."
Listener Feedback
Recap of SCALE 23x with Barton George
Conferences
KubeCon EU, March 23-26, 2026 - Coté will be there on a media pass.
DevOpsdays Atlanta 2026, April 21-22, 2026
DevOpsDays Austin, May 5-6, 2026
WeAreDevelopers, July 8-10, 2026, Berlin, Coté speaking.
VMware User Groups (VMUGs):
Minneapolis (April 7-9, 2026)
Toronto (May 12-14, 2026)
Dallas (June 9-11, 2026)
Orlando (October 20-22, 2026)
SDT News & Community
Join our Slack community
Email the show: questions@softwaredefinedtalk.com
Free stickers: Email your address to stickers@softwaredefinedtalk.com
Follow us on social media: Twitter, Threads, Mastodon, LinkedIn, BlueSky
Watch us on: Twitch, YouTube, Instagram, TikTok
Book offer: Use code SDT for $20 off "Digital WTF" by Coté
Sponsor the show
Sponsor more podcasts with Failover Media
Recommendations
Brandon: Failover Media Newsletter; Stitch - Design with AI; Milestone iOS App; Claude Command: /insights
Matt: Caleb Sasser XOXO 2024; The McDonald's Centralia Mural; MPC app
Coté: Dream Router 7
Our guest this week is Boris Sofman, co-founder and CEO of Bedrock Robotics, who covered a wide range of topics related to the automation of heavy machinery. Boris discussed Bedrock's mission to develop autonomy technologies for construction equipment like excavators and bulldozers, aiming to make them fully operatorless. He shared insights from his experience at Waymo, highlighting parallels between autonomous vehicles and the automation of heavy machinery. The discussion also touched on market opportunities in construction, the challenges of integrating AI with existing machinery, and the future of the construction industry with increased automation. Boris emphasized the importance of safety and the potential for automation to transform not just construction but also other industries such as agriculture and manufacturing. The conversation concluded with Boris outlining Bedrock's immediate goals, including moving from supervised autonomy to fully operatorless deployments.
Learn more at: https://bedrockrobotics.com/
Cohost this week is The Robot Report's associate editor, Brianna Wessling. She recaps her trip this week to the NVIDIA GTC 2026 event in San Jose, CA.
– SPONSOR –
Download the 2026 State of the Robotics Industry Report: https://www.therobotreport.com/state-of-robotics-industry-report-2026/
Visit the Mixture of Experts podcast page to get more AI content → https://www.ibm.com/think/podcasts/mixture-of-experts
NVIDIA announces NemoClaw. This week on Mixture of Experts, host Tim Hwang is joined by Merve Unuvar, Martin Keen and Olivia Buzek, who is reporting live from NVIDIA GTC. Jensen Huang revealed $1 trillion in orders for Blackwell and Vera Rubin systems through 2027, plus the launch of NemoClaw, NVIDIA's enterprise-grade AI wrapper built on the OpenClaw agent platform. Next, Anthropic announces the Anthropic Institute, but can AI labs honestly audit their own technology while building it? Then, Shopify enters the agentic shopping arena with AI-powered personal shoppers that could reshape e-commerce. Finally, OpenAI increases its focus on enterprise users and coding, but is it behind?
00:00 – Introduction
1:14 – NVIDIA GTC 2026: Trillion-dollar orders, NemoClaw & agentic computing
11:17 – Anthropic Institute: Can AI labs audit themselves?
22:12 – Shopify shopping agents & the future of e-commerce
35:15 – OpenAI's enterprise pivot: Coding & business focus
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Subscribe for AI updates → https://www.ibm.com/account/reg/us-en/signup?formid=news-urx-52120
Recorded live at NVIDIA GTC 2026 in San Jose, Corey sits down with returning guest Kari Briski, VP of Generative AI Software for Enterprise at NVIDIA, to unpack their biggest open-source model yet: Nemotron 3 Super. Kari breaks down why a 120B-parameter model runs as fast as a 12B one, how multi-agent systems are going from science fiction to production, and why Jensen Huang is calling this "a new operating system." We also dig into NVIDIA's work on OpenClaw security, the 35x explosion in open-model token generation, and where omni-modal AI is heading next.
Subscribe to The Neuron newsletter: https://theneuron.ai
Relevant links:
NVIDIA Build (try Nemotron): https://build.nvidia.com
Nemotron on Hugging Face: https://huggingface.co/nvidia
OpenRouter: https://openrouter.ai
Kari's previous Neuron episode (Oct 2025): https://youtu.be/p0INn_w7TYo
Nvidia's GTC 2026 is finishing up and there is a lot of focus on AI inference. Remember that licensing deal with Groq? By pairing Groq 3 LPX compute with Vera Rubin systems, Nvidia is targeting a 10x revenue jump, moving from $30 billion with Blackwell to a staggering $300 billion opportunity in AI inference. Jensen Huang's "five-layer cake" strategy now spans the entire data center stack, from power delivery to AI models, aiming for $1 trillion in total revenue through 2027.
This new heterogeneous architecture blends GPUs for high throughput with LPUs for ultra-low latency, creating near-instant AI interactions. But is Nvidia fairly priced right now?
Join us on Discord with Semiconductor Insider, sign up on our website: www.chipstockinvestor.com/membership
Check out these other Nvidia videos:
https://youtu.be/50UfALpisPg
https://youtu.be/_6w9EbjaSII
https://youtu.be/_uvIkPwDu5A
https://youtu.be/p5w0aPzDi3I
Supercharge your analysis with AI! Get 15% off your membership with our special link here: https://fiscal.ai/csi/
Sign Up For Our Newsletter: https://mailchi.mp/b1228c12f284/sign-up-landing-page-short-form
If you found this video useful, please make sure to like and subscribe!
⏳ Chapters
00:00 – Nvidia GTC 2026: The Groq Licensing Deal
01:00 – The "Five-Layer Cake" of AI Data Centers
01:55 – From Chip Designer to Supply Chain Giant
02:50 – Road to $1 Trillion: 2025–2027 Revenue Outlook
04:20 – Groq 3 LPX & Vera Rubin: The New Rack Solution
05:45 – GPU vs. LPU: Solving the Latency Problem
06:30 – Heterogeneous Architecture: Throughput & Interactivity
07:45 – The 10x Revenue Jump (Blackwell to Rubin)
08:20 – Stock Valuation: Is Nvidia Still a Buy?
*********************************************************
There are affiliate links sprinkled throughout this video. If something catches your eye and you decide to buy it, we might earn a little coffee money.
Thanks for helping us (Kasey) fuel our caffeine addiction!
Content in this video is for general information or entertainment only and is not specific or individual investment advice. Forecasts and information presented may not develop as predicted and there is no guarantee any strategies presented will be successful. All investing involves risk, and you could lose some or all of your principal.
#Nvidia #GTC2026 #AIInference #VeraRubin #Groq #JensenHuang #StockMarket #Semiconductors #TechNews #DataCenter
Nick and Kasey own shares of Nvidia.
Big Disney News Changes the Shape of the Show
This week's episode of This Week at Walt Disney World came in with a full list of park updates, entertainment news, and fan conversation. Then the 2026 Disney shareholder meeting added a new layer to everything Sam and Greg were already discussing. It did not stop the show. Instead, it gave the episode more weight. With new CEO Josh D'Amaro sharing updates in real time, the conversation naturally shifted as Sam and Greg worked through what those company-level announcements could mean.
Bluey Expands Across Disney Parks and Beyond
Even with the shareholder meeting unfolding, Sam and Greg still covered the full slate of Disney park news. One of the biggest stories was Bluey's growing presence across Disney parks and platforms. Disneyland launches Bluey's Best Day Ever on March 22, complete with themed food, a popcorn bucket, a sipper, and added family appeal. Then on May 26, Bluey heads to Disney's Animal Kingdom as part of a much bigger summer rollout. Add in new Bluey minisodes on Disney+ and Bluey's Big Play, and Disney's strategy is clear. Bluey is becoming one of its biggest family-facing brands.
Cool Kid Summer Starts May 26
That same date, May 26, also marks the launch of Cool Kid Summer. Sam and Greg walked through the lineup, including Mickey Mouse Clubhouse Live, The Magic of Disney Animation, Learn to Draw with Olaf, Off the Page, and Soarin' Across America. Taken together, those experiences point to a summer built around families, animation, and recognizable Disney characters. That makes Cool Kid Summer one of the most important seasonal pushes Disney has announced in a while.
Buzz, Star Wars, and Summer Ticketing News
The week also delivered several major planning updates for Walt Disney World guests. Buzz Lightyear's Space Ranger Spin reopens April 8 at Magic Kingdom, while The Mandalorian & Grogu arrive on Smugglers Run May 22.
Disney also rolled out special discount tickets for kids, giving families another reason to start thinking about spring and summer trips now.
Olaf, New Costumes, and More Across Disney
Around the parks, cast members are debuting new costumes, while Minnie Mouse and Daisy Duck continue expanding Disney's relationship with F1 Academy. Minnie Mouse and Daisy Duck will be front and center of the Disney x Formula 1 ACADEMY collaboration this spring, appearing in exclusive merchandise, on-site character experiences and original content that brings their magic to fans old and new.
Then there was Olaf. During the week, Olaf appeared at NVIDIA GTC ahead of his Disneyland Paris debut, giving Disney fans one more unexpected crossover moment. Outside the parks, Disney Cruise Line's Midnight Magic commercial premiered during the Oscars and quickly pulled at fans' emotions. Meanwhile, speculation continued around a possible Darkwing Duck reboot.
Greg Lands a Guinness World Record
Then came one of the most unexpected moments of the show. During the broadcast, Greg got official word that his name is in the Guinness Book of World Records. The record is tied to the smallest popcorn bucket ever, and the news instantly became one of the most memorable parts of the night. For a show that already tracks popcorn buckets, collectibles, and theme park oddities, the announcement felt perfectly on brand.
Disney Madness Moves to Round 2
The episode also kept the fan bracket energy going as Disney Madness moved into Round 2. That added one more layer to a show already packed with park news, company updates, and collectible culture.
All That and More with Sam & Greg Live
From shareholder meeting updates and Bluey expansion to Cool Kid Summer, Olaf, Disney Madness, and Greg's Guinness World Record moment, this was one of the busiest live shows in recent weeks.
Join Sam and Greg each week for This Week at Walt Disney World LIVE, where Disney news, live reactions, and fan conversation come together.
Join Downtown Josh Brown and Michael Batnick for another episode of What Are Your Thoughts and see what they have to say about: Nvidia GTC, the end of quarterly earnings reports, private credit panic, college grad unemployment heating up, Uber stock and much more! This episode is sponsored by Public and Janus Henderson Investors. Find out more at https://public.com/WAYT Learn more at https://www.janushenderson.com/ Sign up for The Compound Newsletter and never miss out! Instagram: https://instagram.com/thecompoundnews Twitter: https://twitter.com/thecompoundnews LinkedIn: https://www.linkedin.com/company/the-compound-media/ TikTok: https://www.tiktok.com/@thecompoundnews Public Disclosure: Paid endorsement. Brokerage services provided by Open to the Public Investing Inc, member FINRA & SIPC. Investing involves risk. Not investment advice. Generated Assets is an interactive analysis tool by Public Advisors. Output is for informational purposes only and is not an investment recommendation or advice. See disclosures at public.com/disclosures/ga. Past performance does not guarantee future results, and investment values may rise or fall. See terms of match program at https://public.com/disclosures/matchprogram. Matched funds must remain in your account for at least 5 years. Match rate and other terms are subject to change at any time. Investing involves the risk of loss. This podcast is for informational purposes only and should not be regarded as personalized investment advice or relied upon for investment decisions. Michael Batnick and Josh Brown are employees of Ritholtz Wealth Management and may maintain positions in the securities discussed in this video. All opinions expressed by them are solely their own opinion and do not reflect the opinion of Ritholtz Wealth Management. The Compound Media, Incorporated, an affiliate of Ritholtz Wealth Management, receives payment from various entities for advertisements in affiliated podcasts, blogs and emails.
Inclusion of such advertisements does not constitute or imply endorsement, sponsorship or recommendation thereof, or any affiliation therewith, by the Content Creator or by Ritholtz Wealth Management or any of its employees. For additional advertisement disclaimers see here https://ritholtzwealth.com/advertising-disclaimers. Investments in securities involve the risk of loss. Any mention of a particular security and related performance data is not a recommendation to buy or sell that security. The information provided on this website (including any information that may be accessed through this website) is not directed at any investor or category of investors and is provided solely as general information. Obviously nothing on this channel should be considered as personalized financial advice or a solicitation to buy or sell any securities. See our disclosures here: https://ritholtzwealth.com/podcast-youtube-disclosures/ Learn more about your ad choices. Visit megaphone.fm/adchoices
As the Fed prepares for a rate decision tomorrow, few expect a move. Instead, focus might stay on crude after its retreat gave stocks a lift Monday. Nvidia's event also continues. Important Disclosures This material is intended for general informational purposes only. This should not be considered an individualized recommendation or personalized investment advice. The investment strategies mentioned may not be suitable for everyone. Each investor needs to review an investment strategy for his or her own particular situation before making any investment decisions. The Schwab Center for Financial Research is a division of Charles Schwab & Co., Inc. All names and market data shown above are for illustrative purposes only and are not a recommendation, offer to sell, or a solicitation of an offer to buy any security. Supporting documentation for any claims or statistical information is available upon request. Past performance is no guarantee of future results. Diversification and rebalancing strategies do not ensure a profit and do not protect against losses in declining markets. Indexes are unmanaged, do not incur management fees, costs, and expenses and cannot be invested in directly. For more information on indexes, please see schwab.com/indexdefinitions. The policy analysis provided by the Charles Schwab & Co., Inc., does not constitute and should not be interpreted as an endorsement of any political party. Fixed income securities are subject to increased loss of principal during periods of rising interest rates. Fixed income investments are subject to various other risks including changes in credit quality, market valuations, liquidity, prepayments, early redemption, corporate events, tax ramifications, and other factors. All expressions of opinion are subject to change without notice in reaction to shifting market, economic or political conditions. Data contained herein from third party providers is obtained from what are considered reliable sources. 
However, its accuracy, completeness or reliability cannot be guaranteed. Investing involves risk, including loss of principal, and for some products and strategies, loss of more than your initial investment. Digital currencies [such as bitcoin] are highly volatile and not backed by any central bank or government. Digital currencies lack many of the regulations and consumer protections that legal-tender currencies and regulated securities have. Due to the high level of risk, investors should view digital currencies as a purely speculative instrument. Cryptocurrency-related products carry a substantial level of risk and are not suitable for all investors. Investments in cryptocurrencies are relatively new, highly speculative, and may be subject to extreme price volatility, illiquidity, and increased risk of loss, including your entire investment in the fund. Spot markets on which cryptocurrencies trade are relatively new and largely unregulated, and therefore, may be more exposed to fraud and security breaches than established, regulated exchanges for other financial assets or instruments. Some cryptocurrency-related products use futures contracts to attempt to duplicate the performance of an investment in cryptocurrency, which may result in unpredictable pricing, higher transaction costs, and performance that fails to track the price of the reference cryptocurrency as intended. Please read more about risks of trading cryptocurrency futures here. The Schwab Center for Financial Research is a division of Charles Schwab & Co., Inc. Apple Podcasts and the Apple logo are trademarks of Apple Inc., registered in the U.S. and other countries. Google Podcasts and the Google Podcasts logo are trademarks of Google LLC. Spotify and the Spotify logo are registered trademarks of Spotify AB. (0131-0326) Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
This week: OpenAI's Pentagon deal sparked the #QuitGPT movement with 2.5 million supporters, Anthropic got labeled a supply-chain risk by the DOD, AI-driven layoffs hit Oracle and Block hard, NVIDIA teased its biggest GTC yet, and Apple revealed a $599 AI laptop.

Key Topics Covered
OpenAI's classified Pentagon deal sparks #QuitGPT revolt with 2.5M supporters and 295% surge in ChatGPT uninstalls
Pentagon labels Anthropic a supply-chain risk; OpenAI and Google employees rally behind Anthropic in court
Oracle eyes 30,000 layoffs and Block cuts 40% of workforce as AI replaces jobs at scale
NVIDIA GTC 2026 preview: $26B open-source investment, new inference chip, and enterprise AI platform expected
Apple announces rebuilt Siri with Google Gemini and the $599 MacBook Neo AI laptop

Episode Timestamps
00:00 - OpenAI's Pentagon Deal and the #QuitGPT Revolt
01:00 - Pentagon vs. Anthropic: The Supply-Chain Risk Showdown
02:00 - AI Layoffs Hit Oracle, Block, and Atlassian
03:00 - NVIDIA GTC 2026: The Super Bowl of AI
04:00 - Apple's Mass-Market AI Play

About The AI Why
The AI Why with Liam Lawson covers enterprise AI: how it's being implemented at scale, and why the people building it do what they do. New episodes every Tuesday (weekly news in 5 minutes) and Thursday (hour-long interviews with founders and C-suite execs).

Our Links
Free Newsletter: https://newsletter.theaireport.ai/subscribe
Website: https://www.theaireport.ai
Liam's LinkedIn: https://www.linkedin.com/in/not-the-f1-driver-liam-lawson/
Book Enterprise Training: https://www.upscaile.com/
Futurum Group's Nick Patience and Hydra Host's Aaron Ginn talk with TITV Host Akash Pasricha about Nvidia's $1 trillion revenue projection and the new Groq-based chip system. We also talk with Reporter Sri Muppidi about OpenAI's new AWS deal for government contracts and Editor Ken Brown about Mastercard's $1.8 billion acquisition of BVNK. Lastly, we get into Asana's AI agent strategy and the "SaaS apocalypse" with CEO Dan Rogers.

Articles discussed on this episode:
https://www.theinformation.com/articles/openai-clinches-aws-deal-bid-win-government-contracts
https://www.theinformation.com/newsletters/ai-agenda/nvidia-needed-groq
https://www.theinformation.com/briefings/mastercard-buy-stablecoin-startup-bvnk-1-8-billion

Subscribe:
YouTube: https://www.youtube.com/@theinformation
The Information: https://www.theinformation.com/subscribe_h
Sign up for the AI Agenda newsletter: https://www.theinformation.com/features/ai-agenda

TITV airs weekdays on YouTube, X and LinkedIn at 10AM PT / 1PM ET. Or check us out wherever you get your podcasts.

Follow us:
X: https://x.com/theinformation
IG: https://www.instagram.com/theinformation/
TikTok: https://www.tiktok.com/@titv.theinformation
LinkedIn: https://www.linkedin.com/company/theinformation/
Through the closure of the Strait of Hormuz, the Iran war threatens global supply chains: oil, helium for chip production, fertilizer, and memory chips from South Korea are all affected. At GTC 2026, Nvidia presents the Vera Rubin generation and is targeting $1 trillion in revenue by the end of 2027. Nvidia is also investing $26 billion in open-weight AI models. OpenAI is cutting side projects and focusing on B2B to counter Anthropic's catch-up run ($19 billion ARR). OpenAI's adult-mode plans are alarming its own advisory board. Both OpenAI and Anthropic are courting private-equity firms for AI consulting ventures. Tesla is planning a Terafab for its own chip production. Meta is laying off up to 16,000 employees to fund its AI investments. Musk's xAI is being restructured yet again ("not set up properly"). Digg is laying off all its employees and shutting down the app. 59% of hiring managers admit to using AI as a pretext for layoffs. A Polymarket journalist is receiving death threats over Iran bets. Big Tech launches an anti-fraud alliance. Teenagers are suing xAI because Grok created nude images of minors. Support our podcast and check out our advertising partners' offers at doppelgaenger.io/werbung. Thank you very much!
Philipp Glöckler and Philipp Klöckner talk today about:
(00:00:00) Iran war: Strait of Hormuz threatens supply chains
(00:20:35) Nvidia GTC 2026: the $1 trillion target and Vera Rubin
(00:33:45) OpenAI focuses on B2B, cuts side projects
(00:43:59) OpenAI adult mode: advisors warn against erotica features
(00:49:10) OpenAI and Anthropic court private equity
(01:00:05) Tesla plans a Terafab for its own chip production
(01:04:13) Meta lays off up to 16,000 employees
(01:07:51) xAI restructuring and Digg shuts down its app
(01:10:05) AI-washing
(01:12:00) Polymarket: death threats and Oscar insider trading
(01:15:04) Big Tech launches an anti-fraud alliance
(01:17:16) Teenagers sue xAI over Grok nude images

Show notes
Nvidia GTC 2026: $1 trillion in orders for Blackwell and Vera Rubin - cnbc.com
Nvidia invests $26 billion in open-weight AI models - wired.com
OpenAI cuts side projects to focus on its core business - wsj.com
OpenAI's adult mode worries its own advisors - wsj.com
OpenAI seeks PE partners for an enterprise AI venture - reuters.com
Tesla stock rises on Musk's Terafab plans - barrons.com
Meta plans mass layoffs over AI costs - reuters.com
Musk's xAI starts over again - techcrunch.com
Digg lays off staff and shuts down its app - techcrunch.com
AI-washing: job cuts sold as AI strategy - bloomberg.com
Polymarket journalist receives death threats - spiegel.de
Big Tech launches alliance against online fraud - axios.com
Former Zalando CEO Ritter becomes interim CEO at Kinnevik - manager-magazin.de
Teenagers sue Musk: Grok created sexual images - washingtonpost.com
Meta invests up to $27 billion in Nebius AI infrastructure - bloomberg.com
Carl Quintanilla, Jim Cramer and David Faber discussed stocks up sharply and WTI crude pulling back from $100/barrel on hopes that efforts to reopen the Strait of Hormuz will bear fruit despite the ongoing Iran war. The anchors reacted to what Treasury Secretary Bessent told CNBC about those efforts. Meta in the spotlight: Facebook's parent reportedly plans to lay off 20% of its workforce. Separately, Nebius surged on its $27 billion AI pact with Meta. Also in focus: Cramer at Nvidia GTC ahead of CEO Jensen Huang's Monday keynote, memory chips extending their rally, a private credit roundup, FCC Chairman Carr's license threat, and Conan O'Brien calling out Netflix Co-CEO Ted Sarandos and zinging Amazon at the Oscars.
Artificial Intelligence isn't coming… it's already here, and the pace of change is accelerating faster than most investors realize. In today's episode, we explore how AI is reshaping global markets, from technology and semiconductors to productivity, capital flows, and the next generation of innovation-driven companies. The implications go far beyond tech stocks: AI is beginning to influence everything from corporate earnings to economic growth expectations. We'll also dive into highlights from NVIDIA GTC, where NVIDIA showcased the next wave of breakthroughs in artificial intelligence hardware and software. The conference made one thing clear: the AI arms race is accelerating, and the companies leading it could shape the future of the global economy. If you want to understand how this technological wave could influence markets, investment opportunities, and the pace of innovation, this episode connects the dots. Listen now.
@ProsperTradingAcademy's Charles Moon offers an example options trade for Nvidia (NVDA) as its GTC conference kicks off later today, also pointing out areas to watch on the Mag 7 giant's stock chart. Sam Vadas runs through what investors are watching for, including robotics, inference, and other A.I.-related announcements.

======== Schwab Network ========
Empowering every investor and trader, every market day.
Subscribe to the Market Minute newsletter - https://schwabnetwork.com/subscribe
Download the iOS app - https://apps.apple.com/us/app/schwab-network/id1460719185
Download the Amazon Fire Tv App - https://www.amazon.com/TD-Ameritrade-Network/dp/B07KRD76C7
Watch on Sling - https://watch.sling.com/1/asset/191928615bd8d47686f94682aefaa007/watch
Watch on Vizio - https://www.vizio.com/en/watchfreeplus-explore
Watch on DistroTV - https://www.distro.tv/live/schwab-network/
Follow us on X – / schwabnetwork
Follow us on Facebook – / schwabnetwork
Follow us on LinkedIn - / schwab-network
About Schwab Network - https://schwabnetwork.com/about
The Information's Wayne Ma talks with TITV Host Akash Pasricha about Nvidia's GTC keynote and the company's new inference chip technology. We also talk with Khosla Ventures' Ethan Choi about the U.S.-China AI race and the rise of AI agents in super apps, AI Reporter Laura Bratton about why SaaS companies are quietly flagging AI as a major business risk in regulatory filings, and we get into industrial-scale data center hacks with Columnist Ann Davis Vaughan.

Articles discussed on this episode:
https://www.theinformation.com/newsletters/the-briefing/expect-gtc-nvidias-groq-chip
https://www.theinformation.com/articles/figma-hubspot-ceos-say-fazed-risks-ai-agents-disclosures-say-otherwise
https://www.theinformation.com/newsletters/ai-infrastructure/5-ingenious-hacks-boosting-ai-data-centers
Welcome to this audio special on Nvidia GTC 2026, the company's most anticipated event of the year. As the tech world gathers in San Jose, we tune in to CEO Jensen Huang's keynote to hear about the next frontier of AI. This year, the conversation shifts from model training to AI inferencing efficiency. We discuss Nvidia's strategic integration of Groq's "Language Processing Unit" (LPU) technology, which claims to run large language models up to 10 times more efficiently than traditional GPUs. Our episode also explores rumors of a dedicated inference chip and Nvidia's long-awaited laptop CPU, signaling a bold expansion beyond its GPU roots. With Nvidia's data center revenue reaching $193.5 billion, we analyze how these new platforms aim to solidify the company's dominance. Listen in as we break down the product launches and dealmaking strategies defining the future of computing.
Powered by Firstory Hosting
Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover:
1) Backlash against AI & specifically Sam Altman's comments about AI as a utility
2) Is this because people are worried about AI taking their jobs?
3) NBC poll shows AI is one of the least popular things in the U.S.
4) YouGov poll shows broadly negative feelings toward AI
5) Pew finds datacenters are very unpopular
6) Consequences of AI's unpopularity
7) Nvidia GTC preview: A rallying cry for AI
8) Could Jensen Huang be the guy that turns this around?
9) Amazon's AI code is messing things up
10) McKinsey's AI tool hacked
11) Meta can't get its act together with Avocado delayed
12) Should Meta's AI use Google's Gemini tech
---
Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice.
Want a discount for Big Technology on Substack + Discord? Here's 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b
Learn more about your ad choices. Visit megaphone.fm/adchoices
In this episode, Chris Cochrane dives into Apple’s $599 MacBook Neo – the cheapest Mac laptop ever made – and whether it spells trouble for Chromebook makers. He also covers Samsung’s CEO blaming AI for rising phone prices, Framework raising RAM prices for the third time in three months, Meta unveiling four custom AI chips, NVIDIA’s GTC 2026 conference preview, a billion-dollar bet against large language models, Microsoft’s game-changing Project Helix Xbox with native Steam support, Windows 11’s new Xbox Mode, and SpaceX gearing up for a critical Starship Flight 12 test.
– Want to start a podcast? It's easy to get started! Sign up at Blubrry
– Thinking of buying a Starlink? Use my link to support the show.
Subscribe to the Newsletter.
Email Chris if you want to get in touch!
Like and Follow Geek News Central’s Facebook Page.
Support my Show Sponsor: Best Godaddy Promo Codes
Get 1Password

Apple MacBook Neo
The lead story covers Apple’s MacBook Neo. It launched at $599 and is the cheapest Mac laptop ever made. The device runs on the A18 Pro chip from the iPhone 16 Pro. Cochrane sees a solid market among students, casual users, and anyone who needs a reliable home laptop. However, he advises photographers and videographers to invest in a MacBook Air or Pro instead. The real question is whether this kills Chromebook sales in education.

Samsung CEO Blames AI for Price Hikes
Cochrane tackles Samsung’s Galaxy S26 price increases. CEO TM Roh blamed AI infrastructure demand for the hikes. Meanwhile, DDR4 DRAM prices surged sevenfold in a single year. Cochrane points out the irony: Samsung manufactures memory chips, shifted production toward AI data centers, and now cites that same shortage to justify higher consumer prices. He calls the situation “a little shady” but appreciates the transparency.

Framework RAM Prices Up Again
The RAM crisis extends beyond phones. Framework raised RAM prices for the third consecutive time in three months.
Cochrane reinforces advice from a recent episode. He urges listeners to buy now before prices climb further. Analysts project peak prices by mid-2026. The shortage could last through late 2027.

Sponsor: GoDaddy
Economy hosting $6.99/month, WordPress hosting $12.99/month, domains $11.99. Website builder trial available. Use codes at geeknewscentral.com/godaddy to support the show.

Meta Unveils Four Custom AI Chips
Cochrane reports on Meta’s four new MTIA chip generations. The company aims to reduce its dependence on NVIDIA by building custom silicon. The MTIA 300 is already in production. New generations will ship every six months through 2027. The chips are built on open-source RISC-V architecture and manufactured by TSMC.

NVIDIA GTC 2026 Preview
NVIDIA’s GTC conference starts Monday in San Jose. Jensen Huang promises “chips the world has never seen.” Rumored architectures include Rubin Ultra and Feynman. The keynote streams free at nvidia.com on Monday at 11am Pacific. Cochrane notes that while companies like Meta are building chips to escape NVIDIA, competition will eventually catch up.

Yann LeCun’s AMI Labs Raises $1.03 Billion
Former Meta AI chief Yann LeCun raised $1.03 billion for AMI Labs at a $3.5 billion valuation. It marks the largest European seed round in history, for a company just four months old. LeCun is building “world models” that learn from physical reality rather than text. Backers include Jeff Bezos, NVIDIA, and Samsung. Cochrane notes both approaches to AI can coexist.

Microsoft Project Helix
Microsoft revealed Project Helix at GDC 2026. For the first time, an Xbox will natively support Steam and GOG. Cochrane sees it as both desperate and inevitable. The only reason to buy from the Xbox store would be exclusives. He notes this is a breath of fresh air after months of talk that the Xbox era was ending. Dev kits ship in 2027, with a consumer launch likely in late 2027 or 2028.
Windows 11 Xbox Mode
Microsoft is rolling out Xbox Mode to all Windows 11 PCs in April. The full-screen, controller-optimized interface works with Steam, Epic, and Battle.net. Cochrane sees it as the first half of Microsoft’s two-phase gaming strategy: Xbox Mode trains users now, and Project Helix delivers dedicated hardware later. He asks whether Sony and Nintendo will follow in Xbox’s footsteps.

SpaceX Starship Flight 12
SpaceX announced stacking complete for the next Super Heavy booster at Starbase. Flight 12 targets April and debuts V3 hardware with Raptor 3 engines. Orbital refueling remains the critical unknown for NASA’s Artemis III moon landing. SpaceX has a track record of delivering eventually, just never on Elon’s original timeline.

The post Is the MacBook Neo a Chromebook Killer? #1860 appeared first on Geek News Central.
Join Kyle, Nader, Vibhu, and swyx live at NVIDIA GTC next week!

Now that AIE Europe tix are ~sold out, our attention turns to Miami and World's Fair!

The definitive AI Accelerator chip company has more than 10xed this AI Summer, and is now a $4.4 trillion megacorp… that is somehow still moving like a startup. We are blessed to have a unique relationship with our first ever NVIDIA guests: Kyle Kranen, who gave a great inference keynote at the first World's Fair and is one of the leading architects of NVIDIA Dynamo (a datacenter-scale inference framework supporting SGLang, TRT-LLM, and vLLM), and Nader Khalil, a friend of swyx from our days in Celo in The Arena, who has been drawing developers to GTC since before they were even a glimmer in the eye of NVIDIA.

Nader discusses how NVIDIA Brev has drastically lowered the barrier to entry for developers to get a top-of-the-line GPU up and running, and Kyle explains NVIDIA Dynamo as a data-center-scale inference engine that optimizes serving by scaling out, leveraging techniques like prefill/decode disaggregation, scheduling, and Kubernetes-based orchestration, framed around cost, latency, and quality tradeoffs.
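To make "prefill/decode disaggregation" concrete, here is a minimal toy simulation in Python. It is a sketch under stated assumptions, not Dynamo's code or API: every name (`Request`, `Worker`, `DisaggregatedRouter`) is hypothetical. It assumes the usual split where prefill processes the whole prompt in one compute-heavy pass and builds the KV cache, after which the request moves to a separate decode pool that emits one token per step.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical inference request: a prompt in, max_new_tokens out."""
    req_id: int
    prompt_tokens: int
    max_new_tokens: int
    generated: int = 0
    kv_cache_tokens: int = 0  # size of the simulated KV cache, in tokens


@dataclass
class Worker:
    """Simulated GPU worker dedicated to one phase ("prefill" or "decode")."""
    name: str
    phase: str
    queue: deque = field(default_factory=deque)


class DisaggregatedRouter:
    """Toy router that keeps prefill and decode on separate worker pools."""

    def __init__(self, n_prefill: int, n_decode: int):
        self.prefill_pool = [Worker(f"prefill-{i}", "prefill") for i in range(n_prefill)]
        self.decode_pool = [Worker(f"decode-{i}", "decode") for i in range(n_decode)]
        self.finished: list[Request] = []

    @staticmethod
    def _least_loaded(pool: list[Worker]) -> Worker:
        # Simple scheduling policy: pick the worker with the shortest queue.
        return min(pool, key=lambda w: len(w.queue))

    def submit(self, req: Request) -> None:
        self._least_loaded(self.prefill_pool).queue.append(req)

    def step(self) -> None:
        # Prefill: each prefill worker handles one whole prompt, "builds" its
        # KV cache, then hands the request to the least-loaded decode worker.
        for w in self.prefill_pool:
            if w.queue:
                req = w.queue.popleft()
                req.kv_cache_tokens = req.prompt_tokens
                self._least_loaded(self.decode_pool).queue.append(req)
        # Decode: every request on every decode worker emits one token.
        for w in self.decode_pool:
            still_running = deque()
            for req in w.queue:
                req.generated += 1
                req.kv_cache_tokens += 1
                if req.generated >= req.max_new_tokens:
                    self.finished.append(req)
                else:
                    still_running.append(req)
            w.queue = still_running

    def run(self, max_steps: int = 1000) -> list[Request]:
        for _ in range(max_steps):
            if not any(w.queue for w in self.prefill_pool + self.decode_pool):
                break
            self.step()
        return self.finished


if __name__ == "__main__":
    router = DisaggregatedRouter(n_prefill=1, n_decode=2)
    for i, (prompt, new) in enumerate([(512, 3), (2048, 1), (128, 5)]):
        router.submit(Request(i, prompt, new))
    for req in router.run():
        print(req.req_id, req.generated, req.kv_cache_tokens)
```

The point of the split is that the two pools can be sized independently (for example, more decode workers when outputs are long), which is where cost and latency tradeoffs come in. A real disaggregated system also has to move the KV cache between GPUs; this sketch reduces that to a counter.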
We also dive into Jensen's “SOL” (Speed of Light) first-principles urgency concept, long-context limits and model/hardware co-design, internal model APIs (https://build.nvidia.com), and upcoming Dynamo and agent sessions at GTC.

Full Video pod on YouTube

Timestamps
00:00 Agent Security Basics
00:39 Podcast Welcome and Guests
07:19 Acquisition and DevEx Shift
13:48 SOL Culture and Dynamo Setup
27:38 Why Scale Out Wins
29:02 Scale Up Limits Explained
30:24 From Laptop to Multi Node
33:07 Cost Quality Latency Tradeoffs
38:42 Disaggregation Prefill vs Decode
41:05 Kubernetes Scaling with Grove
43:20 Context Length and Co Design
57:34 Security Meets Agents
58:01 Agent Permissions Model
59:10 Build Nvidia Inference Gateway
01:01:52 Hackathons And Autonomy Dreams
01:10:26 Local GPUs And Scaling Inference
01:15:31 Long Running Agents And SF Reflections

Transcript

Agent Security Basics
Nader: Agents can do three things. They can access your files, they can access the internet, and now they can write custom code and execute it. You should only let an agent do two of those three things. If it can access your files and it can write custom code, you don't want internet access, because that's a full vulnerability, right? If it has access to the internet and your file system, you should know the full scope of what that agent's capable of doing. Otherwise, it can get injected or something can happen. And so that's a lot of what we've been thinking about: like, you know, how do we both enable this, because it's clearly the future, but then also, you know, what are these enforcement points that we can start to, like, protect?
swyx: All right.

Podcast Welcome and Guests
swyx: Welcome to the Latent Space podcast in the Chroma studio. Welcome to all the guests here. Uh, we are back with our guest host Vibhu. Welcome. Good to have you back. And our friends, uh, Nader and Kyle from Nvidia. Welcome.
Kyle: Yeah, thanks for having us.
swyx: Yeah, thank you.
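Nader's rule of thumb (grant an agent at most two of three capabilities: file access, internet access, and writing and executing its own code) can be expressed as a small policy check. This is a hypothetical sketch: the `Capability` enum and `check_agent_policy` function are invented for illustration and are not part of any NVIDIA or Brev product.

```python
from __future__ import annotations

from enum import Enum, auto


class Capability(Enum):
    """The three agent capabilities named in the conversation."""
    FILE_ACCESS = auto()  # read/write the user's files
    INTERNET = auto()     # make outbound network requests
    CODE_EXEC = auto()    # write custom code and execute it


# Hypothetical policy constant: "only let an agent do two of those three things".
MAX_CAPABILITIES = 2


def check_agent_policy(granted: set[Capability]) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed set of agent capabilities."""
    if len(granted) <= MAX_CAPABILITIES:
        return True, "ok"
    names = ", ".join(sorted(c.name for c in granted))
    return False, f"denied: {names} together give the agent full scope; drop one"


if __name__ == "__main__":
    print(check_agent_policy({Capability.FILE_ACCESS, Capability.CODE_EXEC}))
    print(check_agent_policy(set(Capability)))
```

A real enforcement point would sit in the agent runtime or sandbox (the conversation later mentions running such agents in an isolated cloud VM); this check only captures the "two of three" rule itself.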
swyx: Actually, I don't even know your titles. Uh, I know you're like architect something of Dynamo.
Kyle: Yeah. I, I'm one of the engineering leaders [00:01:00] and architects of Dynamo.
swyx: And you're director of something and developers, developer tech.
Nader: Yeah.
swyx: You're the developers, developers, developers guy at Nvidia,
Nader: open source agent marketing, Brev,
swyx: and like
Nader: Devrel tools and stuff.
swyx: Yeah. Been
Nader: the focus.
swyx: And we're, we're kind of recording this ahead of Nvidia GTC, which is coming to town, uh, again, uh, or taking over town, uh, which, uh, which we'll all be at. Um, and we'll talk a little bit about your sessions and stuff. Yeah.
Nader: We're super excited for it.

GTC Booth Stunt Stories
swyx: One of my favorite memories for Nader, like you always do like marketing stunts and like while you were at Brev, you like had this surfboard that you like, went down to GTC with and like, Nvidia apparently, like did so much that they bought you. Like what, what was that like? What was that?
Nader: Yeah. Yeah, we, we, um. Our logo was a shaka. We, we, uh, we were always just kind of like trying to keep true to who we were. I think, you know, some stuff, startups, you're like trying to pretend that you're a bigger, more mature company than you are. And it was actually Evan Conrad from SF Compute who was just like, you guys are like
swyx: Previous guest. Yeah.
Nader: Amazing. Oh, really? Amazing. Yeah. He was just like, guys, you're two dudes in the room. Why are you [00:02:00] pretending that you're not? Uh, and so then we were like, okay, let's make the logo a shaka. We brought surfboards to our booth at GTC and the energy was great. Yeah. Some palm trees too. They,
Kyle: they actually poked out over like the, the walls so you could, you could see the Brev booth. Oh, that's so funny. And
Nader: no one else,
Kyle: just from very far away.
Nader: Oh, so you remember it back
Kyle: then? Yeah, I remember it pre-acquisition.
I was like, oh, those guys look cool,
Nader: dude. That makes sense. ’Cause uh, we, so we signed up really last minute, and so we had the last booth. It was all the way in the corner. And so I was, I was worried that no one was gonna come. So that's why we had like the palm trees. We really came in with the surfboards. We even had one of our investors bring her dog and then she was just like walking the dog around to try to like, bring energy towards our booth. Yeah.
swyx: Steph.
Kyle: Yeah. Yeah, she's the best,
swyx: you know, as a conference organizer, I love that. Right? Like, it's like everyone who sponsors a conference comes, does their booth. They're like, we are changing the future of AI or something, some generic b******t and like, no, like actually try to stand out, make it fun, right? And people still remember it after three years.
Nader: Yeah. Yeah. You know what's so funny? I'll, I'll send, I'll give you this clip if you wanna, if you wanna add it [00:03:00] in, but, uh, my wife, at the time my fiancée, was in medical school and she came to help us, ’cause it was like a big moment for us. And so we, we bought this Cricut, it's like a vinyl, like a vinyl, uh, printer, ’cause like, how else are we gonna label the surfboard? So, we got a surfboard, luckily was able to purchase that on the company card. We got a Cricut and it was just like "fine tuning for enterprises" or something like that, that we put on the. On the surfboard and it's 1:00 AM the day before we go to GTC. She's helping me put these like vinyl stickers on. And she goes, you son of, she's like, if you pull this off, you son of a b***h. And so, uh, right. Pretty much after the acquisition, I stitched that with the mag music acquisition. I sent it to our family group chat.
swyx: Oh yeah. No, well, she, she made a good choice there. Was that like basically the origin story for Launchable, is that we, it was, and maybe we should explain what Brev is and
Nader: Yeah. Yeah.
Uh, I mean, Brev is just, it's a developer tool that makes it really easy to get a GPU. So we connect a bunch of different GPU sources. So the basics of it is like, how quickly can we SSH you into a G, into a GPU, and whenever we would talk to users, they wanted a GPU. They wanted an A100. And if you go to like any cloud [00:04:00] provisioning page, usually it's like three pages of forms, or in the forms somewhere there's a dropdown. And in the dropdown there's some weird code that you know to translate to an A100. And I remember just thinking like, every time someone says they want an A100, like the piece of text that they're telling me that they want is like, stuffed away in the corner. Yeah. And so we were like, what if the biggest piece of text was what the user's asking for? And so when you go to Brev, it's just big GPU chips with the type that you want, with
swyx: beautiful animations that you worked on pre, like pre, you can, like, now you can just prompt it. But back in the day. Yeah. Yeah. Those were handcraft, handcrafted artisanal code.
Nader: Yeah. I was actually really proud of that because, uh, it was an, I, I made it in Figma. Yeah. And then I found, I was like really struggling to figure out how to turn it from like Figma to React. So what it actually is, is just an SVG and I, I have all the styles, and so when you change the chip, whether it's like active or not, it changes the SVG code and that somehow like renders like, looks like it's animating, but it, we just had the transition slow, but it's just like a JavaScript function to change the like underlying SVG. Yeah. And that was how I ended up like figuring out how to move it from, from Figma. But yeah, that's artisan. [00:05:00]
Kyle: Speaking of marketing stunts though, he actually used those SVGs. Or kind of used those SVGs to make these cards.
Nader: Oh yeah. Like
Kyle: a GPU gift card. Yes. That he handed out everywhere.
That was actually my first impression of that
Nader: one. Yeah,
swyx: yeah, yeah.
Nader: Yeah.
swyx: I think I still have one of them.
Nader: They look great.
Kyle: Yeah.
Nader: I have a ton of them still actually in our garage, which just, they don't have labels. We should honestly like bring, bring them back. But, um, I found this old printing press here, actually just around the corner on Van Ness. And it's a third-generation San Francisco shop. And so I come in, an excited startup founder trying to like, and they just have this crazy old machinery and I'm in awe, ’cause the, the whole building is so physical. Like you're seeing these machines, they have like pedals to like move these saws and whatever. I don't know what this machinery is, but I saw all three generations. Like there's like the grandpa, the father and the son, and the son was like, around my age. Well,
swyx: it's like a holy, holy trinity.
Nader: It's funny because we, so I just took the same SVG and we just like printed it, and it's foil printing, so they make a, a, a mold that's like an inverse of like the A100, and then they put the foil on it [00:06:00] and then they press it into the paper. And I remember once we got them, he was like, Hey, don't forget about us. You know, I guess like early Apple and Cisco's first business cards were all made there. And so he was like, yeah, we, we get like the startup businesses, but then as they mature, they kind of go somewhere else. And so I actually, I think we were talking with marketing about like using them for some, we should go back and make some cards.
swyx: Yeah, yeah, yeah. You know, I remember, you know, as a very, very small Brev investor, I was like, why are we spending time like, doing these like stunts for GPUs?
Like, you know, I think like as a, you know, typical like cloud hardware person, you go into AWS, you pick like T five X xl, whatever, and it's just like from a list and you look at the specs, like, why animate this GPU? And, and I, I do think like it just shows the level of care that goes throughout Brev and. Yeah. And now, and also the, and,
Nader: and Nvidia. I think that's what the, the thing that struck me most when we first came in was like the amount of passion that everyone has. Like, I think, um, you know, you talk to, you talk to Kyle, you talk to, like, every VP that I've met at Nvidia goes so close to the metal. Like, I remember it was almost a year ago, and like my VP asked me, he's like, Hey, [00:07:00] what's Cursor? And like, are you using it? And if so, why? Surprised at this, and he downloaded Cursor and he was asking me to help him like, use it. And I thought that was, uh, or like, just show him what he, you know, why we were using it. And so, the amount of care that I think everyone has and the passion, appreciate, passion and appreciation for the moment. Right. This is a very unique time. So it's really cool to see everyone really like, uh, appreciate that.
swyx: Yeah.

Acquisition and DevEx Shift
swyx: One thing I wanted to do before we move over to sort of like research topics and, uh, the, the stuff that Kyle's working on is just tell the story of the acquisition, right? Like, not many people have been, been through an acquisition with Nvidia. What's it like? Uh, what, yeah, just anything you'd like to say.
Nader: It's a crazy experience. I think, uh, you know, we were, the thing that was the most exciting for us was. Our goal was just to make it easier for developers. We wanted to find access to GPUs, make it easier to do that. And then all, oh, actually your question about Launchable. So Launchable was just make one-click exper, like one-click deploys for any software on top of the GPU. Mm-hmm.
And so what we really liked about Nvidia was that it felt like we just got a lot more resources to do all of that. I think, uh, you [00:08:00] know, NVIDIA's goal is to make things as easy for developers as possible. So there was a really nice like synergy there. I think that, you know, when it comes to like an acquisition, I think the amount that the soul of the products align, I think is gonna be, is going to speak to the success of the acquisition. Yeah. And so it in many ways feels like we're home. This is a really great outcome for us. Like we, you know, I love brev.nvidia.com. Like you should, you should use it, it's, it's the
Kyle: front page for GPUs.
Nader: Yeah. Yeah. If you want GPUs,
Kyle: you go there, get
swyx: it there, and it's like internally is growing very quickly. I, I don't remember. You said some stats there.
Nader: Yeah, yeah, yeah. It's, uh, I, I wish I had the exact numbers, but like internally, externally, it's been growing really quickly. We've been working with a bunch of partners, with a bunch of different customers and ISVs. If you have a solution that runs on the GPU and you want people to use it quickly, we can bundle it up, uh, in a Launchable and make it a one-click run. If you're doing things and you want just like a sandbox or something to run on, right. Like OpenClaw. Huge moment. Super exciting. Our, uh, and we'll talk into it more, but. You know, internally, people wanna run this, and you, we know we have to be really careful about the security implications. Do we let this run on the corporate network? Security's guidance was, Hey, [00:09:00] run this on Brev, it's in, you know, it's, it's, it's a VM, it's sitting in the cloud, it's off the corporate network. It's isolated.
And so that's been our stance internally and externally about how to even run something like OpenClaw while we figure out how to run these things securely. But yeah,
swyx: I think there's also like, you almost like, we're the right team at the right time when Nvidia is starting to invest a lot more in developer experience or whatever you call it. Yeah. Uh, UX or I don't know what you call it, like software. Like obviously NVIDIA has always invested in software, but like, there's like, this is like a different audience. Yeah. It's a
Nader: wider
Kyle: developer base.
swyx: Yeah. Right.
Nader: Yeah. Yeah. You know, it's funny, it's like, it's not, uh,
swyx: so like, what, what is it called internally? What, what is this that people should be aware that is going on there?
Nader: Uh, what, like developer experience
swyx: or, yeah, yeah. Is it called just developer experience or is there like a broader strategy here
Nader: in Nvidia? Um, Nvidia always wants to make a good developer experience. The thing is, a lot of the technology is just really complicated. Like, it's not, it's uh, you know, I think, um. The thing that's been really growing, or AI is having a huge moment, not [00:10:00] because like, let's say data scientists in 2018 were quiet then and are much louder now. The pie is com, right? There's a whole bunch of new audiences. My mom's wondering what she's doing. My sister's learned, like taught herself how to code. Like the, um, you know, I, I actually think just generally AI's a big equalizer and you're seeing a more like technologically literate society, I guess. Like everyone's, everyone's learning how to code. Uh, there isn't really an excuse for that. And so building a good UX means that you really understand who your end user is. And when your end user becomes such a wide, uh, variety of people, then you have to almost like reinvent the practice, right? Yeah.
Kyle: You have to actually build more developer UX, right? Because new tiers of developers were added. The hackers building on top of OpenClaw, for example, have never used a GPU. They don't know what CUDA is. They just want to run something.

Nader: Yeah.

Kyle: You need new UX that is not just, "Hey, how do you program something in CUDA and run it?" Then, when deep learning was getting big, we built Torch. But recently the number of [00:11:00] layers added to that developer stack has just exploded, because AI has become ubiquitous. Everyone's using it in different ways.

Nader: Yeah. It's moving fast in every direction. Vertical, horizontal.

Vibhu: You guys even take it down to hardware, like the DGX Spark. It's basically the same system as throwing it up on a big GPU cluster.

Nader: Yeah. It's amazing. Blackwell.

swyx: Yeah. We saw the preview at last year's GTC, and that was one of our better-performing videos in our video coverage so far.

Nader: Awesome.

swyx: This will beat it. Fingers crossed.

Nader: Yeah.

DGX Spark and Remote Access

Nader: Even when Grace Blackwell, or rather DGX Spark, was first coming out, I got to be involved in the developer experience from the beginning. And it just comes back to what you...

swyx: You were involved?

Nader: Yeah. I got an email; we just got thrown into the loop. It was actually really funny, 'cause I was still pretty fresh from the acquisition, and I'm getting an email from a bunch of the engineering VPs about the new hardware, the new GPU chip, or not chip, but the GPU system that we're putting out. And I'm like, okay, cool, Nader's now involved with this for the UX. I'm like:
What am I gonna do [00:12:00] here? I remember that first meeting: I was just kind of quiet, hearing engineering VPs talk about what this box could be, what it could do, how we should use it. And I remember one of the first ideas people had; I think the quote was, "The first thing someone's gonna wanna do with this is get two of them and run a Kubernetes cluster on top of them." And I was like, oh, I think I know why I'm here. The first thing we're doing is easy SSH into the machine. Then it's just scoping it down: the person who wants to run a Kubernetes cluster on two Sparks has a higher propensity for pain than someone who buys it and wants to run OpenClaw right now, right? If you can make that as effortless as possible, then the rest becomes easy. So there's a tool called NVIDIA Sync. It just makes the SSH connection really simple. If you think about it, if you have a Mac or a PC, a laptop, and you buy this GPU and want to use it, you should be able to use it like it's a GPU in the cloud, right? But there's all this friction in how you actually get into it. That's part of [00:13:00] Brev's value proposition: there's a CLI that wraps SSH and makes it simple. So our goal is just to get you into that machine really easily. And one thing we just launched at CES, still in early access (we're ironing out some kinks, but it should be ready by GTC): you can register your Spark on Brev. And so now if you...

swyx: Like remote-managed local hardware. Single pane of glass. Because Brev can already manage other clouds anyway, right?

Vibhu: And you use the Spark on Brev as well, right?

Nader: Yeah. But yeah, exactly.
So you set it up at home, you run the command on it, and then it essentially appears in your Brev account. Then you can take your laptop to a Starbucks or a cafe, and you can continue to use your Spark just like any other cloud node on Brev. It's just like a pre-provisioned center...

swyx: ...in your home.

Nader: Yeah, exactly.

swyx: Yeah.

Vibhu: Tiny little data center.

Nader: Tiny little, the size of...

Vibhu: ...your phone.

SOL Culture and Dynamo Setup

swyx: One more thing before we move on to Kyle. I just have so many Jensen stories, and I love mining Jensen stories. My favorite so far is SOL. What is SOL?

Nader: I think [00:14:00] of all the lessons I've learned, that one's definitely my favorite.

Kyle: It'll always stick with you.

Nader: Yeah. In a startup, everything's existential, right? We ran out of money. We were at risk of missing payroll; we had to contract our team because we ran out of money. And because of that, you're always forcing yourself to understand the root cause of everything. If you get a date, if you get a timeline, you know exactly why that date or timeline is there. You're pushing every boundary, and you're not just accepting a "no" just because. So as you introduce more layers, as you become a much larger organization, SOL, the speed of light, is essentially: what is the physics, right? Light moves at a certain speed, so if the light's moving slower, you know something's in the way. So before trying to layer reality back in, like why this can't be delivered by some date, let's just understand the physics. What is the theoretical limit to how fast this can go? And then start to tell me why.
'Cause otherwise people will start telling you why something can't be done. But actually, I think any great leader's goal is just to create urgency. [00:15:00] There's an infinite...

Kyle: Create compelling events, right?

Nader: Yeah.

Kyle: So SOL is a term NVIDIA uses to instigate a compelling event. You say: this is done; how do we get there? What is the minimum, "as much as necessary, as little as possible," that it takes for us to get exactly here? And it helps you just break through a bunch of noise.

swyx: Yeah.

Kyle: Instantly.

swyx: One thing I'm unclear about: can only Jensen play the SOL card? Not everyone gets to say "get the b******t out," because obviously it's Jensen. Can someone else do it?

Kyle: No, no, no. Frontline engineers use it.

Nader: Yeah. I think it's not so much about "get the b******t out." It's "give me the root understanding," right? If you tell me something takes three weeks: well, what are the first principles? Why is it three weeks? What's the actual limit that makes this take three weeks? Let's say you wanted to buy a new computer and someone told you it's gonna be here in five days. What's the SOL? Well, the SOL is: I could walk into a Best Buy and pick it up for you, right? Then anything beyond that: is it practical? Is that how we're gonna, let's say, give everyone in the [00:16:00] company a laptop? Obviously not. So that's the SOL, and then it's like, okay, if we have to get more than 10, suddenly there might be some constraints, right? And so now we can piece the reality back in.

swyx: So this is the Paul Graham "do things that don't scale." And this is also what people would now call high agency.
Nader: Yeah.

Kyle: It's actually really interesting, because there's a second, hardware angle to SOL that doesn't come up for the whole org. SOL is used culturally at NVIDIA for everything.

swyx: I'm also mining for, like, I think that can be annoying sometimes. Someone keeps going "SOL" on you, and you're like, guys, we have to be stable. We have to f*****g plan.

Kyle: It's an interesting balance.

Nader: Yeah. I encounter that with, actually, just with Alec, right? 'Cause we have a new conference, so we need to launch. We have goals for what we wanna launch by the conference, and at the end of the day...

swyx: Is this GTC?

Nader: Well, we did it for CES, we did it for GTC DC before that, and we're doing it for GTC San Jose. Every time we have a new moment, we want to launch something, and we want to do so at SOL, and that does mean some level of prioritization needs [00:17:00] to happen. So it is difficult, right? You have to be careful with what you're pushing. Stability is important, and it should be factored into SOL. SOL isn't just "build everything and let it break"; that's part of the conversation. So as you're layering in all the details, one of them might be, "Hey, we could build this, but then it's not gonna be stable for X, Y, Z reasons." And that was one of our conversations for CES: hey, we can get registering your Spark with Brev into early access, but there are a lot of things we need to do to feel really comfortable from a security perspective, right? There's a lot of networking involved before we deliver that to users. So it's like, okay, let's get this to a point where we can at least let people experiment with it.
We had it in a booth, we had it in Jensen's keynote, and then let's go iron out all the networking kinks. That's not easy, and it can come later. That was how we layered that back in.

Kyle: Yeah. But it's not really about saying you don't have to do the maintenance or operational work. It [00:18:00] highlights how progress is incremental, right? What is the minimum thing we can get to? There's an SOL for every component after that, but there's the SOL to get you to the starting line, and that's usually how it's asked. On the other side, SOL came out of hardware at NVIDIA, right? So SOL is literally: if we ran the accelerator, the GPU, at basically full speed with no other constraints, how fast could we make a program go?

swyx: Yeah. Right. So in training, you then work back to some percentage of, say, MFU.

Kyle: Yeah, that's a great example. There's an SOL MFU, and then there's what's practically achievable.

swyx: Cool. Should we move on to Kyle's side? Kyle, you're coming more from the data science world. Whenever I meet someone who's done work in tabular stuff, graph neural networks, time series... When I go to NeurIPS, I go to ICML, I walk the back halls. There's always a small group of graph people.

Kyle: Yes.

swyx: A small group of tabular people. [00:19:00] And there's no one else there. You know what I mean? It's important, interesting work if you care about solving the problems they solve.

Kyle: Yeah.

swyx: But everyone else is just LLMs all the time.

Kyle: Yeah.
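Kyle's hardware sense of SOL and swyx's point about MFU can be made concrete with a little arithmetic: the hardware's peak FLOP/s is the speed of light, and MFU is the fraction of it a training run actually achieves. The sketch below is illustrative only, assuming the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token (forward plus backward, ignoring attention); the model size, throughput, and peak FLOP/s are hypothetical placeholders, not figures for any NVIDIA part.

```python
def training_mfu(n_params: float, tokens_per_second: float, peak_flops_per_second: float) -> float:
    """Model FLOPs utilization: achieved model FLOP/s over the hardware's
    peak ("speed of light") FLOP/s.

    Assumes ~6 * n_params FLOPs per trained token (dense transformer,
    forward + backward), a rule of thumb that ignores attention FLOPs.
    """
    achieved_flops_per_second = 6 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical inputs, for illustration only.
n_params = 7e9              # a 7B-parameter dense model
tokens_per_second = 10_000  # measured training throughput on one accelerator
peak = 1e15                 # assumed peak dense FLOP/s of that accelerator

mfu = training_mfu(n_params, tokens_per_second, peak)
print(f"MFU = {mfu:.0%} of SOL")
```

With these placeholder inputs the script prints `MFU = 42% of SOL`; the remaining gap to 100% is exactly what the "layer reality back in" conversation has to account for.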
Kyle: I mean, it's like the black hole, right? Has the event horizon reached this yet at NeurIPS?

swyx: But, you know, those are transformers too, and those are also interesting things. Anyway, I just wanted to spend a little time on that background before we go into Dynamo proper.

Kyle: Yeah, sure. I took a different path to NVIDIA than that. I joined six years ago, seven if you count when I was an intern. So I joined NVIDIA right out of college. The first thing I jumped into was not what I'd done during my internship, which was some stuff for autonomous vehicles, heavyweight object detection. I jumped into recommenders; this was popular.

swyx: Yeah, he did RecSys as well.

Kyle: Yeah, RecSys. That was the tabular data at the time, right? You have tables of audience qualities and item qualities, and you're trying to figure out which member of [00:20:00] the audience matches which item, or more practically, which item matches which member of the audience. At the time, we were really trying to turn recommenders, which had historically been a bit of a CPU-based workflow, into something that ran really well on GPUs. And it's since been done. There are a bunch of RecSys libraries that run on GPUs. The common models, like the Deep Learning Recommendation Model, which came out of Meta, and the Wide & Deep model, which was released by Google, were heavily accelerated by GPUs, using the fast HBM on the chips, especially for vector lookups. It was very interesting at the time, and super relevant, because we were starting to get this explosion of feeds and things that required recommenders to be actively on all the time.
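Kyle's description of recommenders, matching audience members to items with fast memory doing the "vector lookups," can be illustrated with a deliberately tiny sketch: look up a user's embedding row, score every item by dot product, and keep the top k. This is a simplified dot-product retrieval model, not DLRM or Wide & Deep themselves (those add feature interactions and MLPs), and the hand-written tables are hypothetical toy values; in a real system they would be large learned tables held in GPU HBM.

```python
from typing import List, Sequence

Vector = Sequence[float]


def lookup(table: List[Vector], row_id: int) -> Vector:
    """Embedding lookup: gather one row of the table (the memory-bound step)."""
    return table[row_id]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def recommend(user_id: int, user_table: List[Vector], item_table: List[Vector], k: int) -> List[int]:
    """Score every item against the user's embedding and return the top-k item ids."""
    user_vec = lookup(user_table, user_id)
    scores = {item_id: dot(user_vec, lookup(item_table, item_id)) for item_id in range(len(item_table))}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 2-dimensional embeddings (hypothetical values, not learned).
user_table = [[1.0, 0.0],   # user 0 leans toward feature 0
              [0.0, 1.0]]   # user 1 leans toward feature 1
item_table = [[0.9, 0.1],   # item 0
              [0.1, 0.9],   # item 1
              [0.5, 0.5]]   # item 2

print(recommend(0, user_table, item_table, k=2))  # [0, 2]
print(recommend(1, user_table, item_table, k=1))  # [1]
```

At production scale the tables have millions of rows, so the gather in `lookup` is dominated by memory bandwidth, which is why fast HBM mattered for these workloads.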
Kyle: And I transitioned that a little bit toward graph neural networks when I discovered them, because you can actually use graph neural networks to represent relationships between people, items, and concepts, and that interested me. So I jumped into that at [00:21:00] NVIDIA and got really involved for two-ish years.

swyx: Yeah. And something I learned from Bryan Catanzaro is that you can just kind of choose your own path at NVIDIA.

Kyle: Oh my God. Yeah.

swyx: Which is not a normal big-corp thing. You have a lane; you stay in your lane.

Nader: I think probably the reason why I, as a startup guy, enjoy being in a big company is that the mission is the boss.

swyx: The mission is the boss.

Nader: Yeah. It feels like a big game of pickup basketball. If you wanna play basketball, you just go up to the court and you're like, "Hey, look, we're gonna play this game and we need three." And you just find your three. Honestly, that's what every new initiative feels like.

Vibhu: It also shows, right? NVIDIA is just releasing state-of-the-art stuff in every domain. Okay, you expect foundation models with Nemotron. Voice: just randomly, Parakeet comes out, another voice model.

Kyle: The NVIDIA voice team has always been producing.

Vibhu: Yeah. There's always every other domain: a paper comes out, a dataset comes out. I mean, it also stems back to what NVIDIA has to do, right? You have to make chips years before they're actually produced, right? So you need to really [00:22:00] focus.

Kyle: The design process starts like...

Vibhu: Exactly.

Kyle: ...three to five years before the chip gets to the market.

Vibhu: Yeah. I'm curious more about what that's like, right? So you have specialist teams.
Is it just, you know, people find an interest, they go in, they go deep on whatever, and that feeds back into, okay, here's what we expect? The internals at NVIDIA must be crazy, right? Even without selling to people, you must have your own predictions of where things are going, and they're very grounded, right?

Kyle: Yeah. It's really interesting. There are two things I think NVIDIA does that are quite interesting. One is that we really index into passion. There's a big organizational, top-down push to ensure that people are working on the things they're passionate about. So if someone proposes something interesting, many times they can just email someone way up the chain who would find it relevant and say, "Hey, can I go work on this?"

Nader: It's actually... I worked at a big company for a couple years before starting my startup journey, and it felt very weird to email up the chain, if that makes [00:23:00] sense. The emails at NVIDIA are like mosh pits.

swyx: Oh, shoot.

Nader: It's just 60 people, whatever.

swyx: They get messy? Like reply-all?

Nader: Oh, it's insane. It's insane.

Kyle: It helps. You know, email maxing.

Nader: The context. But this is a weird thing: I used to be like, "Why would we send emails? We have Slack." Now I'm the exact opposite. I feel so bad for anyone messaging me on Slack, 'cause I'm so unresponsive.

swyx: Email maxing.

Nader: I'm email maxing now. Email is great, right? Because important threads get bumped back up. Slack doesn't do that.
So I just have this casino going off on the right or on the left, and I don't know which thread was from where, but the threads get... And then there's also the subject line, so you can have working threads. I think when you're small, if you're not 40,000 people, Slack will work fine. I don't know what the inflection point is, but there is gonna be a point where it becomes really messy and you'll actually prefer email, 'cause you can have working threads. You can CC more than nine people in a thread.

Kyle: You can fork stuff.

Nader: You can [00:24:00] fork stuff, which is super nice. And that is part of how you can propose a plan. You can also just start. Honestly, momentum's the only authority, right? If you can just start to make a little bit of progress, show someone something, and let them try it, that's, I think, the most effective way to push anything forward. And that's both at NVIDIA and, I think, just generally.

Kyle: Yeah. The other concept that's explored a lot at NVIDIA is this idea of a zero-billion-dollar business. Market creation is a big thing at NVIDIA.

swyx: Oh, you want to go and start a zero-billion-dollar business?

Kyle: Jensen says, "We are completely happy investing in zero-billion-dollar markets. We don't care if this creates revenue. It's important for us to know about this market. We think it will be important in the future. It can be zero billion dollars for a while." I'm probably mangling his words here, but I'll give an example. NVIDIA's been working on autonomous driving for a long time.

swyx: Like an NVIDIA car?

Kyle: No, they've...

Vibhu: They've used the Mercedes, right? They're around the HQ, and I think it finally just got licensed out.
Now they're starting to be used quite a [00:25:00] bit. For 10 years you've been seeing Mercedes with NVIDIA logos driving around.

Kyle: If you're in, like, south Santa Clara. Yeah. So zero-billion-dollar markets are a thing. Like, Jensen...

swyx: I mean, okay, look, cars are not a zero-billion-dollar market.

Kyle: Yeah, that's a bad example.

Nader: I think he's messaging zero today. Or even internally, right? An org doesn't have to ruthlessly find revenue very quickly to justify its existence. A lot of the important research, a lot of the important technology being developed, that's kind of...

Kyle: ...where research... Research is very ideologically free at NVIDIA. They can pursue things that they were...

swyx: Were you officially in research?

Kyle: I was never officially in research. I was always in engineering. I'm in an org called Deep Learning Algorithms, which is basically: how do we make things that are relevant to deep learning go fast?

swyx: That sounds freaking cool.

Vibhu: And I think a lot of that is underappreciated, right? Like time series: this week Google put out the TimesFM paper, a new time-series paper. Semantic IDs [00:26:00] started applying transformers and LLMs to RecSys.

Kyle: Yes.

Vibhu: And when you think of the scale of the companies deploying these, right, Amazon recommendations, Google web search, it's huge scale, and...

Kyle: Yeah.

Vibhu: You want fast.

Kyle: Yeah. Actually, there's a fun moment that brought me full circle. Amazon Ads recently gave a talk about using Dynamo for generative recommendation, which was weirdly cathartic for me. I'm like, oh my God, this has supplanted what I was working on. You're using LLMs now to do what I was doing five years ago.

swyx: Yeah. Amazing. And let's go right into Dynamo.
Maybe introduce it, top-down.

Kyle: Yeah, sure. I think at this point a lot of people are familiar with the term inference. Funnily enough, inference went from being a really niche topic to something discussed on normal people's Twitter feeds.

Nader: It's on billboards here now.

Kyle: Yeah. Very, very strange, driving and seeing an inference ad on 101. Inference at scale is becoming a lot more important. We have moments like OpenClaw, where you have these [00:27:00] agents that take lots and lots of tokens but produce incredible results. There are many different aspects of test-time scaling, where you can use more inference to generate a better result than you would with a short amount of inference. There's reasoning, there's querying, there's adding agency to the model, allowing it to call tools and use skills. Dynamo sort of came about at NVIDIA because myself and a couple of others were talking about these concepts: you have inference engines like vLLM, SGLang, and TensorRT-LLM, and they think about things as one single copy, one replica, right?

Why Scale Out Wins

Kyle: One version of the model. But when you're actually serving things at scale, you can't just scale up that replica, because you end up with performance problems. There's a limit to scaling up replicas. So you actually have to scale out, to use some Kubernetes-type terminology. We realized there was a lot of potential optimization in scaling out and building systems for data [00:28:00] center-scale inference.
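The scale-out idea Kyle describes, keeping one replica's shape fixed and adding copies until the fleet handles the load, reduces to simple capacity arithmetic. A minimal sketch, assuming you have already benchmarked how many requests per second one replica can sustain; the headroom factor, function names, and all numbers are illustrative assumptions, not Dynamo defaults or APIs.

```python
import math


def replicas_needed(load_rps: float, per_replica_rps: float, headroom: float = 0.25) -> int:
    """Number of identical model replicas needed to serve `load_rps`.

    Scale up fixes the shape of one replica (how many GPUs it spans);
    scale out multiplies that replica. `headroom` reserves spare capacity
    for bursts (an illustrative assumption).
    """
    target_rps = load_rps * (1 + headroom)
    return math.ceil(target_rps / per_replica_rps)


def gpus_for_fleet(load_rps: float, per_replica_rps: float, gpus_per_replica: int,
                   headroom: float = 0.25) -> int:
    """Total GPUs = replica count times the GPUs each replica spans."""
    return replicas_needed(load_rps, per_replica_rps, headroom) * gpus_per_replica


# Hypothetical: each 8-GPU replica sustains 12 requests/s; peak load is 100 requests/s.
print(replicas_needed(100, 12))    # 11 (125 rps target / 12 rps per replica, rounded up)
print(gpus_for_fleet(100, 12, 8))  # 88
```

The design point is that the per-replica shape is chosen once (by scale-up limits like the interconnect), and only the replica count moves with demand.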
Kyle: So Dynamo is this data-center-scale inference engine that sits on top of frameworks like vLLM, SGLang, and TensorRT-LLM and just makes things go faster, because you can leverage the economy of scale. You have KV cache, which we can define a little later, in all these machines, and it's different on each one, so you wanna figure out ways to maximize your cache hits. Or you want to employ new inference techniques like disaggregation, which Dynamo introduced to the world in March. Well, not introduced: it existed in academic work beforehand, but we were one of the first frameworks to start supporting it. And we wanna combine all these techniques into a modular framework that lets you accelerate your inference at scale.

Nader: By the way, Kyle and I became friends on my first day at NVIDIA, and I always loved it, 'cause he always teaches me new things.

swyx: Yeah. By the way, this is why I wanted to put you two together. I was like, this is gonna be good.

Kyle: It's very different. We've talked to each other a bunch. [00:29:00] Actually, you asked: why can't we scale up?

Nader: Yeah.

Scale Up Limits Explained

Nader: ...the model. You said model replicas.

Kyle: Yeah. So scale up means assigning more...

swyx: Heavier?

Kyle: Yeah, heavier. Making things heavier: adding more GPUs, adding more CPUs. Scale out is basically saying, I'm gonna duplicate my representation of the model, or of this microservice or something, and replicate it many times to handle load. And the reason you can't scale up past some point is that there are hardware bounds and algorithmic bounds on that type of scaling. I'll give you a very trivial example. Let's say you're on an H100.
The maximum NVLink domain for H100, for most DGX H100s, is eight GPUs, right? So if you scaled up past that, you'd have to handle the fact that, for the GPUs to communicate, you now have to go over InfiniBand, which is still very fast but not as fast as NVLink.

swyx: Is it like one order of magnitude? Like hundreds, or...?

Kyle: It's about an order of magnitude.

swyx: Okay. So not terrible.

Kyle: [00:30:00] Yeah. I need to remember the data sheet here. I think it's about 500 gigabytes a second unidirectional for NVLink and about 50 gigabytes a second unidirectional for InfiniBand. It depends on the generation.

swyx: I just wanna set this up for people who are not familiar with these kinds of layers and the speeds and all that.

Vibhu: Of course.

From Laptop to Multi Node

Vibhu: Also, maybe even going a few steps back before that. Most people are very familiar with running a model on their laptop with whatever local tool; you can just run inference there. Then you get to: okay, models got pretty big, right? GLM-5, they doubled the size. So what do you do? You go from, okay, I can get 128 gigs of memory, I can run it on a Spark. Then you have to go multi-GPU. Okay, multi-GPU, there's some support there. Now, if I'm a company, I'm not hiring the best researchers for this, right? But I need to go [00:31:00] multi-node, right? I have a lot of servers. Now there are efficiency problems, right? You can have multiple eight-GPU H100 nodes, but how do you do that efficiently?

Kyle: Yeah. How do you represent them? How do you choose how to represent the model? Yeah, exactly right. That's a hard question.
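To put Kyle's rough figures in perspective (about 500 GB/s unidirectional for NVLink versus about 50 GB/s for InfiniBand, which he notes varies by generation), here is a back-of-the-envelope sketch of moving the same payload inside a node versus across nodes. It uses a simple latency-plus-size-over-bandwidth model; the 2 GB payload is hypothetical, and real collectives such as all-reduce or all-to-all add algorithm-dependent factors that this ignores.

```python
GB = 1e9  # decimal gigabytes, matching how link bandwidths are usually quoted


def transfer_time_s(num_bytes: float, bandwidth_bytes_per_s: float, latency_s: float = 0.0) -> float:
    """Alpha-beta model: fixed startup latency plus bytes divided by bandwidth."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


NVLINK_BW = 500 * GB  # rough per-direction figure quoted in the conversation
IB_BW = 50 * GB       # rough per-direction figure quoted in the conversation

payload = 2 * GB      # hypothetical amount of data to move between GPUs
t_nvlink = transfer_time_s(payload, NVLINK_BW)  # stays inside the NVLink domain
t_ib = transfer_time_s(payload, IB_BW)          # crosses a node boundary

# Prints: NVLink: 4.0 ms, InfiniBand: 40.0 ms, ratio: 10x
print(f"NVLink: {t_nvlink * 1e3:.1f} ms, InfiniBand: {t_ib * 1e3:.1f} ms, ratio: {t_ib / t_nvlink:.0f}x")
```

That roughly tenfold gap is why scaling a single replica past one NVLink domain is not free, and why the choice of how to split a model across nodes matters.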
Kyle: Everyone asks: how do you size it? "Oh, I wanna run GLM-5, which just came out." New model. There have been like four of them in the past week, by the way; a bunch of new models.

swyx: You know why, right? DeepSeek.

Kyle: No comment. But GLM-5, right? We have this new model, it's a large size, and you have to figure out how to both scale up and scale out, right? Because you have to find the representation you care about. Everyone does this differently, let's be very clear. Everyone figures this out on their own path.

Nader: I feel like a lot of AI, or even ML, is like this. There was some tweet a few months ago that was like, "Why hasn't fine-tuning as a service taken off?" That might have been me; it might have been you. People want it to be such an easy recipe to follow. But even if you look at an ML model, it's specific...

Kyle: ...to you, and to the [00:32:00] model.

Nader: ...and to the situation, and there's just so much tinkering, right? When you see an MoE model with however many experts, it's like, why that many experts? They tried a bunch of things, and that one seemed to do better. When it comes to how you're serving inference, you have a bunch of decisions to make, and you can always argue that something could be made more optimal. But I think it's this internal calibration, and an appetite for continued calibration.

Vibhu: Yeah. And that doesn't mean people aren't taking a shot at this, like Tinker from Thinking Machines, you know? RL as a service. It gets even harder when you try to do big model training, right? We're not the best at training MoEs when they're pre-trained. We saw this with Llama 3, right?
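Kyle's "how do you size it" question usually starts with whether the weights even fit, which decides whether you stay inside one node or go multi-node. A hedged first-pass sketch, assuming weights dominate memory, that a fixed fraction of each GPU's memory is budgeted for weights (the rest left for KV cache and activations), and eight GPUs per node; the parameter counts, precisions, and 80 GB per-GPU figure are hypothetical inputs and say nothing about GLM-5's actual size.

```python
import math


def min_gpus_for_weights(n_params: float, bytes_per_param: float, gpu_mem_bytes: float,
                         weight_fraction: float = 0.7) -> int:
    """Smallest GPU count whose combined weight budget holds the model.

    weight_fraction: share of each GPU's memory given to weights; the rest is
    left for KV cache and activations (an illustrative assumption).
    """
    weight_bytes = n_params * bytes_per_param
    return math.ceil(weight_bytes / (gpu_mem_bytes * weight_fraction))


def nodes_needed(num_gpus: int, gpus_per_node: int = 8) -> int:
    """Past one node's worth of GPUs, traffic leaves NVLink and crosses the network."""
    return math.ceil(num_gpus / gpus_per_node)


GB = 1e9
# Hypothetical 100B-parameter model with 16-bit weights on 80 GB GPUs.
small = min_gpus_for_weights(100e9, 2, 80 * GB)
# Hypothetical 700B-parameter model with 8-bit weights on the same GPUs.
large = min_gpus_for_weights(700e9, 1, 80 * GB)

print(small, nodes_needed(small))  # 4 1: fits by scaling up within one node
print(large, nodes_needed(large))  # 13 2: now you are multi-node
```

In practice you would then round to a parallelism-friendly shape (for example, 16 GPUs as two full nodes), which is where the scale-up versus scale-out judgment Kyle describes comes back in.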
Vibhu: They're trained in such a sparse way, and Meta knows there's gonna be a bunch of inference done on these, right? They'll open-source it, but it's very much trained for what Meta's infrastructure wants, right? They wanna run inference on it a lot. Now the question to think about is: say you wanna serve a chat application, or a coding copilot, right? You're doing a layer of RL, you're serving a model for X number of people. Is it a chat model? A coding model? Dynamo, you know, back to that...

Kyle: [00:33:00] Yeah, sorry. So we sort of jumped off onto that topic. Everyone has their own journey.

Cost Quality Latency Tradeoffs

Kyle: And I like to think of it as defined by: what model do you need, and what accuracy do you need? Actually, I talked to Nader about this earlier. There are three axes you care about. First, quality: are you accurate enough, can you complete the task with high enough performance? Second, cost: can you serve the model, or serve your workflow, cheaply enough? Because it's not just the model anymore; it's the workflow, the multi-turn with an agent. And third, can you serve it fast enough? We're seeing all three of these play out. We saw new models from OpenAI that are faster; you have these new fast versions of models. You can change the amount of thinking to change the quality, right? Produce more tokens, but at a higher cost and a higher latency. And really, when you start this journey of figuring out how you wanna host a model, you think about a few things. What is the model I need to serve? How many times do I need to call it? What is the input sequence length? What [00:34:00] does the workflow look like on top of it? What is the SLA, the latency SLA, that I need to achieve?
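Kyle's three axes suggest a simple selection procedure: treat quality and the latency SLA as constraints, and minimize cost over the configurations you have benchmarked. A hedged sketch of that filter-and-minimize step; the configuration names, GPU counts, throughput, latency, quality scores, and GPU-hour price are all hypothetical, and in practice you would measure them for your own model, input sequence lengths, and workflow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Config:
    name: str
    gpus: int
    tokens_per_s: float    # measured aggregate output throughput
    p99_latency_ms: float  # measured tail latency for the target workload
    quality: float         # score on your own task eval (0..1)


def cost_per_million_tokens(cfg: Config, gpu_hour_usd: float) -> float:
    tokens_per_hour = cfg.tokens_per_s * 3600
    return cfg.gpus * gpu_hour_usd / tokens_per_hour * 1e6


def cheapest_meeting_sla(configs: List[Config], latency_sla_ms: float, min_quality: float,
                         gpu_hour_usd: float) -> Optional[Config]:
    """Quality and latency are hard constraints; cost is what gets minimized."""
    feasible = [c for c in configs
                if c.p99_latency_ms <= latency_sla_ms and c.quality >= min_quality]
    if not feasible:
        return None
    return min(feasible, key=lambda c: cost_per_million_tokens(c, gpu_hour_usd))


# Hypothetical benchmark sweep over tensor-parallel (TP) sizes, plus a smaller model.
sweep = [
    Config("small-model-tp1", gpus=1, tokens_per_s=3000, p99_latency_ms=200, quality=0.70),
    Config("big-model-tp2", gpus=2, tokens_per_s=1800, p99_latency_ms=900, quality=0.80),
    Config("big-model-tp4", gpus=4, tokens_per_s=4000, p99_latency_ms=450, quality=0.80),
    Config("big-model-tp8", gpus=8, tokens_per_s=6000, p99_latency_ms=250, quality=0.80),
]

best = cheapest_meeting_sla(sweep, latency_sla_ms=500, min_quality=0.75, gpu_hour_usd=2.0)
print(best.name)  # big-model-tp4
```

With these made-up numbers, tightening the SLA to 300 ms pushes the choice to the 8-GPU configuration, while relaxing the quality bar to 0.6 lets the smaller model win, which mirrors the model-choice step Vibhu says comes first.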
Kyle: Because that's usually a constant: you know the SLA you need to hit, and then you try to find the lowest-cost version that meets all of these constraints. Usually you start with those things and do a bit of experimentation across some common configurations. You change the tensor-parallel size, which is a form of parallelism...

Vibhu: I take it, it goes even deeper first. Gotta think about what model.

Kyle: Yes, of course. It's a multi-step design process, because, as you said, you can choose a smaller model and then do more test-time scaling, and it'll match the quality of a larger model because you're doing the test-time scaling, or you're adding a harness or something. So yes, it goes way deeper than that. But from the performance perspective, once you get to the model you need to host, you look at it and say, "Hey, I have this model, I need to serve it at this speed. What is the right configuration for that?"

Nader: Did you guys see the recent... there was a paper I saw just a few days ago that if you run [00:35:00] the same prompt twice, you're getting like double...

Vibhu: Just try it again.

Nader: Yeah, exactly.

Vibhu: And you get a lot. But the key thing there is that you give it the context of the failed try, right? So it takes a shot. And this has been basic guidance for quite a while: just try again. Did you try again?

Nader: All advice in life.

Vibhu: It's a paper from Google, if I'm not mistaken, right? I think it's a short little paper. The title's very cute. And it's just, yeah, just try again. Give it context...

Kyle: Multi-shot. You just say, hey, take a little bit more information, try and fail.
Fail.

Vibhu: And that basic concept has gone pretty deep. There's self-distillation RL, where you do self-distillation, you do RL, and you have past failure, and that gives some signal. So people take "try it again." Not strong enough.

swyx: Uh, for listeners, Vibhu and I actually run a second YouTube channel for our paper club, where

Nader: Oh, that's awesome.

swyx: Vibhu just covered this. Yeah. Self-distillation and all that, that's why he's up to speed on it.

Nader: I'll have to check it out.

swyx: Yeah. It's just a good practice. Everyone needs a paper club where you just read papers together and the social pressure just kind of forces you to just

Nader: We, there's like a big inference

Kyle: reading

Nader: group at NVIDIA. I feel so bad every time. He put it on our... he shared it.

swyx: One of

Nader: your guys,

swyx: uh, is big in that, I forget... Esan. Yeah, yeah,

Kyle: Esan's on my team, actually. Funny, there's an employee transfer between us. Esan worked for Nader at Brev, and now he's on my team.

Nader: He was our head of AI. And then, yeah, once we got in, and

swyx: because I'm always looking for, okay, can I start another podcast that only does that thing? Yeah. And I was trying to nudge Esan into, is there something here? I mean, I don't think there are new inference techniques every day. So it's like, it's like

Kyle: You would actually be surprised at the amount of blog posts you see. And if

swyx: There was a period where it was like Medusa, Hydra, EAGLE, like, you

Kyle: know, now we have new forms of decode, we have new forms of speculative decoding, or new

swyx: What

Kyle: what are you

Vibhu: excited?
And it's exciting when you guys put out something like Nemotron, 'cause I remember the paper on this, Nemotron 3, [00:37:00] the amount of post-training tokens that the GPU-rich can just train on. And it was a hybrid state space model, right?

Kyle: Yeah. It's co-designed for the hardware.

Vibhu: Yeah, co-designed for the hardware. And one of the things was always, you know, the state space models don't scale as well when you do a conversion, or whatever the performance. And you guys are like, no, just keep training. And Nemotron shows a lot of that.

Nader: Yeah. Also, something cool about Nemotron: it was released in layers, if you will, very similar to Dynamo. Essentially, the pre-training and post-training datasets are released. The recipes on how to do it are released. The model itself is released. It's the full model. You just benefit from us turning on the GPUs. But there are companies like ServiceNow that took the dataset and trained their own model, and we were super excited and, you know, celebrated that work.

Zoom

Vibhu: Different. Zoom is, Zoom is CGI, I think. Uh, you know, also just to add, a lot of models don't put out base models, and if there's that, why hasn't fine-tuning taken off? You know, you can do your own training.

Kyle: Yeah, sure.

Vibhu: You guys put out the base model. I think you put out everything.

Nader: I believe so. I know [00:38:00]

swyx: about base. Basically

Vibhu: without base

swyx: base can be cancelable.

Vibhu: Yeah. Base can be cancelable.

swyx: Yeah.

Vibhu: Safety training.

swyx: Did we get a full picture of Dynamo? I don't know if we... what

Nader: What I'd love is, you mentioned the three axes, break it down: what's prefill, what's decode, and what are the optimizations that we can get with Dynamo?

Kyle: Yeah. That's a great point.
So to summarize on that three-axis problem, right: there are three things that determine whether or not something can be done with inference: cost, quality, and latency. Dynamo is supposed to be there to provide you the runtime that allows you to pull levers to, you know, mix it up and move around the Pareto frontier, or the Pareto surface, that determines whether this is actually possible with inference and AI today.

Nader: Gives you the knobs.

Kyle: Yeah, exactly. It gives you the knobs.

Disaggregation Prefill vs Decode

Kyle: And one thing that we use a lot in contemporary inference, and that is starting to pick up in general knowledge, is this concept of disaggregation. Historically, models would be hosted with a single inference engine, and that inference engine [00:39:00] would ping-pong between two phases. There's prefill, where you're reading the sequence and generating the KV cache, which is basically just a set of vectors that represent the sequence. And then there's using that KV cache to generate new tokens, which is called decode. Some brilliant researchers across multiple papers essentially realized that if you separate these two phases, you actually gain some benefits. Those benefits are basically: A, you don't have to worry about step-synchronous scheduling. The way an inference engine works is you do one step, you finish it, and then you start scheduling the next step. It's not fully asynchronous. And the problem with that is that prefill and decode are actually very different in terms of both their resource requirements and sometimes their runtime. So you would have prefill that would block decode steps, because you'd still be prefilling and you couldn't schedule, because, you know, the step has to end.
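The claim that the two phases have very different resource requirements can be checked with back-of-the-envelope arithmetic intensity (FLOPs per byte moved). This sketch looks only at one square weight matmul in 16-bit precision for a single sequence. The `d_model` value and the peak-compute and bandwidth figures are illustrative assumptions, not specs of any particular GPU, and the sketch ignores attention itself (which adds quadratic compute to prefill) and batching across requests (which raises decode's intensity for the weight matmuls).

```python
def linear_layer_intensity(n_tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved when one d_model x d_model weight matrix is applied to n_tokens."""
    flops = 2 * n_tokens * d_model * d_model                    # one multiply-add per weight per token
    weight_bytes = d_model * d_model * bytes_per_elem           # weights are read once per step
    activation_bytes = 2 * n_tokens * d_model * bytes_per_elem  # input read + output written
    return flops / (weight_bytes + activation_bytes)


# Illustrative accelerator (hypothetical figures): the ridge point is peak FLOP/s divided
# by memory bandwidth; below it a kernel is memory-bound, above it compute-bound.
PEAK_FLOPS = 1.0e15          # 1 PFLOP/s of 16-bit matmul throughput
PEAK_BYTES_PER_S = 3.35e12   # 3.35 TB/s of memory bandwidth
RIDGE = PEAK_FLOPS / PEAK_BYTES_PER_S  # about 298 FLOP/byte


def regime(n_tokens: int, d_model: int = 8192) -> str:
    """Classify the matmul as memory- or compute-bound relative to the ridge point."""
    ai = linear_layer_intensity(n_tokens, d_model)
    return "compute-bound" if ai >= RIDGE else "memory-bound"


for label, n in (("decode, 1 new token", 1), ("prefill, 4096-token prompt", 4096)):
    ai = linear_layer_intensity(n, 8192)
    print(f"{label:28s} {ai:8.1f} FLOP/byte -> {regime(n)}")
```

Under these assumptions the 4,096-token prefill performs 2,048 FLOPs per byte, well above the roughly 300 FLOP/byte ridge point, while generating one token performs about 1 FLOP per byte. Decode is therefore limited by how fast weights (and, in practice, the KV cache) stream from memory.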
So you remove that scheduling issue, and then, B, you also allow yourself to [00:40:00] split the work into two different kinds of pools. Prefill typically (and this changes as model architecture changes) is right now compute-bound most of the time when the sequence is sufficiently long. On the decode side, because you're doing a full pass over all the weights and the entire sequence every time you do a decode step, and you don't have the quadratic computation of the KV cache, it's usually memory-bound: you're retrieving a linear amount of memory and doing a linear amount of compute, as opposed to prefill, where you retrieve a linear amount of memory and then do a quadratic amount of compute.

Nader: You know, it's funny, someone, Exo Labs, did a really cool demo where, for the DGX Spark, which has a lot more compute, you can do the compute-hungry prefill on a DGX Spark and then do the decode on a Mac. Yeah. And so

Vibhu: That's faster.

Nader: Yeah. Yeah.

Kyle: So you can do that. You can do machine stratification.

Nader: Yeah.

Kyle: And with our future generations of hardware, we actually announced, with Rubin, this [00:41:00] new accelerator that is prefill-specific. It's called Rubin CPX. So

Kubernetes Scaling with Grove

Nader: I have a question. When you do the scale-out, is scaling out easier with Dynamo? Because when you need a new node, you can dedicate it to either prefill or decode.

Kyle: Yeah. So Dynamo actually has a Kubernetes component in it called Grove that allows you to do this crazy scaling specialization. It's a representation that... I don't wanna go too deep into Kubernetes here, but there was a previous way that you would launch multi-node work. It's called LeaderWorkerSet. It's in the Kubernetes standard, and LeaderWorkerSet is great.
It served a lot of people super well for a long period of time. But one of the things it struggles with is representing cases where you have a multi-node replica that has a pair, right? You know, prefill and decode. Or it's not paired, but it has a second stage with a ratio that changes over time. Prefill and decode are two different things, and as your workload changes, right, the amount of prefill you'll need to do may change. [00:42:00] The amount of decode that you'll need to do might change. Let's say you start getting insanely long queries, right? That probably means your prefill scales harder, because you're hitting this quadratic growth.

swyx: Yeah. And for listeners, prefill would be long input; decode would be long output, for example, right?

Kyle: Yeah. So decode scale... I mean, decode is funny, because the number of tokens that you produce scales with the output length, but the amount of work that you do per step scales with the number of tokens in the context.

swyx: Yes.

Kyle: So it scales with both the input and the output.

swyx: That's true.

Kyle: But on the prefill/decode side, suppose suddenly the amount of work you're doing on the decode side stays about the same, or scales a little bit, and then the prefill side jumps up a lot. You actually don't want that ratio to stay the same. You want it to change over time. So Dynamo has a set of components that, A, tell you how to scale: it tells you how many prefill workers and decode workers it thinks you should have, and, B, provide a scheduling API for Kubernetes that allows you to actually represent and effect this scheduling on your actual [00:43:00] hardware, on your compute infrastructure.

Nader: Not gonna lie, I feel a little embarrassed for being proud of my SVG function earlier.

swyx: No, it was really cute.

Kyle: I, I like

Nader: It's all,

swyx: it's all engineering.
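Kyle's point that the prefill-to-decode worker ratio should move with the workload can be illustrated with a toy cost model. This is a hypothetical sketch, not how Dynamo's planner computes its recommendation: the coefficients `LINEAR`, `ATTN`, and `STEP`, the per-worker capacities, and the request rate are all made up. Prefill work grows with the square of the prompt length, and each decode step's work grows with the current context, so decode depends on both input and output length, as Kyle notes.

```python
import math

# Toy cost coefficients (hypothetical, in arbitrary "work units").
LINEAR = 1.0   # per-token cost of the weight matmuls
ATTN = 1e-4    # cost per (query token, context token) attention pair
STEP = 50.0    # fixed overhead per decode step, e.g. re-reading the weights


def prefill_work(n_in: int) -> float:
    """Linear projections plus causal attention over the prompt (~n_in^2 / 2 pairs)."""
    return LINEAR * n_in + ATTN * n_in * n_in / 2


def decode_work(n_in: int, n_out: int) -> float:
    """Each generated token attends over the prompt plus everything generated so far."""
    return sum(STEP + LINEAR + ATTN * (n_in + t) for t in range(n_out))


def worker_split(req_per_s: float, n_in: int, n_out: int,
                 prefill_capacity: float, decode_capacity: float) -> tuple[int, int]:
    """Workers of each kind needed so that capacity (work units per second) covers the load."""
    prefill_workers = math.ceil(req_per_s * prefill_work(n_in) / prefill_capacity)
    decode_workers = math.ceil(req_per_s * decode_work(n_in, n_out) / decode_capacity)
    return prefill_workers, decode_workers


# Same request rate and output length; only the prompt length changes.
short_split = worker_split(10, n_in=1_000, n_out=500,
                           prefill_capacity=70_000, decode_capacity=100_000)
long_split = worker_split(10, n_in=100_000, n_out=500,
                          prefill_capacity=70_000, decode_capacity=100_000)
print("short prompts (prefill, decode):", short_split)
print("long prompts  (prefill, decode):", long_split)
```

With these made-up numbers, making prompts 100 times longer moves the split from one prefill worker per three decode workers to 86 prefill workers per four decode workers. A fixed ratio cannot absorb a shift like that.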
It's all engineering. Um, that's where I'm

Kyle: technical.

swyx: One thing I'm kind of just curious about, with all that you see at a systems level, everything going on here...

Kyle: Mm-hmm.

swyx: And we're scaling it up in distributed systems.

Context Length and Co-Design

swyx: Um, I think one thing that's kind of of the moment right now is people are asking: are there any SOL sort of upper bounds, in terms of, let's just call it context length, for lack of a better word, but you can break it down however you like.

Nader: Yeah.

swyx: I just think, well, yeah, clearly you can engage in hybrid architectures and throw in some state space models in there, all you want, but it still looks very attention-heavy.

Kyle: Yes. Long context is attention-heavy. I mean, we have these hybrid models, um,

swyx: and most models cap out at a million tokens of context, and that's it. Like, for the last two years that has been it.

Kyle: Yeah. The model-hardware-context co-design thing that we're seeing these days is actually super [00:44:00] interesting. It's my secret side passion. We see models like Kimi or GPT-OSS; I use these because I know specific things about these models. So Kimi K2 comes out, right? And it's an interesting model. It's a DeepSeek-style architecture with MLA. It's basically DeepSeek scaled a little bit differently, and obviously trained differently as well. But they talked about why they made the design choices for context. Kimi has more experts but fewer attention heads, and I believe a slightly smaller attention dimension, but I need to check that. It doesn't matter. They discussed this at length in a blog post on Zhihu, which is like, uh, Chinese Reddit.

swyx: Yeah.

Kyle: Um, in China. Chinese Reddit.

swyx: Yeah.

Kyle: It's, yeah.
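One concrete reason long context is expensive is KV cache memory, which grows linearly with context length for every sequence being served. The sketch below uses the standard formula for multi-head or grouped-query attention: one key and one value vector per layer, per KV head, per token. The model dimensions are a hypothetical 80-layer configuration with 8 KV heads of dimension 128 and a 16-bit cache. The formula does not apply to MLA-style models such as the DeepSeek-style architecture Kyle mentions, which cache a compressed latent instead, so treat the numbers as an illustration of scaling rather than a figure for any specific model.

```python
GiB = 2**30


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """KV cache size for MHA/GQA: K and V (factor 2) for every layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


# Hypothetical dense model: 80 layers, 8 KV heads of dimension 128, fp16/bf16 cache.
per_token = kv_cache_bytes(1, layers=80, kv_heads=8, head_dim=128)
print(f"per token: {per_token / 1024:.0f} KiB")
for ctx in (128_000, 1_000_000):
    size = kv_cache_bytes(ctx, layers=80, kv_heads=8, head_dim=128)
    print(f"{ctx:>9,} tokens: {size / GiB:8.1f} GiB for a single sequence")
```

Under these assumptions each token costs 320 KiB of cache, so a single million-token sequence needs roughly 305 GiB before counting model weights or any other concurrent requests.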
So it's actually an incredible blog post. All the ML people I've seen on there are very brilliant, but the creators of Kimi K2 [00:45:00] actually talked about it on there, in the blog post. And they say, we actually did an experiment, right? Attention scales with the number of heads, obviously. If you have 64 heads versus 32 heads, with 32 you do half the work of attention. You still scale quadratically, but you do half the work. And they made a very specific sort of barter in their system, in their architecture. They basically said, hey, what if we gave it more experts? So we're gonna use more memory capacity, but we keep the number of activated experts the same. We increase the expert sparsity, so the ratio of activated experts to total experts is smaller, and we decrease the number of attention heads.

Vibhu: And kind of for context, what we had been seeing was you make models sparser instead. So no one was really touching heads. You're just having, uh,

Kyle: Well, they implicitly made it sparser.

Vibhu: Yeah, yeah. For Kimi, they did.

Kyle: Yes.

Vibhu: They also made it sparser. But basically what we were seeing was people were at the level of, okay, there's a sparsity ratio: you want more total parameters, fewer active, and that's sparsity. [00:46:00] But what you see from papers from labs like Moonshot and DeepSeek is they go to the level of, okay, outside of just the number of experts, you can also change how many attention heads, fewer attention layers, more attention layers. Layers, yeah. Yes, yes. So all of that is basically coming back to, just tied together, hardware-model co-design, which is

Kyle: hardware-model-context co-design.

Vibhu: Yeah.

Kyle: Right. If you were training a model that was really, really short context, or really good at super-short-context tasks, you may design it in a way such that you don't care about attention scaling, because it hasn't hit the turning point where the quadratic curve takes over.

Nader: How do you consider attention or context as a separate part of the co-design? Just how I would've thought of it is hardware-model co-design, versus hardware-model-context co-design.

Kyle: Because the harness, and the context that is produced by the harness, is a part of the model once it's trained in.

Vibhu: Like, even though towards the end you'll do long context, you're not changing architecture through training.

Nader: I see.

Kyle: I mean, you can try.

swyx: You're saying [00:47:00] everyone's training the harness into the model?

Kyle: I would say to some degree, or

swyx: there's co-design for the harness. I know there's a small amount, but I feel like not everyone has gone full send on this.

Kyle: I think it's important to internalize into the model the harness that you think the model will be running in.

swyx: Yeah. Interesting. Okay. Bash is like the universal harness,

Kyle: Right. I'll give an example here, right? Or just an easy proof: if you can train against a harness, and you're using that harness for everything, wouldn't you just train with the harness to ensure that you get the best possible quality out of it?

swyx: Well, I can provide a counterargument.

Kyle: Yeah, sure.

swyx: Which is, you wanna provide a generally useful model for other people to plug into their harnesses, right? So if you

Kyle: Yeah. Harnesses can be open source, right?

swyx: Yeah.
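Two of the co-design points above, that halving the attention heads halves attention work and that a short-context model may never reach the point where the quadratic term dominates, can be made concrete with a rough per-layer FLOP count. This is a simplified sketch under stated assumptions: standard multi-head attention with full (non-causal) n×n scores, Q/K/V/O projections of size `d_model × heads·head_dim`, and a fixed per-token MLP/expert cost. All dimensions are hypothetical and do not describe Kimi K2 or any other real model; MLA and grouped-query attention change the constants.

```python
def attention_core_flops(n: int, heads: int, head_dim: int) -> int:
    """QK^T scores plus the attention-weighted sum over V for n tokens (non-causal)."""
    return 4 * n * n * heads * head_dim  # two n x n x head_dim matmuls per head, 2 FLOPs per MAC


def linear_flops(n: int, d_model: int, heads: int, head_dim: int,
                 mlp_flops_per_token: int) -> int:
    """Q, K, V, O projections plus the active MLP/expert FLOPs; all grow linearly in n."""
    return n * (8 * d_model * heads * head_dim + mlp_flops_per_token)


def crossover_tokens(d_model: int, heads: int, head_dim: int,
                     mlp_flops_per_token: int) -> float:
    """Sequence length at which attention_core_flops equals linear_flops."""
    return (8 * d_model * heads * head_dim + mlp_flops_per_token) / (4 * heads * head_dim)


# Hypothetical layer: d_model=7168, head_dim=128, and 8 active SwiGLU experts of width 2048
# (3 matmuls each, 2 FLOPs per multiply-add).
D, HD = 7168, 128
MLP = 8 * 3 * 2 * D * 2048

for heads in (64, 32):
    n_star = crossover_tokens(D, heads, HD, MLP)
    print(f"{heads:>3} heads: attention FLOPs overtake linear FLOPs at ~{n_star:,.0f} tokens")

# Halving the heads halves attention work at any fixed length (still quadratic in n).
assert attention_core_flops(32_000, 32, HD) * 2 == attention_core_flops(32_000, 64, HD)
```

With these made-up dimensions, going from 64 to 32 heads pushes the point where attention dominates from about 36k to about 57k tokens. Below that length the linear terms (projections and experts) dominate, which is why a model built for short contexts can largely ignore attention scaling.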
So I mean, that's effectively what's happening with Codex.

Kyle: Yeah.

swyx: But you may want a different search tool, and then you may have to name it differently, or

Nader: I don't know how much people have pushed on this, but have people compared training a model for the harness versus [00:48:00] post-training for

swyx: I think it's the same thing. It's just extra post-training.

Nader: I see.

swyx: And so, I mean, Cognition does this, of course, where if your tool is slightly different, you have to either force your tool to be like the tool that they trained for, or undo their training for their tool and then retrain. Yeah. It's really annoying, and like

Kyle: I would hope that eventually we hit a certain level of generality with respect to training new

swyx: tools. This is not AGI. This is a really stupid "learn my tool, b***h." Like, I don't know if I can say that, but, you know, I think my point kind of is that I look at the slopes of the scaling laws and, like, this slope is not working, man. We are at a million token con
AI Reporter Stephanie Palazzolo talks with TITV Host Akash Pasricha about Anthropic's lawsuit against the Pentagon over its supply chain risk designation and how OpenAI's new GPT 5.4 model is landing with developers. We also talk with Anita Ramaswamy about OpenAI's sky‑high IPO valuation, how it compares to Anthropic, Nvidia and Palantir, and why some public investors may sit out the offering. Then we speak with Anissa Gardizy about Oracle and OpenAI's Texas data center twist, Nvidia's $150 million move to take over the site, the upcoming Groq–Nvidia chip reveal at GTC, and Anthropic's aggressive bet on Google TPUs and Fluidstack.

Articles discussed on this episode:
https://www.theinformation.com/briefings/anthropic-sues-defense-department-designation-supply-chain-risk
https://www.theinformation.com/newsletters/ai-agenda/ai-agenda-anthropic-strong-legal-case-trumps-dod
https://www.theinformation.com/articles/openais-ipo-hopes-face-skeptical-investor-community
https://www.theinformation.com/newsletters/ai-infrastructure/real-reason-openai-walked-away-oracle-stargate-expansion-abilene

Subscribe:
YouTube: https://www.youtube.com/@theinformation
The Information: https://www.theinformation.com/subscribe_h

Sign up for the AI Agenda newsletter: https://www.theinformation.com/features/ai-agenda

TITV airs weekdays on YouTube, X and LinkedIn at 10AM PT / 1PM ET. Or check us out wherever you get your podcasts.

Follow us:
X: https://x.com/theinformation
IG: https://www.instagram.com/theinformation/
TikTok: https://www.tiktok.com/@titv.theinformation
LinkedIn: https://www.linkedin.com/company/theinformation/
March 3rd, Computer History Museum CODING AGENTS CONFERENCE, come join us while there are still tickets left.
https://luma.com/codingagents

Chris Fregly is currently focused on building and scaling high-performance AI systems, writing and teaching about AI infrastructure, helping organizations adopt generative AI and performance engineering principles on AWS, and fostering large developer communities around these topics.

Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs // MLOps Podcast #363 with Chris Fregly, Founder, AI Performance Engineer, and Investor

Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
MLOps GPU Guide: https://go.mlops.community/gpuguide

// Abstract
In today's era of massive generative models, it's important to understand the full scope of AI systems' performance engineering. This talk discusses the new O'Reilly book, AI Systems Performance Engineering, and the accompanying GitHub repo (https://github.com/cfregly/ai-performance-engineering). This talk provides engineers, researchers, and developers with a set of actionable optimization strategies. You'll learn techniques to co-design and co-optimize hardware, software, and algorithms to build resilient, scalable, and cost-effective AI systems for both training and inference.

// Bio
Chris Fregly is an AI performance engineer and startup founder with experience at AWS, Databricks, and Netflix. He's the author of three (3) O'Reilly books, including Data Science on AWS (2021), Generative AI on AWS (2023), and AI Systems Performance Engineering (2025).
He also runs the global AI Performance Engineering meetup and speaks at many AI-related conferences, including Nvidia GTC, ODSC, Big Data London, and more.

// Related Links
AI Systems Performance Engineering: Optimizing Model Training and Inference Workloads with GPUs, CUDA, and PyTorch 1st Edition by Chris Fregly: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
Coding Agents Conference: https://luma.com/codingagents

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community [https://go.mlops.community/slack]
Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)
Sign up for the next meetup: [https://go.mlops.community/register]
MLOps Swag/Merch: [https://shop.mlops.community/]
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Chris on LinkedIn: /cfregly

Timestamps:
[00:00] SageMaker HyperPod Resilience
[00:27] Book Creation and Software Engineering
[04:57] Software Engineers and Maintenance
[11:49] AI Systems Performance Engineering
[22:03] Cognitive Biases and Optimization / "Mechanical Sympathy"
[29:36] GPU Rack-Scale Architecture
[33:58] Data Center Reliability Issues
[43:52] AI Compute Platforms
[49:05] Hardware vs Ecosystem Choice
[1:00:05] Claude vs Codex vs Gemini
[1:14:53] Kernel Budget Allocation
[1:18:49] Steerable Reasoning Challenges
[1:24:18] Data Chain Value Awareness