Podcasts about Inference

  • 583 PODCASTS
  • 1,042 EPISODES
  • 42m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Mar 19, 2026 LATEST
POPULARITY: Inference (2019–2026)


Best podcasts about Inference


Latest podcast episodes about Inference

All-In with Chamath, Jason, Sacks & Friedberg
Jensen Huang LIVE: Nvidia's Future, Physical AI, Rise of the Agent, Inference Explosion, AI PR Crisis

Mar 19, 2026 · 66:06


  • (0:00) Jensen Huang joins the show!
  • (0:26) Acquiring Groq and the inference explosion
  • (8:53) Decision making at the world's most valuable company
  • (10:47) Physical AI's $50T market, OpenClaw's future, the new operating system for modern AI computing
  • (16:38) AI's PR crisis, refuting doomer narratives, Anthropic's comms mistakes
  • (20:48) Revenue capacity, token allocation for employees, Karpathy's autoresearch, agentic future
  • (30:50) Open source, global diffusion, Iran/Taiwan supply chain impact
  • (39:45) Self-driving platform, facing competition from active customers, responding to growth slowdown predictions
  • (47:32) Datacenters in space, AI healthcare, robotics
  • (56:10) OpenAI/Anthropic revenue potential, how to build an AI moat
  • (59:04) Advice to young people on excelling in the AI era

Follow the besties: https://x.com/chamath, https://x.com/Jason, https://x.com/DavidSacks, https://x.com/friedberg
Follow on X: https://x.com/theallinpod
Follow on Instagram: https://www.instagram.com/theallinpod
Follow on TikTok: https://www.tiktok.com/@theallinpod
Follow on LinkedIn: https://www.linkedin.com/company/allinpod
Intro Music Credit: https://rb.gy/tppkzl https://x.com/yung_spielburg
Intro Video Credit: https://x.com/TheZachEffect

Sharp Tech with Ben Thompson
(Preview) OpenAI's Enterprise Pivot, The Rise of Agents and Bubble Counterpoints, Nvidia Changes Its Inference Story

Mar 19, 2026 · 32:50


Ben and Andrew begin with the news that OpenAI is shifting away from “side quests” and allocating resources to the enterprise space, including Dropbox history that helps explain OpenAI's present, lessons about the enterprise space generally (and what you learn in business school), and OpenAI taking cues from 1980s Microsoft. From there: talking through Ben's article from Monday, including the implications of agents and questions about integration as durable differentiation for Anthropic and OpenAI. At the end: Nvidia's new messaging on inference chips and Groq integration, and a word about winters (and whiners) in Wisconsin.

IBM Analytics Insights Podcasts
Still Essential: Ruchir Puri, IBM Chief Scientist, on the Death of Prompt Engineering and the Rise of Agentic AI {Replay}

Mar 18, 2026 · 29:40


First aired Apr 23, 2025. If you've been following the AI space lately, this episode hits differently the second time around.

When Al sat down with Ruchir Puri — Chief Scientist of IBM Research, IBM Fellow, and the architect behind Watson and watsonx — the conversation covered ground that's only gotten more relevant since: the death of prompt engineering, the rise of agentic AI, and why 2025 was always going to be the year agents broke through in the enterprise.

Ruchir doesn't deal in hype. He deals in systems — real ones, running at scale, in industries where a hallucinated number has consequences. In this masterclass, he walks through inference scaling, memory in AI systems, and what it actually means to build AI that's useful rather than just impressive.

If you're new to the show, this is the episode to start with. If you've heard it before — trust us, it lands differently now.

Key moments:
  • 12:21 — Why prompt engineering is already fading (and what replaces it)
  • 13:39 — Inference scaling: the frontier that's not about training anymore
  • 16:26 — Why AI systems that "forget" are failing us
  • 17:56 — The full agentic loop: Think, Plan, Act, Execute, Observe, Reflect
  • 23:45 — Why enterprise AI agents are no longer a future state

Making Data Simple is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Ruchir's LinkedIn · Al's LinkedIn · Explore IBM's watsonx

Want to be featured as a guest on Making Data Simple? Reach out to us at almartintalksdata@gmail.com and tell us why you should be next.

WSJ Tech News Briefing
Inside Nvidia's Age of Inference

Mar 17, 2026 · 13:24


Nvidia made its name making chips for training AI models, but a new kind of computing is the talk of the town at the tech powerhouse's annual conference. WSJ's Robbie Whelan explains how the world's biggest company is trying to pivot in the face of inference-mania. Plus, WSJ reporter Kate Clark on how software engineers are faring as (occasionally bossy) bot managers. Katie Deighton hosts. Sign up for the WSJ's free Technology newsletter.

WSJ Tech News Briefing
TNB Tech Minute: Amazon and Cerebras Announce Multiyear Inference Chips Partnership

Mar 13, 2026 · 2:54


Plus: Uber is speeding up its rollout of robotaxi services. And EssilorLuxottica's dominance in eyewear could erode amid the smart glasses boom. Katherine Sullivan hosts.

Dev Interrupted
Inference is the new 401k matching and what we're learning from AI-related outages

Mar 13, 2026 · 21:49


Are we heading toward a bizarre future where your engineering salary is paid in AI compute tokens instead of cash? Andrew and Ben tackle the latest tech industry shakeups, starting with Meta's acquisition of Moltbook and the controversial idea of making inference limits a core employee benefit. They also break down Charlie Guo's harness engineering playbook, the growing pains behind recent AWS AI-driven outages, and the toxic pressure to constantly run dozens of autonomous agents. Finally, they wrap up by sharing their own agentic weekend projects and debating the catastrophic risks of vibe-coding your laptop's file permissions.

Follow the show: Substack, LinkedIn, YouTube, and leave us a review.
Follow the hosts: Andrew, Ben, Dan.

Today's stories:
  • Silicon Valley is buzzing about this new idea: AI compute as compensation
  • The emerging "harness engineering" playbook
  • Meta acquired Moltbook, the AI agent social network that went viral because of fake posts
  • "A spate of outages, including incidents tied to the use of AI coding tools", right on schedule
  • Every minute you aren't running 69 agents, you are falling behind

Offers: start a free trial of LinearB's AI productivity platform, or book a demo to learn how you can ship faster, improve DevEx, and lead with confidence in the AI era.

About LinearB: AI code reviews (automate reviews to catch bugs, security risks, and performance issues before they hit production); AI & productivity insights (go beyond DORA with AI-powered recommendations and dashboards); AI-powered workflow automations (AI-generated PR descriptions, smart routing, and other automations to reduce developer toil); and an MCP server (interact with your engineering data using natural language to build custom reports and get answers on the fly).

More or Less with the Morins and the Lessins
Anthropic's Bet on Coding Is Working (OpenAI Shopping Pivot, A16Z's Top 50 List, $1B Tennis Channel)

Mar 13, 2026 · 58:42


It's an AI-heavy episode with real stakes: Jessica digs into OpenAI's evolving approach to shopping and why “closing the loop” on commerce could be the proving ground for consumer monetization. The group spars over charts: OpenAI vs. Anthropic annualized revenue, what “slope” investors actually care about, and whether Anthropic's developer-first strategy (code, tokens, and high ARPU) is the smarter path than consumer mindshare.

Sam argues that “intelligence” is heading toward a global, frictionless commodity market (bad for margins, great for usage) and introduces the idea of “dark pools” (proprietary access/data/relationships) as the only durable moat. Dave counters with the more optimistic take: AI is collapsing the line between “consumer” and “developer,” turning everyone into a builder, and launching a new creative medium (with examples spanning from software to film). Brit adds fuel with “nano-targeted” commerce and a tour through A16Z's Top 50 GenAI web products list, highlighting both mainstream shifts and the internet's more “unexpected” categories.

Finally: a truly out-of-left-field deal pitch from Jess: should someone buy the Tennis Channel for ~$1B? Plus a rapid-fire pop culture close (Kelce's return, Oscars bets, and what everyone's watching) before Sam heads back to the sauna.

Chapters:
  • 0:00 — Intro & Sam's Sauna Hat
  • 1:33 — First-Ever MOL Podcast Ad
  • 3:54 — ChatGPT's Shopping Pivot
  • 7:19 — The Chart: OpenAI vs Anthropic Revenue
  • 11:52 — The Slope: Linear or Super Linear?
  • 16:10 — Commerce Is Bad. Attention Is Good.
  • 19:04 — AI Is Turning Everyone Into a Builder
  • 20:15 — "$1B Raised, $900M Spent on Inference"
  • 23:17 — AI Is Worse Than the Cable Business
  • 35:58 — Dark Pools: Death of the Open Marketplace
  • 40:45 — The P50 Problem: What Happens to Average People?
  • 42:38 — "Software Is Totally Commoditized"
  • 45:43 — Brit's Bot Corner: Anime Husband Chatbots
  • 50:44 — Should You Buy the Tennis Channel for $1B?
  • 54:22 — Pop Culture Corner

We're also on:
X: https://twitter.com/moreorlesspod
Instagram: https://instagram.com/moreorless
Spotify: https://podcasters.spotify.com/pod/show/moreorlesspod
On-demand reactions powered by AI: https://molchat.ai/

Connect with us here:
1) Sam Lessin: https://x.com/lessin
2) Dave Morin: https://x.com/davemorin
3) Jessica Lessin: https://x.com/Jessicalessin
4) Brit Morin: https://x.com/brit

Complex Systems with Patrick McKenzie (patio11)
Inference engineering and the real-world deployment of LLMs, with Philip Kiely

Mar 12, 2026 · 83:45


Patrick McKenzie (patio11) and Philip Kiely, early employee at Baseten, discuss the inference stack: the critical layer of software and hardware that sits between a model's weights and a user's prompt. They cover inference engineering, how intermediate layers are evolving over a technical stack that is changing every six months, and how sophisticated organizations are actually consuming LLMs beyond just writing their questions into chatbot apps.

Full transcript available here: www.complexsystemspodcast.com/inference-engineering-with-philip-kiely/

Presenting sponsors: Mercury, Meter, & Granola

Complex Systems is presented by Mercury—radically better banking for founders. Mercury offers the best wire experience anywhere: fast, reliable, and free for domestic U.S. wires, so you can stay focused on growing your business. Apply online in minutes at mercury.com.

Networking infrastructure has a way of accumulating technical debt faster than almost anything else in IT. Meter handles the full stack (wired, wireless, and cellular) as a single integrated solution: designed, deployed, and managed end-to-end, so there's only one vendor to call when something goes wrong. Visit meter.com/complexsystems to book a demo.

If meetings consistently leave you with hazy action items and lost context, Granola handles the transcription so you can actually participate, and gives you searchable notes afterward. Try it free at granola.ai/complexsystems with code COMPLEXSYSTEMS.

Links:
  • Download Inference Engineering: https://www.baseten.com/inference-engineering/
  • Philip's website: https://philipkiely.com/
  • Stripe's Emily Sands on Complex Systems: https://www.complexsystemspodcast.com/episodes/the-past-present-and-future-of-ai-with-stripe/
  • Des Traynor on Complex Systems: https://www.complexsystemspodcast.com/episodes/des-traynor/

Timestamps:
  • (00:00) Intro
  • (00:30) The AI deployment pipeline
  • (03:04) Evolution of abstraction layers in engineering
  • (05:14) Defining inference and model weights
  • (08:45) Architecture of language and diffusion models
  • (10:11) AI adoption in the broader economy
  • (11:30) The shift toward agentic workflows and RL
  • (14:55) Function calling and real-world actions
  • (20:10) Sponsors: Mercury | Meter
  • (22:59) Technologies for agentic tools: MCP and skills
  • (25:32) The craft of writing a harness
  • (29:56) Using AI for automated proofreading and tool creation
  • (34:12) Balancing LLMs with deterministic code
  • (37:31) Observability and chain of thought reasoning
  • (39:31) Sponsor: Granola
  • (41:21) Observability and chain of thought reasoning
  • (50:45) Speculative decoding and hidden states
  • (55:37) The value of smaller, task-specific models
  • (59:55) Internal competencies versus buying solutions
  • (01:09:27) Self-publishing a technical book in record time
  • (01:23:20) Wrap

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)

Mar 10, 2026 · 83:37


Join Kyle, Nader, Vibhu, and swyx live at NVIDIA GTC next week! Now that AIE Europe tix are ~sold out, our attention turns to Miami and World's Fair!

The definitive AI accelerator chip company has more than 10xed this AI Summer, and is now a $4.4 trillion megacorp… that is somehow still moving like a startup. We are blessed to have a unique relationship with our first ever NVIDIA guests: Kyle Kranen, who gave a great inference keynote at the first World's Fair and is one of the leading architects of NVIDIA Dynamo (a datacenter-scale inference framework supporting SGLang, TRT-LLM, vLLM), and Nader Khalil, a friend of swyx from our days in Celo in The Arena, who has been drawing developers at GTC since before they were even a glimmer in the eye of NVIDIA.

Nader discusses how NVIDIA Brev has drastically reduced the barriers to entry for developers to get a top-of-the-line GPU up and running, and Kyle explains NVIDIA Dynamo as a datacenter-scale inference engine that optimizes serving by scaling out, leveraging techniques like prefill/decode disaggregation, scheduling, and Kubernetes-based orchestration, framed around cost, latency, and quality tradeoffs.
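The prefill/decode disaggregation Kyle describes can be sketched as a two-stage pipeline: a compute-bound prefill pool processes each prompt once to build its KV cache, and a latency-bound decode pool then emits one token per request per tick. This is a toy queue model of the idea, not Dynamo's actual API; every name below is invented for illustration.

```python
# Toy sketch of prefill/decode disaggregation: requests first hit a prefill
# pool (compute-bound, one pass over the whole prompt), then a decode pool
# (latency-bound, one token per step). Not Dynamo's real API.
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)   # filled by prefill
    output: list = field(default_factory=list)

def prefill_worker(req: Request) -> Request:
    # One expensive pass over the full prompt builds the KV cache.
    req.kv_cache = [hash((req.prompt, i)) % 1000 for i in range(len(req.prompt))]
    return req

def decode_step(req: Request) -> None:
    # One cheap step: emit a token conditioned on cache + output so far.
    tok = (sum(req.kv_cache) + sum(req.output) + len(req.output)) % 100
    req.output.append(tok)

def serve(requests):
    # Stage 1: the prefill pool drains prompts. In a disaggregated system
    # this pool scales independently of the decode pool.
    decode_queue = deque(prefill_worker(r) for r in requests)
    # Stage 2: the decode pool round-robins, one token per request per tick,
    # so a single long generation can't starve the others.
    done = []
    while decode_queue:
        req = decode_queue.popleft()
        decode_step(req)
        if len(req.output) < req.max_new_tokens:
            decode_queue.append(req)
        else:
            done.append(req)
    return done

finished = serve([Request("hello", 3), Request("a longer prompt", 5)])
```

Separating the stages matters because prefill and decode have opposite hardware profiles (throughput-bound vs. latency-bound), which is the tradeoff framing the episode discusses.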
We also dive into Jensen's “SOL” (Speed of Light) first-principles urgency concept, long-context limits and model/hardware co-design, internal model APIs (https://build.nvidia.com), and upcoming Dynamo and agent sessions at GTC.

Full video pod on YouTube

Timestamps:
  • 00:00 Agent Security Basics
  • 00:39 Podcast Welcome and Guests
  • 07:19 Acquisition and DevEx Shift
  • 13:48 SOL Culture and Dynamo Setup
  • 27:38 Why Scale Out Wins
  • 29:02 Scale Up Limits Explained
  • 30:24 From Laptop to Multi Node
  • 33:07 Cost Quality Latency Tradeoffs
  • 38:42 Disaggregation Prefill vs Decode
  • 41:05 Kubernetes Scaling with Grove
  • 43:20 Context Length and Co Design
  • 57:34 Security Meets Agents
  • 58:01 Agent Permissions Model
  • 59:10 Build Nvidia Inference Gateway
  • 01:01:52 Hackathons And Autonomy Dreams
  • 01:10:26 Local GPUs And Scaling Inference
  • 01:15:31 Long Running Agents And SF Reflections

Transcript

Agent Security Basics

Nader: Agents can do three things. They can access your files, they can access the internet, and now they can write custom code and execute it. You literally only let an agent do two of those three things. If it can access your files and it can write custom code, you don't want internet access, because that's where you see full vulnerability, right? If you have access to the internet and your file system, you should know the full scope of what that agent's capable of doing. Otherwise, now we can get injected or something can happen. And so that's a lot of what we've been thinking about: how do we both enable this, because it's clearly the future, but then also, what are these enforcement points that we can start to protect?

swyx: All right.

Podcast Welcome and Guests

swyx: Welcome to the Latent Space podcast in the Chromo studio. Welcome to all the guests here. We are back with our guest host Vibhu. Welcome. Good to have you back. And our friends, Nader and Kyle from Nvidia. Welcome.

Kyle: Yeah, thanks for having us.

swyx: Yeah, thank you.
Actually, I don't even know your titles. I know you're like architect-something of Dynamo.

Kyle: Yeah, I'm one of the engineering leaders and architects of Dynamo.

swyx: And you're director of something developers, developer tech. You're the developers, developers, developers guy at Nvidia.

Nader: Open source, agent marketing, Brev, DevRel tools and stuff. That's been the focus.

swyx: And we're kind of recording this ahead of Nvidia GTC, which is coming to town again, or taking over town, which we'll all be at. And we'll talk a little bit about your sessions and stuff.

Nader: We're super excited for it.

GTC Booth Stunt Stories

swyx: One of my favorite memories of Nader: you always do marketing stunts, and while you were at Brev, you had this surfboard that you went down to GTC with, and Nvidia apparently liked it so much that they bought you. What was that like?

Nader: Yeah. Our logo was a shaka. We were always just kind of trying to keep true to who we were. With some startups, you're trying to pretend that you're a bigger, more mature company than you are. And it was actually Evan Conrad from SF Compute...

swyx: A previous guest.

Nader: Amazing, yeah. He was just like, guys, you're two dudes in a room, why are you pretending that you're not? And so then we were like, okay, let's make the logo a shaka. We brought surfboards to our booth at GTC, and the energy was great. Some palm trees too.

Kyle: They actually poked out over the walls, so you could see the Brev booth, and no one else's, from very far away.

Nader: Oh, so you remember it back then?

Kyle: Yeah, I remember it pre-acquisition.
I was like, oh, those guys look cool.

Nader: Dude, that makes sense, 'cause we signed up really last minute, and so we had the last booth. It was all the way in the corner, and I was worried that no one was gonna come. So that's why we had the palm trees. We really came in with the surfboards. We even had one of our investors bring her dog, and she was just walking the dog around to try to bring energy towards our booth.

swyx: Steph.

Kyle: Yeah, she's the best.

swyx: You know, as a conference organizer, I love that. Everyone who sponsors a conference comes, does their booth, and says "we are changing the future of AI" or some generic b******t. No, actually try to stand out, make it fun, right? And people still remember it after three years.

Nader: Yeah. You know what's so funny? I'll give you this clip if you wanna add it in, but my wife, at the time my fiancée, was in medical school, and she came to help us, 'cause it was a big moment for us. And so we bought this Cricut, it's like a vinyl printer, 'cause how else are we gonna label the surfboard? So we got a surfboard, luckily was able to purchase that on the company card. We got a Cricut, and it was just "fine tuning for enterprises" or something like that, that we put on the surfboard. And it's 1:00 AM the day before we go to GTC, she's helping me put these vinyl stickers on, and she goes, "if you pull this off, you son of a b***h." Pretty much after the acquisition, I stitched that with the mag music acquisition and sent it to our family group chat.

swyx: Yeah. Well, she made a good choice there. Was that basically the origin story for Launchable? And maybe we should explain what Brev is.

Nader: Yeah.
I mean, Brev is just a developer tool that makes it really easy to get a GPU. We connect a bunch of different GPU sources. The basics of it is: how quickly can we SSH you into a GPU? And whenever we would talk to users, they wanted a GPU. They wanted an A100. And if you go to any cloud provisioning page, usually it's three pages of forms, or somewhere in the forms there's a dropdown, and in the dropdown there's some weird code that you know to translate to an A100. And I remember just thinking: every time someone says they want an A100, the piece of text that they're telling me they want is stuffed away in the corner. So we were like, what if the biggest piece of text was what the user's asking for? And so when you go to Brev, it's just big GPU chips with the type that you want, with...

swyx: ...beautiful animations that you worked on. Like, now you can just prompt it, but back in the day, those were handcrafted, artisanal code.

Nader: Yeah, I was actually really proud of that, because I made it in Figma, and then I was really struggling to figure out how to turn it from Figma to React. So what it actually is, is just an SVG, and I have all the styles, and when you change the chip, whether it's active or not, it changes the SVG code, and that renders so it looks like it's animating, but we just had the transition slow. It's just a JavaScript function to change the underlying SVG. And that was how I ended up figuring out how to move it from Figma. But yeah, that's artisan.

Kyle: Speaking of marketing stunts though, he actually used those SVGs, or kind of used those SVGs, to make these cards.

Nader: Oh yeah.

Kyle: A GPU gift card, yes, that he handed out everywhere.
That was actually my first impression of that one.

swyx: I think I still have one of them.

Nader: They look great. I have a ton of them still, actually, in our garage; they just don't have labels. We should honestly bring them back. But I found this old printing press here, actually just around the corner on Van Ness. It's a third-generation San Francisco shop. And so I come in, an excited startup founder, and they just have this crazy old machinery, and I'm in awe, 'cause the whole building is so physical. You're seeing these machines, they have pedals to move these saws, and whatever. I don't know what this machinery is, but I saw all three generations: there's the grandpa, the father, and the son, and the son was around my age.

swyx: It's like a holy trinity.
Like, you know, I think like as a, you know, typical like cloud hard hardware person, you go into an AWS you pick like T five X xl, whatever, and it's just like from a list and you look at the specs like, why animate this GP?And, and I, I do think like it just shows the level of care that goes throughout birth and Yeah. And now, and also the, and,Nader: and Nvidia. I think that's what the, the thing that struck me most when we first came in was like the amount of passion that everyone has. Like, I think, um, you know, you talk to, you talk to Kyle, you talk to, like, every VP that I've met at Nvidia goes so close to the metal.Like, I remember it was almost a year ago, and like my VP asked me, he's like, Hey, [00:07:00] what's cursor? And like, are you using it? And if so, why? Surprised at this, and he downloaded Cursor and he was asking me to help him like, use it. And I thought that was, uh, or like, just show him what he, you know, why we were using it.And so, the amount of care that I think everyone has and the passion, appreciate, passion and appreciation for the moment. Right. This is a very unique time. So it's really cool to see everyone really like, uh, appreciate that.swyx: Yeah.Acquisition and DevEx Shiftswyx: One thing I wanted to do before we move over to sort of like research topics and, uh, the, the stuff that Kyle's working on is just tell the story of the acquisition, right?Like, not many people have been, been through an acquisition with Nvidia. What's it like? Uh, what, yeah, just anything you'd like to say.Nader: It's a crazy experience. I think, uh, you know, we were the thing that was the most exciting for us was. Our goal was just to make it easier for developers.We wanted to find access to GPUs, make it easier to do that. And then all, oh, actually your question about launchable. So launchable was just make one click exper, like one click deploys for any software on top of the GPU. Mm-hmm. 
And so what we really liked about Nvidia was that it felt like we just got a lot more resources to do all of that.I think, uh, you [00:08:00] know, NVIDIA's goal is to make things as easy for developers as possible. So there was a really nice like synergy there. I think that, you know, when it comes to like an acquisition, I think the amount that the soul of the products align, I think is gonna be. Is going speak to the success of the acquisition.Yeah. And so it in many ways feels like we're home. This is a really great outcome for us. Like we you know, I love brev.nvidia.com. Like you should, you should use it's, it's theKyle: front page for GPUs.Nader: Yeah. Yeah. If you want GP views,Kyle: you go there, getswyx: it there, and it's like internally is growing very quickly.I, I don't remember You said some stats there.Nader: Yeah, yeah, yeah. It's, uh, I, I wish I had the exact numbers, but like internally, externally, it's been growing really quickly. We've been working with a bunch of partners with a bunch of different customers and ISVs, if you have a solution that you want someone that runs on the GPU and you want people to use it quickly, we can bundle it up, uh, in a launchable and make it a one click run.If you're doing things and you want just like a sandbox or something to run on, right. Like open claw. Huge moment. Super exciting. Our, uh, and we'll talk into it more, but. You know, internally, people wanna run this, and you, we know we have to be really careful from the security implications. Do we let this run on the corporate network?Security's guidance was, Hey, [00:09:00] run this on breath, it's in, you know, it's, it's, it's a vm, it's sitting in the cloud, it's off the corporate network. It's isolated. 
And so that's been our stance internally and externally about how to even run something like open call while we figure out how to run these things securely.But yeah,swyx: I think there's also like, you almost like we're the right team at the right time when Nvidia is starting to invest a lot more in developer experience or whatever you call it. Yeah. Uh, UX or I don't know what you call it, like software. Like obviously NVIDIA is always invested in software, but like, there's like, this is like a different audience.Yeah. It's aNader: widerKyle: developer base.swyx: Yeah. Right.Nader: Yeah. Yeah. You know, it's funny, it's like, it's not, uh,swyx: so like, what, what is it called internally? What, what is this that people should be aware that is going on there?Nader: Uh, what, like developer experienceswyx: or, yeah, yeah. Is it's called just developer experience or is there like a broader strategy hereNader: in Nvidia?Um, Nvidia always wants to make a good developer experience. The thing is and a lot of the technology is just really complicated. Like, it's not, it's uh, you know, I think, um. The thing that's been really growing or the AI's growing is having a huge moment, not [00:10:00] because like, let's say data scientists in 2018, were quiet then and are much louder now.The pie is com, right? There's a whole bunch of new audiences. My mom's wondering what she's doing. My sister's learned, like taught herself how to code. Like the, um, you know, I, I actually think just generally AI's a big equalizer and you're seeing a more like technologically literate society, I guess.Like everyone's, everyone's learning how to code. Uh, there isn't really an excuse for that. And so building a good UX means that you really understand who your end user is. And when your end user becomes such a wide, uh, variety of people, then you have to almost like reinvent the practice, right? Yeah. 
You haveKyle: to, and actually build more developer ux, right?Because the, there are tiers of developer base that were added. You know, the, the hackers that are building on top of open claw, right? For example, have never used gpu. They don't know what kuda is. They, they, they just want to run something.Nader: Yeah.Kyle: You need new UX that is not just. Hey, you know, how do you program something in Cuda and run it?And then, and then we built, you know, like when Deep Learning was getting big, we built, we built Torch and, and, but so recently the amount of like [00:11:00] layers that are added to that developer stack has just exploded because AI has become ubiquitous. Everyone's using it in different ways. Yeah. It'sNader: moving fast in every direction.Vertical, horizontal.Vibhu: Yeah. You guys, you even take it down to hardware, like the DGX Spark, you know, it's, it's basically the same system as just throwing it up on big GPU cluster.Nader: Yeah, yeah, yeah. It's amazing. Blackwell.swyx: Yeah. Uh, we saw the preview at the last year's GTC and that was one of the better performing, uh, videos so far, and video coverage so far.Awesome. This will beat it. Um,Nader: that wasswyx: actually, we have fingersNader: crossed. Yeah.DGX Spark and Remote AccessNader: Even when Grace Blackwell or when, um, uh, DGX Spark was first coming out getting to be involved in that from the beginning of the developer experience. And it just comes back to what youswyx: were involved.Nader: Yeah. St. St.swyx: Mars.Nader: Yeah. Yeah. I mean from, it was just like, I, I got an email, we just got thrown into the loop and suddenly yeah, I, it was actually really funny ‘cause I'm still pretty fresh from the acquisition and I'm, I'm getting an email from a bunch of the engineering VPs about like, the new hardware, GPU chip, like we're, or not chip, but just GPU system that we're putting out.And I'm like, okay, cool. Matters. Now involved with this for the ux, I'm like. 
Nader: What am I gonna do [00:12:00] here? In the first meeting I was just kind of quiet, hearing engineering VPs talk about what this box could be, what it could do, how we should use it. One of the first ideas, I think the quote was, "the first thing someone's gonna want to do with this is get two of them and run a Kubernetes cluster on top of them." And I was like, oh, I think I know why I'm here. The first thing we're doing is easy SSH into the machine. And just scoping it down: once you can do that, everything else follows. The person who wants to run a Kubernetes cluster on two Sparks has a higher propensity for pain than someone who buys it and wants to run Open Claw right now. If you can make sure that's as effortless as possible, the rest becomes easy. So there's a tool called NVIDIA Sync. It just makes the SSH connection really simple. If you have a Mac or a PC, if you have a laptop and you buy this GPU and you want to use it, you should be able to use it like it's a GPU in the cloud, right? But there's all this friction in how you actually get into it. That's part of [00:13:00] Brev's value proposition: there's a CLI that wraps SSH and makes it simple. Our goal is just to get you into that machine really easily. And one thing we just launched at CES, it's still in early access, we're ironing out some kinks, but it should be ready by GTC: you can register your Spark on Brev.

swyx: Like remote-managed local hardware. Single pane of glass. Because Brev can already manage other clouds anyway, right?

Vibhu: Yeah. And you can use the Spark on Brev as well, right?

Nader: Yeah, exactly.
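As an aside, the "CLI that wraps SSH" idea Nader describes reduces, at its core, to generating connection config so the box answers to one name. A toy sketch — the alias, host address, and user below are made up, and this is not the actual NVIDIA Sync or Brev implementation:

```python
# Toy sketch of "make SSH to a home GPU box effortless": emit an
# ~/.ssh/config stanza so `ssh spark-home` just works. All values are
# hypothetical; real tools also handle NAT traversal, keys, and relays.

def ssh_config_entry(alias: str, host: str, user: str, port: int = 22) -> str:
    """Return an ssh_config stanza for the given machine."""
    return "\n".join([
        f"Host {alias}",
        f"    HostName {host}",
        f"    User {user}",
        f"    Port {port}",
        "    ForwardAgent yes",
    ])

entry = ssh_config_entry("spark-home", "192.168.1.42", "nader")
print(entry)
```

The real registration flow layers a relay on top so the same alias works from a cafe, but the end-user experience is this: one name, one command.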
Nader: So you set it up at home, you run the command on it, and it'll essentially appear in your Brev account. Then you can take your laptop to a Starbucks or a cafe, and you can continue to use your Spark just like any other cloud node on Brev.

swyx: It's like a pre-provisioned data center in your home.

Nader: Yeah, exactly.

Vibhu: A tiny little data center.

Nader: Tiny little, the size of your phone.

SOL Culture and Dynamo Setup

swyx: One more thing before we move on to Kyle. I have so many Jensen stories, and I just love mining Jensen stories. My favorite so far is SOL. What is SOL?

Nader: [00:14:00] Of all the lessons I've learned, that one's definitely my favorite.

Kyle: It'll always stick with you.

Nader: In your startup, everything's existential, right? We've run out of money. We were at risk of missing payroll. We've had to contract our team because we ran out of money. Because of that, you're always forcing yourself to understand the root cause of everything. If you get a date, a timeline, you know exactly why that date or timeline is there. You're pushing every boundary, and you're not just accepting a no, just because. As you start to introduce more layers, as you become a much larger organization, SOL is essentially: what is the physics? The speed of light moves at a certain speed. So if something's moving slower, you know something's in the way. Before trying to layer reality back in, of why can't this be delivered by some date, let's just understand the physics. What is the theoretical limit to how fast this can go? And then start to tell me why.
Nader: Because otherwise people will start telling you why something can't be done. Actually, I think any great leader's goal is just to create urgency. [00:15:00]

Kyle: Create compelling events, right? SOL is a term NVIDIA uses to instigate a compelling event. You say: this is done. How do we get there? What is the minimum, as-much-as-necessary, as-little-as-possible thing it takes for us to get exactly here? It helps you break through a bunch of noise instantly.

swyx: One thing I'm unclear about: can only Jensen use the SOL card? Obviously with Jensen everyone gets the b******t out, but can someone else do it?

Kyle: Frontline engineers use it.

Nader: I think it's not so much about "get the b******t out." It's "give me the root understanding." If you tell me something takes three weeks, what are the first principles? Why is it three weeks? What is the actual limit on why this takes three weeks? Say you wanted to buy a new computer and someone told you it's gonna be here in five days. What's the SOL? Well, the SOL is, I could walk into a Best Buy and pick it up for you, right? So anything beyond that: is it practical? Say we're giving everyone in the [00:16:00] company a laptop — obviously not. So that's the SOL, and if we have to get more than ten, suddenly there might be some lead time, right? And now we can piece reality back together.

swyx: So this is the Paul Graham "do things that don't scale." And this is also what people would now call high agency.
Kyle: It's actually really interesting, because there's a second, hardware angle to SOL that doesn't come up for the whole org. SOL is used culturally at NVIDIA for everything.

swyx: I'd imagine that can be annoying sometimes, when someone keeps pulling SOL on you and you're like, guys, we have to be stable, we have to f*****g plan.

Kyle: It's an interesting balance.

Nader: I encounter that with Alec, actually, because we have a new conference, so we have goals for what we wanna launch by the conference. At the end of the day, where is this...

swyx: This GTC?

Nader: We did it for CES, we did it for GTC DC before that, and we're doing it for GTC San Jose. Every time there's a new moment, we want to launch something, and we want to do so at SOL. And that does mean some level of prioritization needs [00:17:00] to happen. So it is difficult. You have to be careful with what you're pushing. Stability is important, and that should be factored into SOL. SOL isn't just "build everything and let it break" — that's part of the conversation. As you're layering in all the details, one of them might be: hey, we could build this, but then it's not gonna be stable for X, Y, Z reasons. That was one of our conversations for CES: we can get registering your Spark with Brev into early access, but there's a lot we need to do to feel really comfortable from a security perspective. There's a lot of networking involved before we deliver that to users. So it's like, okay, let's get this to a point where we can at least let people experiment with it.
Nader: We had it in a booth, we had it in Jensen's keynote, and then let's go iron out all the networking kinks. That's not easy, so that can come later. That's how we layered it back in.

Kyle: It's not really about saying you don't have to do the [00:18:00] maintenance or operational work. It highlights how progress is incremental, right? What is the minimum thing we can get to? And then there's an SOL for every component after that, but there's the SOL to get you to the starting line. That's usually how it's asked. On the other side, SOL came out of hardware at NVIDIA. SOL is literally: if we ran the accelerator, the GPU, at basically full speed with no other constraints, how fast would we be able to make a program go?

swyx: Right. So in training you then work back to some percentage, like MFU, for example.

Kyle: That's a great example. There's an SOL MFU, and then there's what's practically achievable.

swyx: Cool. Should we move on to Kyle's side? Kyle, you're coming more from the data science world. Whenever I meet someone who's worked on tabular stuff, graph neural networks, time series — when I go to NeurIPS or ICML, I walk the back halls, and there's always a small group of graph people, a small group of tabular people, [00:19:00] and basically no one else there. It's important, interesting work if you care about the problems they solve.

Kyle: Yeah.

swyx: But everyone else is just LLMs all the time.

Kyle: Yeah.
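Backing up to the SOL MFU point from a moment ago: the speed-of-light framing is just two ratios — theoretical step time is work divided by peak throughput, and MFU is the fraction of peak you actually sustain. A sketch with illustrative numbers (not any real chip's datasheet):

```python
# Speed-of-light (SOL) framing in numbers: the theoretical best-case step
# time ignores all overheads, and MFU measures how close you actually got.
# The peak and achieved figures below are invented for illustration.

def sol_step_time(flops_per_step: float, peak_flops: float) -> float:
    """Theoretical minimum time for one step at full hardware speed."""
    return flops_per_step / peak_flops

def mfu(achieved_flops_per_s: float, peak_flops: float) -> float:
    """Model FLOPs utilization: sustained throughput over peak."""
    return achieved_flops_per_s / peak_flops

peak = 1e15                          # hypothetical 1 PFLOP/s accelerator
step = sol_step_time(2e14, peak)     # 2e14 FLOPs of work per step
util = mfu(4e14, peak)               # sustaining 0.4 PFLOP/s in practice

print(step)  # 0.2 s at SOL
print(util)  # 0.4 MFU
```

Anything slower than the SOL step time means something — memory, communication, scheduling — is in the way, which is exactly the conversational use of the term described above.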
Kyle: I mean, it's like the black hole, right? Has the event horizon reached this yet at NeurIPS?

swyx: But those are transformers too, and those are also interesting things. Anyway, I just wanted to spend a little bit of time on that background before we go into Dynamo proper.

Kyle: Sure. I took a different path to NVIDIA. I joined six years ago, seven if you count when I was an intern. I joined NVIDIA right out of college, and the first thing I jumped into was not what I'd done during my internship, which was some stuff for autonomous vehicles, heavyweight object detection. I jumped into recommenders, which were popular then.

swyx: Yeah, he did RecSys as well.

Kyle: Yeah, RecSys. That was the tabular data of the time, right? You have tables of [00:20:00] audience qualities and item qualities, and you're trying to figure out which member of the audience matches which item, or more practically, which item matches which member of the audience. At the time we were trying to turn recommenders, which had historically been a CPU-based workflow, into something that ran really well on GPUs. And it's since been done: there are a bunch of RecSys libraries that run on GPUs. The common models — the Deep Learning Recommendation Model (DLRM), which came out of Meta, and the Wide and Deep model released by Google — were heavily accelerated by GPUs, especially using the fast HBM on the chips to do vector lookups. It was very interesting and super relevant at the time, because we were starting to get this explosion of feeds and things that required recommenders to just actively be on all the time.
Kyle: And I transitioned a little bit toward graph neural networks when I discovered them, because you can use graph neural networks to represent relationships between people, items, and concepts, and that interested me. So I jumped into that at [00:21:00] NVIDIA and got really involved for about two years.

swyx: And something I learned from Bryan Catanzaro is that you can just kind of choose your own path at NVIDIA, which is not a normal big-corp thing, where you have a lane and you stay in your lane.

Nader: That's probably the reason I enjoy being at a big company: the mission is the boss. Probably odd coming from a startup guy.

swyx: The mission is the boss.

Nader: It feels like a big game of pickup basketball. If you wanna play, you just go up to the court and say, hey look, we're gonna play this game and we need three. And you find your three. Honestly, that's what every new initiative feels like.

Vibhu: It also shows, right? NVIDIA is just releasing state-of-the-art stuff in every domain. You expect foundation models with Nemotron, and then for voice, Parakeet just comes out, another one.

Kyle: The NVIDIA voice team has always been producing.

Vibhu: There's always a paper or a dataset coming out in every other domain. And it stems back to what NVIDIA has to do, right? You have to design chips years before they're actually produced, [00:22:00] so you really need to know where to focus.

Kyle: The design process starts three to five years before the chip gets to market.

Vibhu: I'm curious what that's like. So you have specialist teams.
Vibhu: Is it just, people find an interest, go deep on whatever, and that feeds back into the predictions? The internals at NVIDIA must be crazy, right? Even without selling to people, you have your own predictions of where things are going, and they're very grounded.

Kyle: It's really interesting. There are two things NVIDIA does that are quite interesting. One is that we really index on passion. There's a big organizational, top-down push to ensure people are working on things they're passionate about. So if someone proposes something interesting, many times they can just email someone way up the chain who would find it relevant and say, hey, can I go work on this?

Nader: I worked at a big company for a couple of years before starting on my startup journey, and it felt very weird there to email out of your chain, if that makes [00:23:00] sense. The emails at NVIDIA are like mosh pits. It's just 60 people, just whatever.

swyx: They get messy, like reply-all.

Nader: Oh, it's insane.

Kyle: They just help you maximize the context.

Nader: This is a weird thing: I used to be like, why would we send emails? We have Slack. Now I'm the exact opposite. I feel so bad for anyone messaging me on Slack because I'm so unresponsive.

swyx: You're email-maxxing.

Nader: I'm email-maxxing now. Email is perfect because important threads get bumped back up, right? Slack doesn't do that.
Nader: On Slack I just have this casino going off on the left, and I don't know which thread was from where. In email the threads get bumped, and there's the subject line, so you can have working threads. What's difficult is, when you're small — if you're not 40,000 people — Slack will work fine. But there's an inflection point where it becomes really messy and you'll actually prefer email, because you can have working threads. You can CC more than nine people in a thread.

Kyle: You can fork stuff.

Nader: You can [00:24:00] fork stuff, which is super nice. And that's part of how you can propose a plan. You can also just start. Honestly, momentum's the only authority. If you can just start, make a little bit of progress, and show someone something they can try, that's the most effective way to push anything forward. Both at NVIDIA and just generally.

Kyle: There's another concept that's explored a lot at NVIDIA, this idea of a zero-billion-dollar business. Market creation is a big thing at NVIDIA.

swyx: Oh, you want to go and start a zero-billion-dollar business?

Kyle: Jensen says we are completely happy investing in zero-billion-dollar markets. We don't care if this creates revenue; it's important for us to know about this market, and we think it will be important in the future. It can be zero billion dollars for a while. I'm probably mangling his words here, but I'll give an example: NVIDIA's been working on autonomous driving for a long time.

swyx: Like an NVIDIA car?

Kyle: No, they've...

Vibhu: They used the Mercedes, right? They're around the HQ, and I think it finally just got licensed out.
Vibhu: Now they're starting to be used quite a [00:25:00] bit. For ten years you've been seeing Mercedes with NVIDIA logos driving around.

Kyle: If you're around South Santa Clara, yeah. So, zero-billion-dollar markets are a thing.

swyx: I mean, okay, cars are not a zero-billion-dollar market. That's a bad example.

Nader: I think he's messaging zero today. Even internally, an org doesn't have to ruthlessly find revenue very quickly to justify its existence. A lot of the important research, a lot of the important technology being developed...

Kyle: That's kind of where research comes in. Research is very ideologically free at NVIDIA. They can pursue the things they want.

swyx: Were you in research officially?

Kyle: I was never in research officially. I was always in engineering. I'm in an org called Deep Learning Algorithms, which is basically: how do we make things that are relevant to deep learning go fast?

swyx: That sounds freaking cool.

Vibhu: And I think a lot of that is underappreciated. Like time series: this week Google put out the TimesFM paper, a new time series paper. [00:26:00] Semantic IDs started applying transformers and LLMs to RecSys. And when you think of the scale of companies deploying these — Amazon recommendations, Google web search — it's huge scale.

Kyle: Yeah.

Vibhu: And you want it fast.

Kyle: There's a fun moment that brought me full circle. Amazon Ads recently gave a talk where they discussed using Dynamo for generative recommendation, which was weirdly cathartic for me. I'm like, oh my God, I've supplanted what I was working on. You're using LLMs now to do what I was doing five years ago.

swyx: Amazing. And let's go right into Dynamo.
swyx: Maybe introduce it top-down.

Kyle: Sure. At this point a lot of people are familiar with the term inference. Funnily enough, I went from inference being a really niche topic to something that's discussed on normal people's Twitter feeds.

Nader: It's on billboards here now.

Kyle: Very strange, driving and seeing an inference ad on the 101. Inference at scale is becoming a lot more important. We have these moments like [00:27:00] Open Claw, where agents take lots and lots of tokens but produce incredible results. There are many different aspects of test-time scaling, where you can use more inference to generate a better result than if you were to use a short amount of inference. There's reasoning, there's querying, there's adding agency to the model, allowing it to call tools and use skills. Dynamo sort of came about at NVIDIA because myself and a couple of others were talking about these concepts: you have inference engines like vLLM, SGLang, and TensorRT-LLM, and they sort of think about things as one single copy, one replica.

Why Scale Out Wins

Kyle: One version of the model. But when you're actually serving things at scale, you can't just scale up that replica, because you end up with performance problems. There's a scaling limit to scaling up replicas, so you actually have to scale out, to use some Kubernetes-type terminology. We realized there was a lot of potential optimization we could do in scaling out and building systems for data [00:28:00] center scale inference.
Kyle: So Dynamo is this data-center-scale inference engine that sits on top of frameworks like vLLM, SGLang, and TensorRT-LLM, and makes things go faster because you can leverage the economy of scale. You have KV cache, which we can define a little later, on all these machines, and you want to figure out ways to maximize your cache hits. Or you want to employ new techniques in inference like disaggregation, which Dynamo introduced to the world in March — well, not introduced, there was academic work beforehand, but we were one of the first frameworks to support it. And we want to combine all these techniques into a modular framework that lets you accelerate your inference at scale.

Nader: By the way, Kyle and I became friends on my first day at NVIDIA, and I always loved it because he always teaches me new things.

swyx: This is why I wanted to put the two of you together. I knew this was gonna be good.

Kyle: We've talked to each other a bunch. [00:29:00] Actually, you asked: why can't we scale up?

Scale Up Limits Explained

Nader: You said model replicas.

Kyle: So scale up means assigning more...

swyx: Heavier?

Kyle: Yeah, heavier. Making things heavier: adding more GPUs, adding more CPUs. Scale out is drawing a boundary and saying, I'm going to duplicate my representation of the model, or of this microservice, and replicate it many times to handle load. And the reason you can't scale up past some point is that there are hardware bounds and algorithmic bounds on that type of scaling. I'll give you a good example that's very trivial. Let's say you're on an H100.
Kyle: The maximum NVLink domain for H100, for most DGX H100s, is eight GPUs, right? So if you scaled up past that, you'd have to handle the fact that now, for the GPUs to communicate, you have to go over InfiniBand, which is still very fast, but not as fast as NVLink.

swyx: Is it like one order of magnitude? Hundreds?

Kyle: It's about an order of magnitude.

swyx: So not terrible.

Kyle: [00:30:00] I need to remember the data sheet here. I think it's about 500 gigabytes a second unidirectional for NVLink, and about 50 gigabytes a second unidirectional for InfiniBand. It depends on the generation.

swyx: I just wanna set this up for people who are not familiar with these layers and the relative speeds.

Vibhu: Of course.

From Laptop to Multi Node

Vibhu: Maybe even going a few steps back: most people are familiar with running inference on their laptop with vLLM or whatever local tool. You can run it on that laptop. Then models got pretty big, right? GLM-5 — they doubled the size. So what do you do when you go from, okay, I can get 128 gigs of memory, I can run it on a Spark, to needing multi-GPU? Multi-GPU, there's some support there. Now, if I'm a company and I'm not hiring the best researchers for this, but I need to go [00:31:00] multi-node — I have a lot of servers. Now there are efficiency problems, right? You can have multiple eight-way H100 nodes, but how do you do that efficiently?

Kyle: Yeah, how do you represent them? How do you choose how to represent the model? That's a hard question.
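For a feel of the interconnect gap from Kyle's rough numbers above (~500 GB/s NVLink vs ~50 GB/s InfiniBand, unidirectional), here is the back-of-envelope transfer time for a hypothetical 10 GB blob — say, a chunk of KV cache moving between workers:

```python
# Back-of-envelope transfer times using the approximate unidirectional
# bandwidths quoted in the conversation. The 10 GB payload is illustrative.

def transfer_seconds(gigabytes: float, gb_per_s: float) -> float:
    """Idealized time to move a payload at a given sustained bandwidth."""
    return gigabytes / gb_per_s

payload_gb = 10.0
nvlink = transfer_seconds(payload_gb, 500.0)       # within the NVLink domain
infiniband = transfer_seconds(payload_gb, 50.0)    # across nodes

print(nvlink)               # 0.02 s
print(infiniband)           # 0.2 s
print(infiniband / nvlink)  # ~10x: the order of magnitude Kyle mentions
```

Real transfers pay latency and protocol overheads on top of this, but the order-of-magnitude gap is why staying inside the NVLink domain matters so much for scale-up.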
Kyle: Everyone asks how to size it. Oh, I wanna run GLM-5, which just came out. There have been like four new models in the past week, by the way.

swyx: You know why, right? DeepSeek.

Kyle: No comment. But GLM-5, right, we have this new model, it's a large size, and you have to figure out how to both scale up and scale out, because you have to find the right representation that you care about. Everyone does this differently. Let's be very clear: everyone figures this out on their own path.

Nader: I feel like a lot of AI, or ML even, is like this. There was some tweet a few months ago that was like, why hasn't fine-tuning as a service taken off? That might have been me, it might have been you. People want it to be such an easy recipe to follow.

Kyle: But it's specific to you, the model, [00:32:00] the situation.

Nader: And there's just so much tinkering. When you see a model with however many experts in the MoE, it's like, why that many experts? They tried a bunch of things and that one seemed to do better. When it comes to how you're serving inference, you have a bunch of decisions to make, and you can always argue something could be more optimal. But it's this internal calibration, and appetite for continued calibration.

Vibhu: Yeah. And that doesn't mean people aren't taking a shot at this, like Tinker from Thinking Machines, you know? RL as a service. It also gets even harder when you try to do big model training, right? We're not the best at training MoEs once they're pre-trained a certain way — we saw this with Llama 3, right?
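A first-pass answer to the "how do you size it" question is just a memory floor for the weights; real sizing also has to fit KV cache, activations, and parallelism overheads. A crude sketch with made-up numbers:

```python
# Floor estimate for "how many GPUs does this model need?": weights only.
# 1B parameters at 1 byte/param is 1 GB, so a 355B FP8 model needs ~355 GB.
# Real deployments add KV cache and activation memory on top of this.

import math

def min_gpus_for_weights(params_billion: float, bytes_per_param: int,
                         gpu_mem_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the weights."""
    weights_gb = params_billion * bytes_per_param
    return math.ceil(weights_gb / gpu_mem_gb)

# Illustrative: a 355B-parameter model at FP8 (1 byte/param) on 80 GB GPUs.
print(min_gpus_for_weights(355, 1, 80))  # 5 -> rounded up to TP=8 in practice

# A 70B model at FP16 (2 bytes/param) on the same GPUs.
print(min_gpus_for_weights(70, 2, 80))   # 2
```

In practice you round up to a power-of-two tensor-parallel degree that fits the NVLink domain, which is part of the "everyone figures this out on their own path" Kyle describes.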
Vibhu: They're trained in such a sparse way, and Meta knows there's gonna be a bunch of inference done on them, right? They'll open source it, but it's very much trained for what Meta's infrastructure wants — they wanna inference it a lot. Now the question to think about is: say you wanna serve a chat application or a coding copilot. You're doing a layer of RL, you're serving a model for X amount of people. Is it a chat model, a coding model? Dynamo — back to that.

Kyle: [00:33:00] Yeah, we sort of jumped off that topic. Everyone has their own journey.

Cost Quality Latency Tradeoffs

Kyle: I like to think of it as defined by: what is the model you need, what is the accuracy you need? Actually, I talked to Nader about this earlier. There are three axes you care about. There's quality: are you accurate enough, can you complete the task with high enough performance? There's cost: can you serve the model — or serve your workflow, because it's not just the model anymore, it's the workflow, the multi-turn with an agent — cheaply enough? And then: can you serve it fast enough? We're seeing all three of these play out. We saw new models from OpenAI that are faster; you have these fast versions of models. You can change the amount of thinking to change the quality: produce more tokens, but at a higher cost and a higher latency. And when you start this journey of figuring out how you wanna host a model, you think about three things. What is the model I need to serve? How many times do I need to call it? What is the input sequence length, [00:34:00] and what does the workflow look like on top of it? What is the latency SLA that I need to achieve?
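Mechanically, that design loop ends with a sweep: hold the SLA fixed, enumerate candidate configurations, and pick the cheapest one that fits. A sketch — the candidate configs and their latency/cost numbers are invented for illustration:

```python
# Minimal version of the deployment sweep: among candidate configurations
# (e.g. tensor-parallel sizes), return the lowest-cost one meeting the SLA.
# All latencies and costs below are made up; real sweeps measure them.

def pick_config(configs, latency_sla_ms):
    """Cheapest config whose measured latency fits the SLA, else None."""
    feasible = [c for c in configs if c["latency_ms"] <= latency_sla_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["cost_per_mtok"])

candidates = [
    {"tp": 1, "latency_ms": 120, "cost_per_mtok": 0.8},
    {"tp": 2, "latency_ms": 70,  "cost_per_mtok": 1.1},
    {"tp": 4, "latency_ms": 40,  "cost_per_mtok": 1.9},
]

best = pick_config(candidates, latency_sla_ms=80)
print(best["tp"])  # 2: the cheapest option that still hits the 80 ms SLA
```

Note how tightening the SLA to 30 ms makes the problem infeasible with these candidates — which in practice sends you back to choosing a smaller model or more aggressive parallelism.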
Kyle: The SLA is usually a constant: you know the SLA you need to hit, and then you try to find the lowest-cost version that hits all of these constraints. Usually you start with those things and do a bit of experimentation across some common configurations. You change the tensor parallel size, which is a form of parallelism.

Vibhu: It goes even deeper. First you've gotta think about what model.

Kyle: Of course. It's a multi-step design process, because as you said, you can choose a smaller model and then do more test-time scaling, and it'll match the quality of a larger model, because you're doing the test-time scaling or you're adding a harness or something. So it goes way deeper than that. But from the performance perspective, once you get to the model you need to host, you say: I have this model, I need to serve it at this speed, what is the right configuration for that?

Nader: Did you guys see the recent paper from a few days ago, that if you run [00:35:00] the same prompt twice, you get like double the quality? Just try it again.

Vibhu: Yeah, but the key thing there is you give it the context of the failed try, right? So it takes a shot. And this has been basic guidance for quite a while. Just try again. Did you try again?

Nader: All advice in life.

Vibhu: It's a paper from Google, if I'm not mistaken. It's like a seven-page little short paper. The title's very cute, and it's just like, yeah, just try again, give it the task context.

Kyle: Multi-shot. You just say, hey, take a little bit more information, try and fail.
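The "just try again" recipe under discussion — re-prompting with the failed attempt folded back in — can be sketched as follows. The `model` here is a stand-in callable, not a real API:

```python
# Retry-with-context: on failure, append the failed output to the prompt so
# the next attempt has signal about what went wrong. `model` and `check`
# are hypothetical stand-ins for an LLM call and a task-success verifier.

def retry_with_context(model, prompt, check, max_tries=3):
    """Call model up to max_tries times, feeding failures back as context."""
    context = prompt
    for _ in range(max_tries):
        out = model(context)
        if check(out):
            return out
        context = f"{prompt}\nPrevious failed attempt:\n{out}\nTry again."
    return None

# Toy stand-in model: succeeds only once it sees its own earlier failure.
def toy_model(p):
    return "42" if "failed attempt" in p.lower() else "wrong"

print(retry_with_context(toy_model, "What is 6*7?", lambda o: o == "42"))
# -> 42
```

The toy model is contrived, but it captures the mechanism: the second attempt succeeds precisely because the failure is in the context.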
Vibhu: And that basic concept has gone pretty deep. There's self-distillation RL, where you do self-distillation, you do RL, and you have the past failure, and that gives some signal. So people reach for "try it again" when the model isn't strong enough.

swyx: For listeners who've made it this far: Vibhu and I run a second YouTube channel for our paper club.

Kyle: Oh, that's awesome.

swyx: Vibhu just covered this, self-distillation and all that. That's why he's up to speed [00:36:00] on it.

Nader: I'll have to check it out.

swyx: It's just a good practice. Everyone needs a paper club, where you read papers together and the social pressure forces you to keep up.

Nader: There's a big inference reading group at NVIDIA. I feel so bad every time he shares something on it.

swyx: One of your guys is big in that — I forget, Eshan?

Kyle: Eshan's on my team, actually. Funny, there's an employee transfer between us. Eshan worked for Nader at Brev, and now he's on my team.

Nader: He was our head of AI.

swyx: Because I'm always looking for, okay, can I start another podcast that only does that thing? I was trying to nudge Eshan into: is there something here? There are new inference techniques every day.

Kyle: You would actually be surprised at the amount of blog posts you see.

swyx: There was a period when it was like Medusa, Hydra, Eagle.

Kyle: Now we have new forms of speculative decoding.

swyx: What are you excited about?
Vibhu: And it's exciting when you guys put out something like Nemotron, because I remember the paper on Nemotron 3: [00:37:00] the amount of post-training tokens that the GPU-rich can just train on. And it was a hybrid state-space model, right?
Kyle: Yeah, it's co-designed for the hardware.
Vibhu: Yeah, co-designed for the hardware. And one of the things was always, you know, that state-space models don't scale as well when you do a conversion, or whatever, the performance. And you guys were like, no, just keep training. And Nemotron shows a lot of that.
Nader: Also, something cool about Nemotron: it was released in layers, if you will, very similar to Dynamo. The pre-training and post-training datasets are released. The recipes for how to do it are released. The model itself is released. It's the full model; you just benefit from us turning on the GPUs. But there are companies like ServiceNow that took the dataset and trained their own model, and we were super excited and celebrated that work.
Vibhu: Zoom is different. Zoom is CGI, I think. You know, also, just to add: a lot of models don't put out base models, and if there's that, why has fine-tuning not taken off? You know, you can do your own training.
Kyle: Yeah, sure.
Vibhu: You guys put out base models. I think you put out everything.
Nader: I believe... I know [00:38:00]
swyx: About base models, basically...
Vibhu: Without base...
swyx: Base can be cancelable.
Vibhu: Yeah, base can be cancelable.
swyx: Yeah.
Vibhu: Safety training.
swyx: Did we get a full picture of Dynamo? I don't know if we...
Nader: What I'd love is: you mentioned the three axes. Break it down. What's prefill and decode, and what are the optimizations that we can get with Dynamo?
Kyle: Yeah, that's a great point.
Kyle: So, to summarize that three-axis problem: there are three things that determine whether or not something can be done with inference: cost, quality, latency, right? Dynamo is supposed to be there to provide you the runtime that allows you to pull levers to mix it up and move around the Pareto frontier, or the Pareto surface, that determines: is this actually possible with inference and AI today?
Nader: It gives you the knobs.
Kyle: Yeah, exactly. It gives you the knobs.
Disaggregation: Prefill vs Decode
Kyle: And one thing that we use a lot in contemporary inference, and that is starting to pick up in general knowledge, is this concept of disaggregation. So historically, models would be hosted with a single inference engine, and that inference engine [00:39:00] would ping-pong between two phases. There's prefill, where you're reading the sequence and generating KV cache, which is basically just a set of vectors that represent the sequence, and then using that KV cache to generate new tokens, which is called decode. And some brilliant researchers, across multiple different papers, essentially made the realization that if you separate these two phases, you actually gain some benefits. Those benefits are, A, you don't have to worry about step-synchronous scheduling. The way that an inference engine works is: you do one step, you finish it, and then you start scheduling the next step. It's not fully asynchronous. And the problem with that is that prefill and decode are actually very different, in terms of both their resource requirements and, sometimes, their runtime. So you would have prefill that would block decode steps, because you'd still be prefilling and you couldn't schedule, because the step has to end.
Kyle: So you remove that scheduling issue, and you also allow yourself to [00:40:00] split the work into two different types of pools. Prefill, typically, and this changes as model architecture changes, is right now compute-bound most of the time; when the sequence is sufficiently long, it's compute-bound. On the decode side, because you're doing a full pass over all the weights and the entire sequence every time you do a decode step, and you don't have the quadratic computation of KV cache, it's usually memory-bound: you're retrieving a linear amount of memory and doing a linear amount of compute, as opposed to prefill, where you retrieve a linear amount of memory and then use a quadratic amount of compute.
Nader: It's funny, someone at Exo Labs did a really cool demo where, for the DGX Spark, which has a lot more compute, you can do the compute-hungry prefill on a DGX Spark and then do the decode on a Mac.
Vibhu: And that's faster.
Nader: Yeah. Yeah.
Kyle: So you can do that; you can do machine stratification. And with our future generations of hardware, we actually announced, with Rubin, this [00:41:00] new accelerator that is prefill-specific. It's called Rubin CPX.
Kubernetes Scaling with Grove
Nader: I have a question. When you do the scale-out, is scaling out easier with Dynamo? Because when you need a new node, you can dedicate it to either the prefill or the decode.
Kyle: Yeah. Dynamo actually has a Kubernetes component in it called Grove that allows you to do this crazy scaling specialization. I don't want to go too deep into Kubernetes here, but there was a previous way that you would launch multi-node work. It's called LeaderWorkerSet; it's in the Kubernetes standard, and LeaderWorkerSet is great.
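A back-of-the-envelope sketch of the asymmetry Kyle describes: attention work in prefill grows quadratically with sequence length n, while each decode step grows only linearly. This is deliberately simplified (it ignores heads, MoE, and KV-cache layout) and counts only the attention score/weighting matmuls:

```python
def prefill_attention_flops(n, d):
    # QK^T over all n*n pairs plus attention-weighted V: quadratic in n.
    return 2 * n * n * d

def decode_step_attention_flops(n, d):
    # One new query attends over n cached keys/values: linear per step.
    return 2 * n * d

# Doubling the prompt length quadruples prefill attention work,
# but only doubles the per-step decode work.
assert prefill_attention_flops(2048, 128) == 4 * prefill_attention_flops(1024, 128)
assert decode_step_attention_flops(2048, 128) == 2 * decode_step_attention_flops(1024, 128)
```

Decode also streams all the model weights every step, which is why it ends up memory-bandwidth-bound even though its compute is linear.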
Kyle: It served a lot of people super well for a long period of time. But one of the things that it struggles with is representing a set of cases where you have a multi-node replica that has a pair, right? You know, prefill and decode. Or it's not paired, but it has a second stage that has a ratio that changes over time. And prefill and decode are two different things: as your workload changes, the amount of prefill you'll need to do may change, [00:42:00] and the amount of decode that you'll need to do might change. Let's say you start getting insanely long queries. That probably means that your prefill scales harder, because you're hitting this quadratic scaling growth.
swyx: Yeah. And for listeners: prefill would be long input, decode would be long output, for example, right?
Kyle: Yeah. I mean, decode is funny, because the amount of tokens that you produce scales with the output length, but the amount of work that you do per step scales with the amount of tokens in the context.
swyx: Yes.
Kyle: So it scales with both the input and the output.
swyx: That's true.
Kyle: But on the prefill-to-decode side: if, suddenly, the amount of work you're doing on the decode side stays about the same, or scales a little bit, and the prefill side jumps up a lot, you actually don't want that ratio to stay the same. You want it to change over time. So Dynamo has a set of components that, A, tell you how to scale: it tells you how many prefill workers and decode workers it thinks you should have. And it also provides a scheduling API for Kubernetes that allows you to actually represent and effect this scheduling on your actual [00:43:00] hardware, on your compute infrastructure.
Nader: Not gonna lie, I feel a little embarrassed for being proud of my SVG function earlier.
swyx: No, it...
Nader: ...was really...
Kyle: ...cute. I, I...
swyx: ...like...
Nader: It's all...
swyx: It's all engineering.
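The independent pool-sizing idea could be sketched as a toy autoscaler that sizes prefill and decode workers separately from observed demand, instead of scaling paired replicas together. The per-worker throughput numbers are made up for illustration and are not Dynamo's actual planner logic:

```python
import math

def size_pools(prefill_tokens_per_s, decode_tokens_per_s,
               prefill_cap_per_worker, decode_cap_per_worker):
    """Return (prefill_workers, decode_workers), each sized independently."""
    prefill = max(math.ceil(prefill_tokens_per_s / prefill_cap_per_worker), 1)
    decode = max(math.ceil(decode_tokens_per_s / decode_cap_per_worker), 1)
    return prefill, decode

# Long queries arrive: prefill demand jumps 4x, decode barely moves,
# so only the prefill pool grows and the ratio changes over time.
assert size_pools(100_000, 20_000, 50_000, 10_000) == (2, 2)
assert size_pools(400_000, 22_000, 50_000, 10_000) == (8, 3)
```

A fixed-ratio paired deployment would have had to scale decode 4x as well to get the same prefill capacity, wasting the extra decode workers.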
swyx: It's all engineering. Um, that's where I'm...
Kyle: Technical.
swyx: One thing I'm kind of just curious about, with everything you see at a systems level, everything going on here, and we're scaling it up in distributed systems...
Context Length and Co-Design
swyx: I think one thing that's kind of of-the-moment right now is people asking: is there any sort of upper bound? In terms of, let's just call it context length, for want of a better word, but you can break it down however you like.
Nader: Yeah.
swyx: I just think, well, clearly you can engage in hybrid architectures and throw in some state-space models all you want, but it still looks very attention-heavy.
Kyle: Yes. Yeah, long context is attention-heavy. I mean, we have these hybrid models...
swyx: And most models cap out at a million tokens of context, and that's it. For the last two years that's been it.
Kyle: Yeah. The model-hardware-context co-design thing that we're seeing these days is actually super [00:44:00] interesting. It's, like, my secret side passion. We see models like Kimi or GPT-OSS; I use these because I know specific things about these models. So Kimi K2 comes out, right? And it's an interesting model. It's a DeepSeek-style architecture: it's MLA, it's basically DeepSeek, scaled a little bit differently, and obviously trained differently as well. But they talked about why they made the design choices for context. Kimi has more experts but fewer attention heads, and, I believe, a slightly smaller attention dimension, but I'd need to check that. It doesn't matter. But they discussed this at length in a blog post on Zhihu, which is like Quora in China. Chinese Reddit.
swyx: Yeah.
Kyle: Yeah.
Kyle: So it's actually an incredible blog post. All the ML people I've seen on there are very brilliant. But the creators of Kimi K2 actually talked about it there, [00:45:00] in the blog post. And they say: we actually did an experiment, right? Attention scales with the number of heads, obviously. If you have 64 heads versus 32 heads, you do half the work of attention. You still scale quadratically, but you do half the work. And they made a very specific sort of trade in their architecture. They basically said: hey, what if we gave it more experts, so we're going to use more memory capacity, but we keep the number of activated experts the same? We increase the expert sparsity, so the ratio of experts activated to number of experts is smaller, and we decrease the number of attention heads.
Vibhu: And for context, what we had been seeing was: you make models sparser instead. No one was really touching heads.
Kyle: Well, they implicitly made it sparser.
Vibhu: Yeah, for Kimi they did.
Kyle: Yes.
Vibhu: They also made it sparser. But basically what we were seeing was that people were at the level of: okay, there's a sparsity ratio, you want more total parameters, fewer active, and that's sparsity. [00:46:00] But what you see from papers from labs like Moonshot and DeepSeek is that they go to the level of: okay, beyond just the number of experts, you can also change how many attention heads, fewer attention layers, more attention layers. So that all basically comes back, just to tie it together, to hardware-model co-design.
Kyle: Hardware-model-context co-design.
Vibhu: Yeah.
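The head-count trade the Kimi K2 team describes is easy to check numerically: per-layer attention FLOPs scale linearly with the number of heads (at a fixed per-head dimension), while still growing quadratically in sequence length. A toy model, counting only the score and weighting matmuls:

```python
def attention_flops(n, heads, head_dim):
    # QK^T scores plus attention-weighted V: two n*n*head_dim matmul pairs
    # per head, so 2 multiply-adds each -> 4 * n^2 * heads * head_dim.
    return 4 * n * n * heads * head_dim

full = attention_flops(128_000, 64, 128)
halved = attention_flops(128_000, 32, 128)
assert halved * 2 == full                                 # half the heads, half the work
assert attention_flops(256_000, 32, 128) == 4 * halved    # still quadratic in n
```

The memory side of the trade (more experts, same number activated) raises capacity cost without raising per-token compute, which is why it pairs naturally with cutting attention heads.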
Kyle: Right. If you were training a model that was really, really short-context, or really good at super-short-context tasks, you might design it in a way such that you don't care about attention scaling, because it hasn't hit the turning point where the quadratic curve takes over.
Nader: How do you consider attention, or context, as a separate part of the co-design? The way I would have thought of it is that hardware-model co-design would just be hardware-model-context co-design.
Kyle: Because the harness, and the context that is produced by the harness, is a part of the model once it's trained in.
Vibhu: Even though towards the end you'll do long context, you're not changing the architecture through training.
Kyle: I mean, you can try.
swyx: You're saying [00:47:00] everyone's training the harness into the model.
Kyle: I would say to some degree, or...
swyx: There's co-design for the harness. I know there's a small amount, but I feel like not everyone has gone full send on this.
Kyle: I think it's important to internalize the harness that you think the model will be running... into the model.
swyx: Yeah, interesting. Okay. Bash is like the universal harness.
Kyle: Right. I'll give an example here, or just an easy proof, right? If you can train against a harness, and you're using that harness for everything, wouldn't you just train with the harness to ensure that you get the best possible quality out of...
swyx: Well, I can provide a counter-argument.
Kyle: Yeah, sure.
swyx: Which is: you want to provide a generally useful model for other people to plug into their harnesses, right?
Kyle: Yeah, and harnesses can be open source, right?
swyx: Yeah.
swyx: So, I mean, that's effectively what's happening with Codex.
Kyle: Yeah.
swyx: But you may want a different search tool, and then you may have to name it differently, or...
Nader: I don't know how much people have pushed on this, but can you... Have people compared training a model for the harness versus [00:48:00] post-training for...
swyx: I think it's the same thing. It's just extra post-training.
Nader: I see.
swyx: And, I mean, Cognition does this, of course, where, if your tool is slightly different, you either force your tool to be like the tool that they trained for, or you undo their training for their tool and then retrain. It's really annoying.
Kyle: I would hope that eventually we hit a certain level of generality with respect to training new tools.
swyx: This is not AGI. This is a really stupid, like, "learn my tool, b***h." I don't know if I can say that. But I think my point kind of is: I look at the slopes of the scaling laws, and this slope is not working, man. We're at a million-token con

The Cloudcast
Understanding NeoClouds with Crusoe

The Cloudcast

Play Episode Listen Later Mar 8, 2026 27:40


Erwan Menard - SVP Product Management @Crusoe talks about…
SHOW: 1008
SHOW TRANSCRIPT: The Reasoning Show #1008 Transcript
SHOW VIDEO:
SPONSORS:
VENTION - Ready for expert developers who actually deliver? Visit ventionteams.com
SHOW NOTES:
Topic 1 - Welcome to the show. Tell us a bit about your background, and what you focus on now at Crusoe.
Topic 2 - There has obviously been a lot of coverage of AI data center buildouts all over the world for the last few years. Tell us about Crusoe, and your approach to providing “neocloud” services.
Topic 3 - What are the biggest challenges facing Crusoe today and in the immediate future - is it technology, energy, financing for expansions, etc.?
Topic 4 - Crusoe started as a bitcoin-focused company and has evolved to more of a GenAI focus. What types of architectural changes did you have to make for this new type of workload? And how do those impact the quality of the services your customers expect from Crusoe?
Topic 5 - Is your focus more on environments to enable model training and customization, or more on inference for customer-facing applications?
Topic 6 - A lot has changed in AI in the last couple years. What has changed the most in the last couple years, and what are you expecting to change the most over the next couple years?
Topic 7 - Sovereign AI and Private AI have become much bigger topics over the last 12-18 months, and we'd expect that to grow. What unique things is Crusoe doing to adapt to these changing requirements from customers?
Send a text
FEEDBACK? Email: show @ reasoning dot show Bluesky: @reasoningshow.bsky.social Twitter/X: @ReasoningShow Instagram: @reasoningshow TikTok: @reasoningshow

The Tech Blog Writer Podcast
d-Matrix - Ultra-low Latency Batched Inference for Gen AI

The Tech Blog Writer Podcast

Play Episode Listen Later Mar 7, 2026 26:28


What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale? In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day. Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands. During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers. Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy. 
We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle. The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves. Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments. As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?

AWS Bites
153. LLM Inference with Bedrock

AWS Bites

Play Episode Listen Later Mar 6, 2026 43:25


If you're curious about building with LLMs, but you want to skip the hype and learn what it takes to ship something reliable in production, this episode is for you.
We share our real-world experience building AI-powered apps and the gotchas you hit after the demo: tokens and cost, quotas and throttling, IAM and access friction, marketplace subscriptions, and structured outputs that do not break your JSON parser.
We focus on Amazon Bedrock as AWS's managed inference layer: how to get started with the current access model, how to choose models, how pricing works, and what to watch for in production.
We also go deep on structured outputs: constrained decoding, schema design that improves output quality, and how to avoid “grammar compilation timed out”.
In this episode, we mentioned the following resources:
fourTheorem: Bedrock structured outputs guide https://fourtheorem.com/amazon-bedrock-structured-outputs/
Amazon Bedrock https://aws.amazon.com/bedrock/
Bedrock docs https://docs.aws.amazon.com/bedrock/latest/userguide/
Bedrock pricing https://aws.amazon.com/bedrock/pricing/
Structured outputs https://docs.aws.amazon.com/bedrock/latest/userguide/structured-outputs.html
Cross-region inference https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html
Quotas https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html
Throttling help https://repost.aws/knowledge-center/bedrock-throttling-error
Prompt caching https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
Troubleshooting error codes https://docs.aws.amazon.com/bedrock/latest/userguide/troubleshooting-api-error-codes.html
Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on X/Twitter, BlueSky or LinkedIn:
- https://twitter.com/eoins | https://bsky.app/profile/eoin.sh | https://www.linkedin.com/in/eoins/
- https://twitter.com/loige | https://bsky.app/profile/loige.co | https://www.linkedin.com/in/lucianomammino/
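In the spirit of the structured-outputs discussion, one minimal guardrail is to validate the model's JSON against the expected shape before trusting it downstream. This hand-rolled stdlib check is only a sketch, not Bedrock's structured-output mechanism, and the field names are invented for illustration:

```python
import json

# Hypothetical expected shape for an invoice-extraction response.
EXPECTED = {"invoice_id": str, "total": (int, float), "currency": str}

def parse_structured(raw):
    """Parse model output and verify required keys and types before use."""
    data = json.loads(raw)  # raises ValueError/JSONDecodeError on malformed JSON
    for key, typ in EXPECTED.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], typ):
            raise ValueError(f"bad type for field: {key}")
    return data

good = parse_structured('{"invoice_id": "A-17", "total": 99.5, "currency": "EUR"}')
assert good["total"] == 99.5
```

With a real schema-driven setup the service constrains decoding for you, but a cheap post-hoc check like this still catches the cases that would otherwise break your JSON parser in production.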

The Cloudcast
What is an AI Agent?

The Cloudcast

Play Episode Listen Later Mar 4, 2026 26:24


OVERVIEW: Welcome to The Reasoning Show! We dig into one of the foundational building blocks of modern Generative AI, the AI Agent. So what is an AI Agent, and what do we need to think about for the next couple years?
SHOW: 1007
SHOW TRANSCRIPT: The Reasoning Show #1007 Transcript
SHOW VIDEO: https://www.youtube.com/@TheReasoningShow/
SHOW SPONSOR:
VENTION - Ready for expert developers who actually deliver? Visit ventionteams.com
SHOW NOTES:
Topic 1 - We're 3+ years into the Generative AI era. Why do you think AI Agents, or Agentic AI, is now getting so much attention?
Topic 2 - If someone asked you to explain what an AI Agent is, how would you do that?
Topic 3 - What are some of the core elements of AI Agents that you're seeing impact how people think about and use agents? How to define tasks; languages and frameworks; ability to orchestrate multiple agents; Human-in-the-Loop vs. Autonomous; other?
Topic 4 - AI Agents are going to spark the great “how much should I pay for this?” discussion. Have you given this any thought yet?
Topic 5 - How do you expect to use AI agents in your day-to-day work, and how do you expect this to impact Enterprise businesses?
ESSENTIAL READING
“Building Effective Agents” by Anthropic: designing agents with tools, memory, reasoning loops
A Comprehensive Review of AI Agents: how agents perceive, reason, decide, act
Top 20 AI Agent Concepts You Should Know: covering ReAct, Chain of Thought, memory types
AI Agents in Action: Foundations for Evaluation and Governance: structured foundation for safety aspects of agent deployment
Essential Viewing
AI Agentic Design Patterns w/ AutoGen: build autonomous agents that use tools
AI Agent Systems w/ crewAI: orchestrating teams of agents
LangGraph Course: how to build stateful, reliable agents
Frameworks & Tools
LangGraph: standard for complex, stateful agents (nodes, edges, loops)
CrewAI: best for structured task delegation and multi-agent collaboration
AutoGen: Microsoft's framework for multi-agent conversational systems
Send a text
FEEDBACK? Email: show @ reasoning dot show Bluesky: @reasoningshow.bsky.social Twitter/X: @ReasoningShow Instagram: @reasoningshow TikTok: @reasoningshow

Stocks for Beginners
The AI Compute Shift: Why Inference Could Change Everything for Nvidia, Intel & AMD

Stocks for Beginners

Play Episode Listen Later Mar 4, 2026 47:44


Dive into the heart of the AI revolution with Gary Brode from Deep Knowledge Investing. In this episode, we unravel the complex world of the semiconductors that power AI: from Nvidia's GPU dominance to ARM-based innovations, Intel and AMD's CPU roles, and the massive energy demands of data centres. Learn about key deals like the Nvidia-Meta collaboration, investment risks in hyperscalers, and opportunities in nuclear energy and uranium. Perfect for investors navigating the AI boom.

Shares for Beginners
The AI Compute Shift: Why Inference Could Change Everything for Nvidia, Intel & AMD

Shares for Beginners

Play Episode Listen Later Mar 4, 2026 47:46


Dive into the heart of the AI revolution with Gary Brode from Deep Knowledge Investing. In this episode, we unravel the complex world of the semiconductors that power AI: from Nvidia's GPU dominance to ARM-based innovations, Intel and AMD's CPU roles, and the massive energy demands of data centres. Learn about key deals like the Nvidia-Meta collaboration, investment risks in hyperscalers, and opportunities in nuclear energy and uranium. Perfect for investors navigating the AI boom.

The Tech Blog Writer Podcast
From Core To Edge: Akamai On Where AI Inference Must Live Next

The Tech Blog Writer Podcast

Play Episode Listen Later Mar 3, 2026 27:40


What if the real AI race in 2026 isn't about building bigger models, but about where decisions are made, how fast they happen, and whether they deliver measurable value? In this episode, I'm joined by John Bradshaw, Director of Cloud Computing Technology and Strategy at Akamai, to unpack his predictions for the next phase of cloud, AI inference, and the economics that will shape enterprise technology over the next 12 months. As organizations move beyond experimentation, John explains why the boardroom conversation has shifted from capability to return on investment, and how spiraling compute demands are forcing leaders to rethink the balance between performance, cost, and innovation. We explore why this new financial scrutiny is not slowing AI adoption, but refining it. John shares how inefficient GPU workflows, centralized inference, and poorly aligned architectures are being challenged by a more disciplined approach that pushes intelligence closer to the edge. This shift is not only about latency and performance. It is about building scalable, value-driven platforms that can support real-time decision-making, agentic workloads, and global user experiences without breaking traditional IT budgets. Trust is another major theme throughout our conversation. From the rise of everyday AI agents that quietly handle routine tasks to the growing importance of secure, resilient inference pipelines, John outlines how low-latency edge infrastructure, local processing, and hybrid cloud models will redefine reliability for both enterprises and consumers. We also discuss the smart home backlash following recent outages, and why the next generation of connected products will be designed to work even when the network does not. The episode also looks at the future of streaming, where consolidation, intelligent content delivery, and AI-driven personalization are reshaping both the user experience and the economics behind the platforms. 
Behind the scenes, orchestration is emerging as a defining capability, with multiple models and services working together to validate outputs, reduce hallucinations, and create more dependable AI systems. This is a conversation about moving from possibility to production, from experimentation to accountability, and from centralized architectures to distributed intelligence. So as AI becomes embedded in every workflow and every customer interaction, will the winners be the companies with the biggest models, or the ones that know exactly where their AI should live, how it should be orchestrated, and how it proves its value every single day?

The Deductionist Podcast
The Inference Cycle: How to Think Like an Elite Investigator

The Deductionist Podcast

Play Episode Listen Later Feb 27, 2026 25:13


Most people don't investigate. They react. In this episode, we break down the Inference Cycle, the psychological defence system elite investigators use to prevent confirmation bias, emotional reasoning, and premature certainty. From early inquisitorial systems to Joseph Bell (the real-life inspiration for Sherlock Holmes), we explore how structured reasoning replaced accusation, and why that matters now more than ever.
You'll learn:
• Why suspicion is not a verdict
• How to build falsifiable hypotheses
• The danger of narrative seduction
• Why evidence must be designed before it's collected
• How cognitive dissonance corrupts smart people
• The psychological discipline Sherlock Holmes actually represents
This is not about memorizing facts. It's about training your character to tolerate ambiguity. As Holmes said: “It is a capital mistake to theorize before one has data.” If you want sharper thinking, better judgment, and intellectual humility under pressure, this episode is for you.
Access the free tier or go deeper with exclusive paid challenges: https://www.omniscient-insights.com/axiom https://www.omniscient-insights.com/community-home
MERCH -- https://the-deductionist.myspreadshop.co.uk/all
E-SCAPE GAME -- https://www.youtube.com/@thedeductionistteam
Everything else you need -- https://linktr.ee/bencardall
Music provided by https://robertjohncollinsmusic.com/
#sherlock #deduction #mystery

JSA Podcasts for Telecom and Data Centers
Kansas' First Neutral IX + AI Inference at the Edge | Connected Nation IXP at MetroConnect 2026

JSA Podcasts for Telecom and Data Centers

Play Episode Listen Later Feb 25, 2026 8:05


The Cloud Pod
344: Amazon's Coding Bot Bites the Hand That Runs It

The Cloud Pod

Play Episode Listen Later Feb 24, 2026 61:30


Welcome to episode 344 of The Cloud Pod, where the forecast is always cloudy! Justin is out of the office at a World of Warcraft Tournament (not really), and Ryan is pursuing his lifelong dream of becoming a roadie for The Eagles (maybe?), so it's Jonathan and Matt holding down the fort this week, and they've got a ton of cloud news for you! From security to AI assistants, we've got all the news you need. Let's get started!
Titles we almost went with this week:
Zero Bus, All Gas, No Kafka Brakes
AI Coding Bot Bites the Hand That Runs It
When Your Robot Developer Goes Rogue on AWS
Kubernetes VPA Finally Stops Evicting Your Database Pods
Google Trains 100 Million People, Still No One Reads the Docs
MCP Walks Into a Bar, Not Enterprise Ready Yet
No More Pod Evictions, Kubernetes 1.35 Scales In Place
No Keys No Drama Just IAM and Cloud SQL
One Agent to Rule Them All in Kubernetes
IAM Tired of Writing Policies Manually
When Your AI Coding Tool Has Delete Permissions
One Dashboard to Rule All Your GPU Clusters
Serverless Reservations Prove Nothing Is Truly Free Range
Kiro Takes the Wheel on AWS IAM Policies
Stop Blaming Backups for Your Bad Architecture
AI Agent Goes Rogue, Takes AWS Down With It
Everything is Bigger in Texas Except the Water Usage
OpenAI launches the college basketball of Inference. Pro service – low cost
General News
1:05 Code Mode: give agents an entire API in 1,000 tokens
Cloudflare's Code Mode MCP server reduces token consumption by 99.9% compared to a traditional MCP implementation, exposing the entire Cloudflare API (over 2,500 endpoints) through just two tools, search() and execute(), using roughly 1,000 tokens versus 1.17 million for a conventional approach.
The architecture works by having the AI agent write JavaScript code against a typed OpenAPI spec representation, rather than loading tool definitions into context, with code executing inside a sandboxed V8 isolate (Dynamic Worker) that restricts file system access, environment variables, and external fetches by default. This approach addresses a fundamental constraint in agentic AI systems: adding more tools to give agents broader capabilities directly competes with the available context space for the task at hand.
01:41 Jonathan: "It's good. I'm not sure I could imagine 2½ thousand MCP tool definitions in a context window and still actually use it for anything."
AI Is Going Great – Or How ML Makes Money
03:58 OpenClaw creator Peter Steinberger joins OpenAI
Peter Steinberger, creator of viral AI assistant OpenClaw (formerly Clawdbot/Moltbot), has joined OpenAI
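The two-tool pattern the show notes describe can be sketched in a few lines. This is a hypothetical Python mock, not Cloudflare's implementation: the endpoint names and specs below are invented, and the real Code Mode runs agent-written JavaScript inside a sandboxed V8 isolate rather than Python in a restricted namespace.

```python
# Illustrative sketch of the Code Mode idea: instead of registering thousands
# of per-endpoint tool definitions, expose only search() and execute().
# All endpoint names/specs here are hypothetical placeholders.

API_SPEC = {
    "listZones": "GET /zones - list zones in the account",
    "purgeCache": "POST /zones/{id}/purge_cache - purge cached content",
    "createDnsRecord": "POST /zones/{id}/dns_records - create a DNS record",
}

def search(query: str) -> dict:
    """Tool 1: return only the endpoint specs matching the agent's query,
    so the full API never has to sit in the model's context window."""
    q = query.lower()
    return {name: doc for name, doc in API_SPEC.items() if q in doc.lower()}

def execute(code: str) -> dict:
    """Tool 2: run agent-written code in a restricted namespace
    (a crude stand-in for the sandboxed V8 isolate described above)."""
    sandbox = {"__builtins__": {}, "search": search, "result": None}
    exec(code, sandbox)  # no file system, env vars, or network in scope
    return sandbox["result"]

# The agent writes code against the spec instead of invoking 2,500 tools:
out = execute("result = search('dns')")
```

Because search() returns only the handful of specs relevant to the task, the context cost stays roughly constant no matter how large the underlying API grows, which is the constraint the episode highlights.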

Intelligence with Everyone: RL @ MiniMax, with Olive Song, from AIE NYC & Inference by Turing Post

Play Episode Listen Later Feb 22, 2026 55:29


Olive Song from MiniMax shares how her team trains the M series frontier open-weight models using reinforcement learning, tight product feedback loops, and systematic environment perturbations. This crossover episode weaves together her AI Engineer Conference talk and an in-depth interview from the Inference podcast. Listeners will learn about interleaved thinking for long-horizon agentic tasks, fighting reward hacking, and why they moved RL training to FP32 precision. Olive also offers a candid look at debugging real-world LLM failures and how MiniMax uses AI agents to track the fast-moving AI landscape. Use the Granola Recipe Nathan relies on to identify blind spots across conversations, AI research, and decisions: https://bit.ly/granolablindspot LINKS: Conference Talk (AI Engineer, Dec 2025) – https://www.youtube.com/watch?v=lY1iFbDPRlw | Interview (Turing Post, Jan 2026) – https://www.youtube.com/watch?v=GkUMqWeHn40 Sponsors: Claude: Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro's full capabilities at https://claude.ai/tcr Tasklet: Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. 
Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai CHAPTERS: (00:00) About the Episode (04:15) Minimax M2 presentation (Part 1) (17:59) Sponsors: Claude | Tasklet (21:22) Minimax M2 presentation (Part 2) (21:26) Research life and culture (26:27) Alignment, safety and feedback (32:01) Long-horizon coding agents (35:57) Open models and evaluation (43:29) M2.2 and researcher goals (48:16) Continual learning and AGI (52:58) Closing musical summary (55:49) Outro PRODUCED BY: https://aipodcast.ing SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://linkedin.com/in/nathanlabenz/ Youtube: https://youtube.com/@CognitiveRevolutionPodcast Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431 Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk

DH Unplugged
DHUnplugged #791: AI Overload

DH Unplugged

Play Episode Listen Later Feb 18, 2026 70:35


Self Created Valuation Boosts Apple Announces new Podcast push AI – A breakdown Playing them like a fiddle – Warner Brothers PLUS we are now on Spotify and Amazon Music/Podcasts! Click HERE for Show Notes and Links DHUnplugged is now streaming live - with listener chat. Click on link on the right sidebar. Love the Show? Then how about a Donation? Follow John C. Dvorak on Twitter Follow Andrew Horowitz on Twitter Warm-Up - A NEW CTP just announced - China releasing new AI models - AI - A breakdown - we are on overload - Big Employment news.... Markets - Self Created Valuation Boosts - Apple Announces new Podcast push - Playing them like a fiddle - Warner Brothers Quick Note - Going to rip up the playbook on something this week on TDI Podcast. Anyone who owns an annuity should listen to what is about to come on next Sunday's show.....  No Agenda... Olympics - Anything to discuss? MONEY FOR ALL - The average tax refund is 10.9% higher so far this season, compared to about the same point in 2025, according to early filing data from the IRS. - The 2026 tax season opened Jan. 26, and the average refund amount was $2,290 as of Feb. 6, up from $2,065 about one year prior, the IRS reported Friday night. - As of Feb. 6, the total amount refunded was more than $16.9 billion, up 1.9% compared to last year, according to the IRS release. That figure reflects current-year returns only. - This is partly because there were excess withholdings from last year: the rules changed and paycheck withholdings were not adjusted. This is a one-time situation. Employment - 4.3% - "Better" than expected payrolls number - A major revision was released last Wednesday. Overall 2025 job growth was much weaker than initially reported. The total net change for the full year 2025 was revised down from +584,000 jobs to just +181,000 jobs (seasonally adjusted) — an average of only about 15,000 jobs added per month instead of ~49,000. 
This made 2025 one of the weakest years for job creation in recent non-recession periods. - Employment levels were consistently overstated throughout 2025 by roughly 800,000 to over 1 million jobs, peaking around mid-year. For example: By March 2025, the level was revised down by 898,000. By December 2025 (preliminary), down by 1,029,000. - Monthly changes were also adjusted downward in most cases (e.g., August's originally reported -26,000 became a larger loss of -70,000; September's +108,000 became +76,000). - The revisions reflect normal annual benchmarking, but this one was unusually large (larger than the typical 0.2% average over the prior decade), likely due to factors like overestimation of business births or other data mismatches. - In short, the data reveals that the U.S. labor market in 2025 was significantly softer than the monthly headlines suggested at the time — job growth was overstated by a substantial margin, painting a picture of a much weaker employment picture for the year. AI Updates - While U.S. markets have been focused on the impact of Anthropic and Altruist's tools on software and financial services, China's tech giants have released AI models this week that have shown advancements in robotics and video generation. - Google is reporting that China's AI models are just MONTHS behind western models - However - is this progress? In a video demo, Alibaba showed a robot with pincers for hands that appeared to be able to count oranges, pick them up and place them in a basket. It was also shown taking milk out of a fridge. - Alibaba on Monday unveiled a new artificial intelligence model Qwen 3.5 designed to execute complex tasks independently, with big improvements in performance and cost that the Chinese tech giant claims beat major U.S. rival models on several benchmarks. 
- Zhipu AI — which trades as Knowledge Atlas Technology in Hong Kong — said the model approaches Anthropic's Claude Opus 4.5 in coding benchmarks while surpassing Google's Gemini 3 Pro on some tests. - Shares of MiniMax also jumped Thursday after it launched its updated M2.5 open-source model with enhanced AI agent tools. Grok Update - Grok, Elon Musk's AI chatbot, has been gaining ground in the U.S. over the past months, data showed, even as it draws global censure and regulatory scrutiny after being used to generate a wave of non-consensual sexualized images of women and minors. - U.S. market share of the tool rose to 17.8% last month from 14% in December, and 1.9% in January 2025, according to data from research firm Apptopia. - Men are still the largest % users of Grok ~ 78% (down from 89% in April 2025) AI Market Share - ChatGPT's share slumped to 52.9% last month from 80.9% in January last year, while Gemini's grew to 29.4% from 17.3% over the same period. AI Market Share Infographic and AI Understanding - Have we gone through this? - At its core, AI is technology that lets machines perform tasks that normally require human intelligence — things like understanding language, recognizing images, making decisions, or solving problems. - Modern AI (especially since ~2022) is dominated by machine learning — systems that learn patterns from huge amounts of data instead of being explicitly programmed rule-by-rule. - Inference is the "using" or "applying" phase of AI — when a trained model takes new input and produces an output / prediction / answer. Contrast with training (the "learning" phase): ------ Training → Like a student studying for years: very compute-heavy, expensive, done once (or rarely) on massive servers/GPUs, adjusts billions of parameters based on examples. ------ Inference → Like the student taking a test or doing their job: much faster, cheaper, runs on your phone/laptop/cloud, uses the fixed knowledge from training to respond instantly. 
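The training-versus-inference contrast above can be made concrete with a toy model. A minimal sketch using an ordinary least-squares line fit as the "model" — purely illustrative: real LLM training adjusts billions of parameters, but the two-phase shape is the same.

```python
# Toy illustration of the training/inference split: train once (expensive),
# then apply the frozen parameters many times (cheap).

def train(examples):
    """Training: the one-time phase that fits parameters to example data."""
    n = len(examples)
    sx = sum(x for x, _ in examples)
    sy = sum(y for _, y in examples)
    sxx = sum(x * x for x, _ in examples)
    sxy = sum(x * y for x, y in examples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept  # the "frozen" parameters

def infer(params, x):
    """Inference: the fast, repeated phase that applies frozen parameters."""
    slope, intercept = params
    return slope * x + intercept

params = train([(0, 1), (1, 3), (2, 5)])  # learn the line y = 2x + 1
prediction = infer(params, 10)            # -> 21.0
```

The expensive arithmetic all happens inside train(); infer() is a single multiply-and-add per query, which is why the same trained model can answer millions of requests cheaply.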
- Agentic AI takes regular AI (like chat models) to the next level: instead of just answering questions or generating text, these systems act autonomously to achieve goals with minimal human help. "Agentic" comes from "agency" — the ability to make decisions, plan, use tools, take actions, adapt, and even learn from results — like a smart digital employee rather than just a smart answer machine. AI Infographic Last AI Item - A shortage of memory chips is hammering profits, derailing corporate plans, and inflating price tags on various products, with the crunch expected to get worse. - The fundamental reason for the squeeze is the buildout of AI data centers, with companies like Alphabet and OpenAI buying up large shares of memory chip production, leaving consumer electronics producers fighting over a dwindling supply. - The resulting price spikes are causing concern, with some warning of "RAMmageddon" and others predicting that memory chip prices will go "parabolic", bringing lavish profits to some companies but painful prices to the rest of the electronics sector. Here is something: - Gallup will no longer track presidential approval ratings after nearly 90 years - Founded by George Gallup in 1935, the Washington, DC-based management company began tracking the president's job performance 88 years ago. - Gallup told USA TODAY it will no longer publish "favorability ratings of political figures," a decision it said "reflects an evolution in how Gallup focuses its public research and thought leadership." - Gallup said the ratings are now "widely produced, aggregated and interpreted, and no longer represent an area where Gallup can make its most distinctive contribution." - "Our commitment is to long-term, methodologically sound research on issues and conditions that shape people's lives," the company wrote, adding that its work will continue through the Gallup Poll Social Series, the Gallup Quarterly Business Review, the World Poll and more. 
- Seems like they are unable to SHAPE opinion due to social media etc.....? Apple Podcast Update - Big news! - Apple on Monday announced that it will bring a new integrated video podcast experience to Apple Podcasts this spring. - The move comes as video viewership continues to reshape podcasting. About 37% of people over age 12 watch video podcasts monthly, according to Edison Research. - The update brings Apple Podcasts more in line with its competitors Spotify, YouTube and now Netflix, which have increasingly leaned into video podcasting. - “Twenty years ago, Apple helped take podcasting mainstream by adding podcasts to iTunes, and more than a decade ago, we introduced the dedicated Apple Podcasts app,” said Eddy Cue, Apple's senior vice president of Services, in a statement. - “By bringing a category-leading video experience to Apple Podcasts, we're putting creators in full control of their content and how they build their businesses, while making it easier than ever for audiences to listen to or watch podcasts.” M&A - Texas Instruments Inc. has reached an agreement to buy Silicon Laboratories Inc. for about $7.5 billion, deepening its exposure to several markets for chips. - Silicon Labs investors will receive $231 in cash for each share of the company's common stock and the transaction is expected to close in the first half of 2027. - The transaction still needs to win approval by investors in Silicon Labs and shares of Silicon Labs surged by 51% to $206.48 after the announcement. Inflation - This helps - PepsiCo will cut prices on core brands such as Lay's and Doritos by up to 15% following a consumer backlash against several previous price hikes, the snacks and beverage maker said on Tuesday after it topped fourth-quarter estimates. Miran - Moving - Federal Reserve Governor Stephen Miran is leaving his post as chair of the Council of Economic Advisers, CNBC has confirmed. 
- He joined the CEA in January 2025, but had been on leave from that post since last September when he filled the unexpired term of former Fed Governor Adriana Kugler. - He remains on the Fed board No Biggie???? - There are some astonishing cases being reported of bad AI in the operating room - JNJ's TruDi Navigation System - Since AI was added to the device, the FDA has received unconfirmed reports of at least 100 malfunctions and adverse events. - At least 10 people were injured between late 2021 and November 2025, according to the reports. Most allegedly involved errors in which the TruDi Navigation System misinformed surgeons about the location of their instruments while they were using them inside patients' heads during operations. - Cerebrospinal fluid reportedly leaked from one patient's nose. In another reported case, a surgeon mistakenly punctured the base of a patient's skull. In two other cases, patients each allegedly suffered strokes after a major artery was accidentally injured. Cuba - The main airport has put out a bulletin that it is out of jet fuel - Blackouts and lack of other fuels are creating big problems - No airlines have stopped running at this point, but many will as they cannot refuel - This is a bigger problem for cargo planes (supplies) that may not be able to risk flying to Cuba as they will not be able to get out. Dalio Warning - Legendary investor Ray Dalio said on Tuesday the world was “on the brink” of a capital war. - He said central banks and sovereign wealth funds were already preparing for measures like foreign exchange and capital controls. - "When money is weaponized using measures like trade embargoes, blocking access to capital markets, or using ownership of debt as leverage." - “Capital, money, matters,” Dalio said Tuesday. “We're seeing capital controls … taking place all over the world today, and who will experience that is questionable. 
So, we are on the brink — that doesn't mean we are in [a capital war now], but it means that it's a logical concern.” - Could this be why gold and silver are being hoarded (physical assets over digital currency)? - Is China's edict to banks to diversify away from US Treasuries a sign? Self Boosted Valuation - Waymo is aiming to raise about $16 billion in a financing round that would value it at nearly $110 billion, Bloomberg News reported, citing people familiar with the matter. - Alphabet would provide about $13 billion to the autonomous driving firm while the rest would come from investors including Sequoia Capital, DST Global and Dragoneer Investment Group, the report added. - Soooooo - Waymo is a unit of Alphabet.... Alphabet providing 80% of the funding that boosts valuations..... Hmmmmmmmm Warner Brothers - Warner Bros Discovery Inc is considering reopening sale talks with Paramount Skydance Corp after receiving its amended offer. - The Warner Bros board is discussing whether Paramount could offer a path to a superior deal, which may ignite a second bidding war with Netflix Inc. - Paramount submitted amended terms that addressed several concerns, including covering a fee owed to Netflix and offering to backstop a Warner Bros debt refinancing. Economics Coming Up - Short Week - plenty of Reports - Wednesday - Durable Goods, Housing Starts, Industrial Production, FOMC Minutes - Thursday - Philly Fed, Initial Claims - Friday: PCE, Personal Income and Spending, GDP for Q4 (3.6%) ----- New Home Sales, UMich Feb Final   Love the Show? Then how about a Donation? ANNOUNCING THE CLOSEST TO THE PIN for CATERPILLAR Winners will be getting great stuff like the new "OFFICIAL" DHUnplugged Shirt!     FED AND CRYPTO LIMERICKS   See this week's stock picks HERE Follow John C. Dvorak on Twitter Follow Andrew Horowitz on Twitter

Scientific Sense ®
Prof. Andrew Jaffe of Imperial College on the Random Universe

Scientific Sense ®

Play Episode Listen Later Feb 14, 2026 66:04


Scientific Sense ® by Gill Eapen: Prof. Andrew Jaffe is professor of astrophysics and cosmology at Imperial College, London. He is the Director of the Imperial Centre for Inference and Cosmology. He studies the history and evolution of the Universe as a whole. Please subscribe to this channel: https://www.youtube.com/c/ScientificSense?sub_confirmation=1

Azeem Azhar's Exponential View
Inside the economics of OpenAI (exclusive research)

Azeem Azhar's Exponential View

Play Episode Listen Later Feb 13, 2026 49:46


Welcome to Exponential View, the show where I explore how exponential technologies such as AI are reshaping our future. I've been studying AI and exponential technologies at the frontier for over ten years. Each week, I share some of my analysis or speak with an expert guest to make sense of a particular topic. To keep up with the Exponential transition, subscribe to this channel or to my newsletter: https://www.exponentialview.co/ ---- In this episode, I'm joined by Jaime Sevilla, founder of Epoch AI; Hannah Petrovic from my team at Exponential View; and financial journalist Matt Robinson from AI Street. Together we investigate a fundamental question: do the economics of AI companies actually work? We analysed OpenAI's financials from public data to examine whether their revenues can sustain the staggering R&D costs of frontier models. The findings reveal a picture far more precarious than many assume; we also explore where the real infrastructure bottlenecks lie, why compute demand will dwarf energy constraints, and what the rise of long-running agentic workloads means for the entire industry. Read the study here: https://www.exponentialview.co/p/inside-openais-unit-economics-epoch-exponentialview We covered: (00:00) Do the economics of frontier AI actually work? (02:48) Piecing together OpenAI's finances from public data (05:24) GPT-5's "rapidly depreciating asset" problem (13:25) Why OpenAI is flirting with ads (17:31) If you were Sam Altman, what would you do differently? (22:54) Energy vs. 
GPUs; where the real infrastructure bottleneck lies (29:15) What surging compute demand actually looks like (33:12) The most surprising finding from the research (38:02) The race to avoid commoditization (43:35) Agents that outlive their models  Where to find me: Exponential View newsletter: https://www.exponentialview.co/ Website: https://www.azeemazhar.com/ LinkedIn: https://www.linkedin.com/in/azhar/ Twitter/X: https://x.com/azeem  Where to find Jaime: https://epoch.ai or https://epochai.substack.com Where to find Matt: https://www.ai-street.co  Production by supermix.io and EPIIPLUS1 Production and research: Chantal Smith and Marija Gavrilov. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

The Dissenter
#1215 Mauricio Suárez - Inference and Representation: A Study in Modeling Science

The Dissenter

Play Episode Listen Later Feb 13, 2026 52:40


******Support the channel******
Patreon: https://www.patreon.com/thedissenter
PayPal: paypal.me/thedissenter
PayPal Subscription 1 Dollar: https://tinyurl.com/yb3acuuy
PayPal Subscription 3 Dollars: https://tinyurl.com/ybn6bg9l
PayPal Subscription 5 Dollars: https://tinyurl.com/ycmr9gpz
PayPal Subscription 10 Dollars: https://tinyurl.com/y9r3fc9m
PayPal Subscription 20 Dollars: https://tinyurl.com/y95uvkao
******Follow me on******
Website: https://www.thedissenter.net/
The Dissenter Goodreads list: https://shorturl.at/7BMoB
Facebook: https://www.facebook.com/thedissenteryt/
Twitter: https://x.com/TheDissenterYT
This show is sponsored by Enlites, Learning & Development done differently. Check the website here: http://enlites.com/ Dr. Mauricio Suárez is Full Professor (catedrático) in Logic and Philosophy of Science at Universidad Complutense de Madrid. He is also a life member at Clare Hall at Cambridge University. His main research interests lie in the philosophy of probability and causality, the history and philosophy of science (mainly physics, chemistry and biology), modeling and idealization, the aesthetics of scientific representation, and general epistemology and methodology of science. He is the author of Inference and Representation: A Study in Modeling Science. In this episode, we focus on Inference and Representation. We start by talking about modeling in science. We then explore the concept of representation. We talk about the flaws of reductive naturalist theories of scientific representation, and an inferential conception of scientific representation. 
Finally, we discuss how our exploration of scientific representation connects to debates on artistic representation.--A HUGE THANK YOU TO MY PATRONS/SUPPORTERS: PER HELGE LARSEN, JERRY MULLER, BERNARDO SEIXAS, ADAM KESSEL, MATTHEW WHITINGBIRD, ARNAUD WOLFF, TIM HOLLOSY, HENRIK AHLENIUS, ROBERT WINDHAGER, RUI INACIO, ZOOP, MARCO NEVES, COLIN HOLBROOK, PHIL KAVANAGH, SAMUEL ANDREEFF, FRANCIS FORDE, TIAGO NUNES, FERGAL CUSSEN, HAL HERZOG, NUNO MACHADO, JONATHAN LEIBRANT, JOÃO LINHARES, STANTON T, SAMUEL CORREA, ERIK HAINES, MARK SMITH, JOÃO EIRA, TOM HUMMEL, SARDUS FRANCE, DAVID SLOAN WILSON, YACILA DEZA-ARAUJO, ROMAIN ROCH, YANICK PUNTER, CHARLOTTE BLEASE, NICOLE BARBARO, ADAM HUNT, PAWEL OSTASZEWSKI, NELLEKE BAK, GUY MADISON, GARY G HELLMANN, SAIMA AFZAL, ADRIAN JAEGGI, PAULO TOLENTINO, JOÃO BARBOSA, JULIAN PRICE, HEDIN BRØNNER, FRANCA BORTOLOTTI, GABRIEL PONS CORTÈS, URSULA LITZCKE, SCOTT, ZACHARY FISH, TIM DUFFY, SUNNY SMITH, JON WISMAN, WILLIAM BUCKNER, LUKE GLOWACKI, GEORGIOS THEOPHANOUS, CHRIS WILLIAMSON, PETER WOLOSZYN, DAVID WILLIAMS, DIOGO COSTA, ALEX CHAU, CORALIE CHEVALLIER, BANGALORE ATHEISTS, LARRY D. 
LEE JR., OLD HERRINGBONE, MICHAEL BAILEY, DAN SPERBER, ROBERT GRESSIS, JEFF MCMAHAN, JAKE ZUEHL, MARK CAMPBELL, TOMAS DAUBNER, LUKE NISSEN, KIMBERLY JOHNSON, JESSICA NOWICKI, LINDA BRANDIN, VALENTIN STEINMANN, ALEXANDER HUBBARD, BR, JONAS HERTNER, URSULA GOODENOUGH, DAVID PINSOF, SEAN NELSON, MIKE LAVIGNE, JOS KNECHT, LUCY, MANVIR SINGH, PETRA WEIMANN, CAROLA FEEST, MAURO JÚNIOR, 航 豊川, TONY BARRETT, NIKOLAI VISHNEVSKY, STEVEN GANGESTAD, TED FARRIS, HUGO B., JAMES, JORDAN MANSFIELD, CHARLOTTE ALLEN, PETER STOYKO, DAVID TONNER, LEE BECK, PATRICK DALTON-HOLMES, NICK KRASNEY, RACHEL ZAK, AND DENNIS XAVIER!A SPECIAL THANKS TO MY PRODUCERS, YZAR WEHBE, JIM FRANK, ŁUKASZ STAFINIAK, TOM VANEGDOM, BERNARD HUGUENEY, CURTIS DIXON, BENEDIKT MUELLER, THOMAS TRUMBLE, KATHRINE AND PATRICK TOBIN, JONCARLO MONTENEGRO, NICK GOLDEN, CHRISTINE GLASS, IGOR NIKIFOROVSKI, PER KRAULIS, AND JOSHUA WOOD!AND TO MY EXECUTIVE PRODUCERS, MATTHEW LAVENDER, SERGIU CODREANU, ROSEY, AND GREGORY HASTINGS!

TechCrunch Startups – Spoken Edition
Didero lands $30M to put manufacturing procurement on 'agentic' autopilot; plus, AI inference startup Modal Labs in talks to raise at $2.5B valuation, sources say

TechCrunch Startups – Spoken Edition

Play Episode Listen Later Feb 13, 2026 6:37


Didero functions as an agentic AI layer that sits on top of a company's existing ERP, acting as a coordinator that reads incoming communications and automatically executes the necessary updates and tasks. Also, General Catalyst is in talks to lead Modal Labs' next round for the four-year-old startup, according to our sources. Learn more about your ad choices. Visit podcastchoices.com/adchoices

The Data Exchange with Ben Lorica
Breaking the Memory Wall in the Age of Inference

The Data Exchange with Ben Lorica

Play Episode Listen Later Feb 12, 2026 45:43


In this episode, Sid Sheth, founder and CEO of d-matrix, discusses the company's approach to AI inference hardware with a focus on solving the memory bottleneck problem. Subscribe to the Gradient Flow Newsletter

Learning Bayesian Statistics
151 Diffusion Models in Python, a Live Demo with Jonas Arruda

Learning Bayesian Statistics

Play Episode Listen Later Feb 12, 2026 95:43


• Support & get perks!
• Proudly sponsored by PyMC Labs! Get in touch at alex.andorra@pymc-labs.com
• Intro to Bayes and Advanced Regression courses (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!
Chapters:
00:00 Exploring Generative AI and Scientific Modeling
10:27 Understanding Simulation-Based Inference (SBI) and Its Applications
15:59 Diffusion Models in Simulation-Based Inference
19:22 Live Coding Session: Implementing BayesFlow for SBI
34:39 Analyzing Results and Diagnostics in Simulation-Based Inference
46:18 Hierarchical Models and Amortized Bayesian Inference
48:14 Understanding Simulation-Based Inference (SBI) and Its Importance
49:14 Diving into Diffusion Models: Basics and Mechanisms
50:38 Forward and Backward Processes in Diffusion Models
53:03 Learning the Score: Training Diffusion Models
54:57 Inference with Diffusion Models: The Reverse Process
57:36 Exploring Variants: Flow Matching and Consistency Models
01:01:43 Benchmarking Different Models for Simulation-Based Inference
01:06:41 Hierarchical Models and Their Applications in Inference
01:14:25 Intervening in the Inference Process: Adding Constraints
01:25:35 Summary of Key Concepts and Future Directions
Thank you to my Patrons for making this episode possible!
Links from the show:
- Come meet Alex at the Field of Play Conference in Manchester, UK, March 27, 2026!
- Jonas's Diffusion for SBI Tutorial & Review (Paper & Code)
- The BayesFlow Library
- Jonas on LinkedIn
- Jonas on GitHub
- Further reading for more mathematical details: Holderrieth & Erives
- 150 Fast Bayesian Deep Learning, with David Rügamer, Emanuel Sommer & Jakob Robnik
- 107 Amortized Bayesian Inference with Deep Neural Networks, with Marvin Schmitt

Code Story
The Gene Simmons of Data Protection - AI Inference-time Guardrails

Code Story

Play Episode Listen Later Feb 11, 2026 26:44


The Gene Simmons of Data Protection: Protegrity's KISS Method
Today, we are releasing our final FINAL episode from our series, entitled The Gene Simmons of Data Protection - the KISS Method, brought to you by none other than Protegrity. Protegrity is AI-powered data security for data consumption, offering fine-grained data protection solutions, so you can enable your data security, compliance, sharing and analytics.
Episode Title: Navigating the Future of Data Management: Type Systems, Quantum Computing, and Protegrity's Innovations
In our final-FINAL episode, we are speaking with Ave Gatton, Director of Generative AI. We talk about how AI safety doesn't end with training; it begins with inference. We explore the overlooked frontier of AI security, covering prompt injection, data leakage, and model manipulation. Ave helps us understand how you can build guardrails that operate in real time and adapt to evolving threats.
Questions
- What are inference-time threats and why are they becoming a critical focus in AI security?
- How do inference-time risks differ from training-time risks?
- Why is inference-time protection critical for safe, scalable AI adoption?
- How do inference-time threats vary across industries? Is there any industry where these attacks are most prevalent?
- Why are traditional security models insufficient at inference?
- What is the impact of inference-time breaches on AI adoption?
- What role does compliance play in shaping inference-time guardrails?
- What practical steps can organizations take to secure inference today?
- How can businesses balance performance with security when adding guardrails?
Links
https://www.protegrity.com/
https://www.linkedin.com/in/averell-gatton/
Support this podcast at — https://redcircle.com/code-story-insights-from-startup-tech-leaders/donations
Advertising Inquiries: https://redcircle.com/brands
Privacy & Opt-Out: https://redcircle.com/privacy

The Six Five with Patrick Moorhead and Daniel Newman
The Six Five Pod | EP 291: Davos to Abu Dhabi - Inference, Codex & the So-Called SaaSpocalypse

The Six Five with Patrick Moorhead and Daniel Newman

Play Episode Listen Later Feb 10, 2026 54:58


The Six Five Pod is back with Episode 291. Daniel Newman and Patrick Moorhead are fresh off trips to Davos and Abu Dhabi, where they've explored the full AI stack up close (models, infrastructure, healthcare/genomics). This episode dives into what really matters right now in the markets and tech. From Microsoft's Maia 200 inference push, to NVIDIA's $2B CoreWeave bet, OpenAI's Codex closing the coding gap, the "SaaSpocalypse" panic, Cisco's AI Summit, and a no-BS debate on whether AI agents are actually enterprise-ready. The handpicked topics for this week are: Inside Abu Dhabi's Full-Stack AI Play: From universities to healthcare to hyperscale infrastructure — Pat shares a firsthand perspective on how the UAE is quietly building an end-to-end AI ecosystem. Optics, Cooling, and the Hidden AI Infrastructure Layer: Why companies like Coherent matter as much as GPUs — and how photonics, co-packaged optics, and rack-level cooling are becoming critical to scaling AI factories. Inference Takes Center Stage: Microsoft's Maia 200 shows real progress — and why hyperscalers are building custom silicon to boost capacity, economics, and control. NVIDIA's $2B CoreWeave Bet: Circular finance or strategic genius? We unpack what NVIDIA's latest investment signals about AI factories, cloud capacity, and long-term infrastructure buildout. Codex vs. Claude: The Coding Wars Heat Up: OpenAI closes the gap fast — and developers start hopping between tools as AI coding becomes a moving target. The "SaaSpocalypse" Narrative: Is software really dead? We separate market panic from reality — and explain why SaaS won't disappear, but will never be valued the same again. Cisco's AI Summit Reality Check: From hype to execution: what stood out from Cisco's AI Summit and why networking, security, and enterprise integration matter more than demos. Are AI Agents Enterprise-Ready? The Flip Debates: real-world workflows vs. 
reliability, governance, and security — where agents work today, and where they still fall short. Big Tech Earnings Whiplash: AWS, Google, Microsoft, Meta, NVIDIA, AMD, Palantir, and Coherent — massive CapEx, cloud acceleration, and what Wall Street is getting wrong about AI ROI.   Be sure to subscribe to The Six Five Pod so you never miss an episode.

Telugu Bytes
087 - Tsun"AI" Warning

Telugu Bytes

Play Episode Listen Later Feb 9, 2026 103:42


Dhruv and Ravi are back to talk about the rise of agentic AI — their experience with Claude Code and Cursor, what agents actually are, and why they think a tsunami is coming for software engineers and knowledge workers. The Tsunami Warning The feeling since late 2025 — prapancham roju roju ki maripotundi (the world changes day by day) The COVID masks analogy — we are those people now Why the folks back home aren't feeling it yet Timeline — How We Got Here GPT-2 (2020) → ChatGPT (2022) → Cursor (2023) → Claude Code & Opus 4.5 (2025) The Evolution of AI Coding Chat interface — copy-paste snippets from ChatGPT Assisted coding — Cursor tab-complete, you drive, model navigates Agentic coding — the agent drives, you're the passenger Cursor vs Claude Code — why Claude Code wins The Autopilot vs FSD analogy WTF is a Model? Giant N-dimensional matrices with weights Text in, everything out Bigger model, better responses WTF is an Agent? Model = brain, Agent = human Agent uses the model to operate tools — like a robot with a task Inference and Context Engineering Sessions, prompting, context windows SWE = Context Engineering + Verification Engineering Memory, Skills, and the Matrix Kung-Fu analogy Agent Harnesses Claude Code, Cursor, Agent SDKs Programming in English It's fun, addictive, and an art Communication skills over coding skills Good taste, strong architecture, trash your prior beliefs My Thesis — And How It Was Wrong Thought it'd hit "IT workers" first, not Big Tech But the tsunami hits the coast first — US and Big Tech have closed loops Tesla car Dharavi slums lo nadavadhu (a Tesla won't run in the Dharavi slums) — we paved 6-lane roads for AI Knowledge Work, Manufacturing and Farming Any work where you can "close the loop" is at risk Manufacturing with QC — robots were always there, programming them was hard Farming — mostly done What is Still Scarce? Ideas, customer acquisition, creative content, land Creating software is no longer scarce Ippudu Em Cheyyamantaru Saar? (So What Do You Suggest We Do Now, Sir?) 
We don't need SWEs, we need builders Product sense, distributed systems, build-sell-ship quickly The existential dread — we don't have 10 years, or 5, or even 2 Collective mental health crisis and economic reshaping ahead The fire storm is coming

Telecom Reseller
Blaize and Nokia Target Real-World Edge AI with Hybrid Inference for APAC, Podcast

Telecom Reseller

Play Episode Listen Later Feb 9, 2026


Doug Green, Publisher of Technology Reseller News, spoke with Dinakar Munagala, CEO & Co-Founder of Blaize, and Joseph Sulistyo, SVP of Corporate Marketing, about Blaize's push to make AI inference practical outside the data center—and why a new strategic collaboration with Nokia is designed to accelerate that shift, especially across Asia Pacific. Blaize positions itself as an AI computing company built around a purpose-built, fully programmable processor architecture it calls a graph streaming processor, paired with software intended to simplify development of “real-world” AI. Munagala framed the company's focus as practical AI inference for environments like smart factories, smart cities, agriculture, defense, and other edge and hybrid deployments where latency, power, thermal limits, and operating conditions are non-negotiable. A centerpiece of the discussion was Blaize's announcement that Nokia is strengthening edge AI capabilities through a strategic collaboration with Blaize to deliver hybrid inference solutions across APAC. Munagala and Sulistyo described the move as a signal that AI's next phase isn't only about large-scale training in centralized data centers, but about deploying inference where outcomes are realized—near cameras, sensors, machines, and field infrastructure. In their view, Nokia's global reach in networking, automation, and integration creates a path to deliver end-to-end solutions that combine connectivity and compute for real deployments, not demos. Sulistyo emphasized the economics driving hybrid inference: cost-sensitive, power-constrained environments often cannot justify a single “monolithic” compute approach. Instead, he argued, the market is moving toward heterogeneous architectures—mixing different compute types to hit performance targets while controlling total cost of ownership.
In APAC, he noted, the scale of deployments makes marginal savings meaningful, and hybrid designs become an operational requirement, not a preference. The conversation also connected edge inference to public-sector and community outcomes. Both executives highlighted smart-city use cases—such as traffic management, tolling, and first-responder automation—where real-time inference can improve accuracy and responsiveness while reducing labor-intensive processes. They extended that point to rural and underserved regions, arguing that “smart city” also includes municipalities and regional governments, where automation and analytics can unlock revenue (e.g., tolls and fines) while improving safety. Doug pushed on definitions and practicality, prompting Munagala to describe edge inference as compute performed as close as possible to the sensor—for example, processing video near a camera mounted on a pole, at a toll booth, or in a factory—so systems can detect events and respond with low latency. He added that some deployments may route inference to nearby on-prem servers or regional data centers, depending on architecture and proximity, and Blaize aims to support these variations with a common hardware/software platform. Blaize also addressed the “AI energy speed bump” impacting communities and operators—particularly where power availability and cost are constrained. Munagala said low power is foundational to Blaize's design goals and argued that purpose-built inference architectures can reduce the burden associated with power-hungry AI approaches. Sulistyo added that the broader infrastructure conversation increasingly includes cooling realities (air and liquid) and the need to match the deployment environment to the right compute profile. To ground “real-world AI” in examples, the guests pointed to deployments including license plate recognition in complex, variable conditions and traffic anomaly detection (identifying behavior that deviates from normal flow). 
They described these as compute-intensive workloads that must run reliably outdoors and under harsh conditions, where latency and endurance matter as much as accuracy. They also discussed retail analytics as another example of edge inference delivering measurable business outcomes by connecting what happens in-store to revenue-driving decisions. Looking ahead, Munagala described the Nokia collaboration as a model for additional partnerships that bring inference solutions into production environments at scale. Sulistyo noted APAC is the initial focus, with other regions expected to follow based on demand, proof points, and the prioritization of specific use cases. To learn more about Blaize and its technology, visit https://www.blaize.com/.

Lucretius Today -  Epicurus and Epicurean Philosophy
Episode 319 - Is the Key To Happiness Found In Supernatural Causes and Geometry?

Lucretius Today - Epicurus and Epicurean Philosophy

Play Episode Listen Later Feb 6, 2026 46:37 Transcription Available


Welcome to Episode 319 of Lucretius Today. This is a podcast dedicated to the poet Lucretius, who wrote "On The Nature of Things," the most complete presentation of Epicurean philosophy left to us from the ancient world. Each week we walk you through the Epicurean texts, and we discuss how Epicurean philosophy can apply to you today. If you find the Epicurean worldview attractive, we invite you to join us in the study of Epicurus at EpicureanFriends.com, where we discuss this and all of our podcast episodes. Last week we completed our series on Cicero's "Tusculan Disputations," and this week we start a new series that will help us with canonics / epistemology. We will eventually move to Philodemus' "On Signs" / "On Methods of Inference," and when we do we will refer to David Sedley's article on "On Signs," and the appendix in the translation prepared by Phillip De Lacy, both of which are very good but difficult. To get us acclimated to the issues, we need a little more Cicero from his work "Academic Questions." This is much shorter than On Ends and Tusculan Disputations but gives us an overview of the issues that split Plato's Academy and shows how Aristotle and the Stoics (and Epicurus) responded to those controversies. https://www.epicureanfriends.com/thread/4922-episode-319-is-the-secret-to-happiness-found-in-supernatural-causes-and-geometry/

Infinite Machine Learning
Building a $4 Billion AI Infra Company | Benny Chen, cofounder of Fireworks AI

Infinite Machine Learning

Play Episode Listen Later Feb 6, 2026 41:45


Benny Chen is the cofounder of Fireworks AI, an AI infrastructure platform. They have raised $327M in funding from Benchmark, Sequoia, Lightspeed, Index, and others. Benny's favorite book: Principles (Author: Ray Dalio)(00:01) Intro and why AI infrastructure is having a moment(00:06) Training vs inference: what's working and where the real bottlenecks are(01:25) Why inference is the hard problem in production(03:30) What breaks at scale when AI systems hit real users(05:29) GPUs, hardware constraints, and why power is now a first-class concern(06:02) What you're actually paying for in inference(07:21) Reliability, compliance, and enterprise expectations(09:49) Training and inference capacity: when they blur together(11:06) How to make inference fast in practice(13:06) System design choices behind modern inference platforms(15:28) Inference economics and cost tradeoffs(18:02) When fine-tuning actually makes sense(21:58) What “best model” really means for real companies(24:25) Production LLM architectures that actually work(27:46) Building an AI infra company customers can trust(29:27) Shipping fast without breaking reliability(31:14) Go-to-market lessons for infra startups(34:17) Where inference platforms are heading next(36:32) Rapid fire round--------Where to find Benny Chen: LinkedIn: https://www.linkedin.com/in/benny-yufei-chen-2238575a/--------Where to find Prateek Joshi: Website: https://prateekj.com Research Column: https://www.infrastartups.comLinkedIn: https://www.linkedin.com/in/prateek-joshi-infiniteX: https://x.com/prateekj

Top Traders Unplugged
UGO09: Playing the Players in a Narrative Market ft. Ben Hunt

Top Traders Unplugged

Play Episode Listen Later Feb 4, 2026 61:00 Transcription Available


Cem Karsan sits down with Ben Hunt, founder of Epsilon Theory, to explore how narratives shape markets, politics, and decision making itself. Drawing on decades of experience across academia, hedge funds, and applied AI, Ben explains why stories, not data, increasingly drive outcomes in modern markets. The conversation spans unstructured data, inference, common knowledge, and the mechanics of narrative momentum. Together, they examine consumer expectations, inflation silence, geopolitical signaling, and the slow shift away from US dominance. What emerges is a framework for understanding markets as reflexive systems, where perception often matters more than reality.-----50 YEARS OF TREND FOLLOWING BOOK AND BEHIND-THE-SCENES VIDEO FOR ACCREDITED INVESTORS - CLICK HERE-----Follow Niels on Twitter, LinkedIn, YouTube or via the TTU website.IT's TRUE ? – most CIO's read 50+ books each year – get your FREE copy of the Ultimate Guide to the Best Investment Books ever written here.And you can get a free copy of my latest book “Ten Reasons to Add Trend Following to Your Portfolio” here.Learn more about the Trend Barometer here.Send your questions to info@toptradersunplugged.comAnd please share this episode with a like-minded friend and leave an honest Rating & Review on iTunes or Spotify so more people can discover the podcast.Follow Cem on Twitter.Episode TimeStamps: 00:00 - Introduction to U Got Options and the trading floor setting02:18 - Ben Hunt's background and Epsilon Theory origins04:11 - Markets as the ultimate multiplayer game06:15 - Inference, unstructured data, and narrative analysis08:18 - Why sentiment and word counts miss the real signal11:16 - Mapping meaning and truthy stories15:00 - LLMs as operating systems, not oracles18:01 - Giving money back and when models stop working21:16 - Applying narrative tools beyond markets24:10 - Consumer weakness versus bullish expectations30:43 - Inflation, recession, and why markets do not care33:29 - Dormant stories and 
volatility discovery34:26 -

Mind-Body Solution with Dr Tevin Naidu
What is Ultimately Real? Consciousness, Free Energy & Spacetime | Donald Hoffman & Karl Friston

Mind-Body Solution with Dr Tevin Naidu

Play Episode Listen Later Feb 4, 2026 160:24


In this landmark Mind-Body Solution Colloquia, cognitive scientist Donald Hoffman and neuroscientist Karl Friston engage in a deep, rigorous dialogue on the foundations of reality, perception, and consciousness. Hoffman argues that spacetime and physical objects are not fundamental, but evolved interfaces shaped by fitness rather than truth. Friston presents the Free Energy Principle and Active Inference as a unifying framework for life, mind, and meaning — raising the question of whether inference itself can ground reality. Together, they explore:- Why spacetime may be derived, not fundamental- Whether consciousness must come before physics- Markov blankets, trace logic, and system boundaries- Probability, inference, and non-equilibrium dynamics- The limits of scientific explanation- Implications for AI, evolution, and ontology. This is not a debate — it is a serious attempt to understand reality at its deepest level. TIMESTAMPS:(00:00) - What is Ultimately Real? Consciousness vs Physicalism Debate(00:51) - Why Consciousness is Fundamental Beyond Spacetime(03:06) - High Energy Physics: Spacetime is Doomed Explained(05:06) - Challenges of Physicalist Theories in Explaining Consciousness(07:11) - Ontological Views: Free Energy Principle Integration(08:20) - Background-Free Explanations of Lived Experience(10:06) - Parsimony and Data Compression in Scientific Models(12:21) - Discoveries in Simpler Scattering Amplitude Solutions(14:09) - Free Energy Principle Guiding Beyond Spacetime Physics(16:06) - Why Physicalism Fails to Boot Up Consciousness(19:05) - Probability Theory's Role in Consciousness Frameworks(26:05) - Trace Logic Applied to Markov Chains Dynamics(34:51) - Markov Blankets and Insulation from the Past(39:07) - Minimizing Surprise in Non-Equilibrium Processes(53:32) - Spacetime as a Derived Projection from Fundamentals(1:04:15) - Constructing Simpler Explanations of Reality(1:20:50) - State Spaces and Dimensionality in Consciousness(1:41:30) - Non-Unique Bounds 
in AI Design Using Trace Logic(2:02:00) - From Classical Probability to Quantum Mechanics Transition(2:10:26) - Inferring Hidden Realities Through Relationships(2:18:54) - Time as a Computational Resource in Inference(2:24:09) - Scope and Limits of Scientific Explanations(2:32:32) - Agreements on Constructed Realities and Perceptions(2:40:01) - Closing Thoughts: Joint ManifestoEPISODE LINKS:- Karl's Round 1: https://youtu.be/Kb5X8xOWgpc- Karl's Round 2: https://youtu.be/mqzyKs2Qvug- Karl's Round 3 (Ft Mark Solms): https://youtu.be/Jtp426wQ-JI- Karl's Lecture 1: https://youtu.be/Gp9Sqvx4H7w- Karl's Lecture 2: https://youtu.be/Sfjw41TBnRM- Karl's Lecture 3: https://youtu.be/dM3YINvDZsY- Don's Round 1: https://youtu.be/M5Hz1giUUT8- Don's Round 2: https://youtu.be/Toq9YLl49KM- Don's Round 3: https://youtu.be/QRa8r5xOaAA- Don's Round 4: https://youtu.be/Hf1q-bZMEo4- Don's Lecture 1: https://youtu.be/r_UFm8GbSvU- Don's Lecture 2: https://youtu.be/YBmzqNIlbcICONNECT:- Website: https://mindbodysolution.org - YouTube: https://youtube.com/@MindBodySolution- Podcast: https://creators.spotify.com/pod/show/mindbodysolution- Twitter: https://twitter.com/drtevinnaidu- Facebook: https://facebook.com/drtevinnaidu - Instagram: https://instagram.com/drtevinnaidu- LinkedIn: https://linkedin.com/in/drtevinnaidu- Website: https://tevinnaidu.com=============================Disclaimer: The information provided on this channel is for educational purposes only. The content is shared in the spirit of open discourse and does not constitute, nor does it substitute, professional or medical advice. We do not accept any liability for any loss or damage incurred from you acting or not acting as a result of listening/watching any of our contents. You acknowledge that you use the information provided at your own risk. Listeners/viewers are advised to conduct their own research and consult with their own experts in the respective fields.

Effective Altruism Forum Podcast
[Linkpost] “Inference Scaling Reshapes AI Governance” by Toby_Ord

Effective Altruism Forum Podcast

Play Episode Listen Later Feb 2, 2026 34:49


This is a link post. The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance. The nature of these effects depends crucially on whether this new inference compute will primarily be used during external deployment or as part of a more complex training programme within the lab. Rapid scaling of inference-at-deployment would: lower the importance of open-weight models (and of securing the weights of closed models), reduce the impact of the first human-level models, change the business model for frontier AI, reduce the need for power-intense data centres, and derail the current paradigm of AI governance via training compute thresholds. Rapid scaling of inference-during-training would have more ambiguous effects that range from a revitalisation of pre-training scaling to a form of recursive self-improvement via iterated distillation and amplification. The end of an era — for both training and governance The intense year-on-year scaling up of AI training runs has been one of the most dramatic and stable markers of the Large Language Model era. Indeed it had been widely taken to be a permanent fixture of the AI landscape and the basis of many approaches to [...] ---Outline:(01:06) The end of an era -- for both training and governance(05:24) Scaling inference-at-deployment(06:42) Reducing the number of simultaneously served copies of each new model(08:45) Reducing the value of securing model weights(09:30) Reducing the benefits and risks of open-weight models(10:05) Unequal performance for different tasks and for different users(12:08) Changing the business model and industry structure(12:50) Reducing the need for monolithic data centres(17:16) Scaling inference-during-training(28:07) Conclusions(30:17) Appendix. 
Comparing the costs of scaling pre-training vs inference-at-deployment --- First published: February 2nd, 2026 Source: https://forum.effectivealtruism.org/posts/RnsgMzsnXcceFfKip/inference-scaling-reshapes-ai-governance Linkpost URL:https://www.tobyord.com/writing/inference-scaling-reshapes-ai-governance --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Effective Altruism Forum Podcast
[Linkpost] “Inference Scaling and the Log-x Chart” by Toby_Ord

Effective Altruism Forum Podcast

Play Episode Listen Later Feb 2, 2026 16:32


This is a link post. Improving model performance by scaling up inference compute is the next big thing in frontier AI. But the charts being used to trumpet this new paradigm can be misleading. While they initially appear to show steady scaling and impressive performance for models like o1 and o3, they really show poor scaling (characteristic of brute force) and little evidence of improvement between o1 and o3. I explore how to interpret these new charts and what evidence for strong scaling and progress would look like. From scaling training to scaling inference The dominant trend in frontier AI over the last few years has been the rapid scale-up of training — using more and more compute to produce smarter and smarter models. Since GPT-4, this kind of scaling has run into challenges, so we haven't yet seen models much larger than GPT-4. But we have seen a recent shift towards scaling up the compute used during deployment (aka 'test-time compute' or 'inference compute'), with more inference compute producing smarter models. You could think of this as a change in strategy from improving the quality of your employees' work via giving them more years of training in which to acquire [...] --- First published: February 2nd, 2026 Source: https://forum.effectivealtruism.org/posts/zNymXezwySidkeRun/inference-scaling-and-the-log-x-chart Linkpost URL:https://www.tobyord.com/writing/inference-scaling-and-the-log-x-chart --- Narrated by TYPE III AUDIO. ---Images from the article:

Effective Altruism Forum Podcast
[Linkpost] “Evidence that Recent AI Gains are Mostly from Inference-Scaling” by Toby_Ord

Effective Altruism Forum Podcast

Play Episode Listen Later Feb 2, 2026 10:01


This is a link post. In the last year or two, the most important trend in modern AI came to an end. The scaling-up of computational resources used to train ever-larger AI models through next-token prediction (pre-training) stalled out. Since late 2024, we've seen a new trend of using reinforcement learning (RL) in the second stage of training (post-training). Through RL, the AI models learn to do superior chain-of-thought reasoning about the problem they are being asked to solve. This new era involves scaling up two kinds of compute: the amount of compute used in RL post-training the amount of compute used every time the model answers a question Industry insiders are excited about the first new kind of scaling, because the amount of compute needed for RL post-training started off being small compared to the tremendous amounts already used in next-token prediction pre-training. Thus, one could scale the RL post-training up by a factor of 10 or 100 before even doubling the total compute used to train the model. But the second new kind of scaling is a problem. Major AI companies were already starting to spend more compute serving their models to customers than in the training [...] --- First published: February 2nd, 2026 Source: https://forum.effectivealtruism.org/posts/5zfubGrJnBuR5toiK/evidence-that-recent-ai-gains-are-mostly-from-inference Linkpost URL:https://www.tobyord.com/writing/mostly-inference-scaling --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

TD Ameritrade Network
Nitin Sacheti's Market Picks: Inference, Fiber, Defense, Energy

TD Ameritrade Network

Play Episode Listen Later Jan 30, 2026 6:17


“Expectations are really high,” Nitin Sacheti says, examining the results from megacap tech earnings this week. It's becoming a stock picker's market as investors need to sort through the AI trade carefully; he likes hardware and inference-related companies, but is beginning to hedge. Another place he's looking is at fiber, as it's needed to move data across the country. He's also long in the defense sector in companies where he sees secular growth. ======== Schwab Network ========Empowering every investor and trader, every market day.Options involve risks and are not suitable for all investors. Before trading, read the Options Disclosure Document. http://bit.ly/2v9tH6DSubscribe to the Market Minute newsletter - https://schwabnetwork.com/subscribeDownload the iOS app - https://apps.apple.com/us/app/schwab-network/id1460719185Download the Amazon Fire Tv App - https://www.amazon.com/TD-Ameritrade-Network/dp/B07KRD76C7Watch on Sling - https://watch.sling.com/1/asset/191928615bd8d47686f94682aefaa007/watchWatch on Vizio - https://www.vizio.com/en/watchfreeplus-exploreWatch on DistroTV - https://www.distro.tv/live/schwab-network/Follow us on X – https://twitter.com/schwabnetworkFollow us on Facebook – https://www.facebook.com/schwabnetworkFollow us on LinkedIn - https://www.linkedin.com/company/schwab-network/About Schwab Network - https://schwabnetwork.com/about

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
20VC: Brex Acquired for $5.15BN | a16z Companies are 2/3 AI Revenues | Anthropic Inference Costs Skyrocket | OpenEvidence Raises at $12BN Valuation | The IPO Market: EquipmentShare, Wealthfront and Ethos Insurance

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch

Play Episode Listen Later Jan 29, 2026 75:55


AGENDA: 03:36 Brex Acquisition by Capital One for $5.15BN 10:54 Does Brex's Acquisition Help or Hurt Ramp? 16:28 TikTok Deal Completed: Who Won & Who Lost: Analysis 19:30 Anthropic Inference Costs Higher Than Expected 37:50 OpenEvidence Raises at $12BN from Thrive and DST 53:56 Wealthfront IPO Disaster: Is $1.5BN IPO Too Small? 01:07:27 Salesforce Wins $5BN Army Contract: The Last Laugh for SaaS

Data Driven
Synthetic Populations and the Future of Decision Intelligence

Data Driven

Play Episode Listen Later Jan 29, 2026 50:16 Transcription Available


In this episode of Data Driven, Frank and Andy dive into the future of market intelligence with Dr. Jill Axline, co-founder and CEO of Mavera—a company building synthetic populations that simulate real human behaviour, cognition, and emotion. Forget Personas. We're talking real-time, AI-driven behavioural modeling that's more predictive than your horoscope and considerably more data-backed.Dr. Axline shares how Mavera's swarm of AI models situates these synthetic humans within real-world business contexts to forecast decisions, measure emotional resonance, and even test marketing messages before they go live. From governance and model drift to the surprising uses in financial services, political campaigns, and speechwriting—this is one of the most forward-looking conversations we've had yet.If you've ever wanted a deeper understanding of how AI can augment decision-making—or just want to hear Frank admit asset managers love ice cream—this one's for you.LinksLearn more about Mavera:https://mavera.ioConnect with Jill Axline on LinkedIn:https://linkedin.com/in/jillaxlineMorningstar:https://www.morningstar.comTime Stamps00:00 - Introduction & AI Swarms Explained03:30 - Forget Personas: Contextual AI Models07:00 - Evidence vs Inference & AI Governance10:20 - Simulation Scenarios & Model Drift14:30 - Synthetic Audiences in Action18:00 - Evidence Feedback Loops & Small Data Challenges22:00 - Industry Applications & Use Cases27:00 - Analyzing Speeches & Emotional Resonance30:45 - Sentiment, Social Listening, and Real-Time News Reactions34:00 - Adversarial Models & Strategic Pushback38:00 - The Cartoon Bank Portal That Failed Spectacularly41:00 - From Skeptic to CEO: Jill's Journey45:00 - Data Privacy, Compliance & Synthetic Ethics48:00 - Reflections on Empathy, Engineers, and Selling Without SellingSupport the ShowIf you enjoy Data Driven, leave us a review on Apple Podcasts or your favourite pod platform. 
It helps more people find the show—and fuels Frank's Monster Energy habit.

The MAD Podcast with Matt Turck
State of LLMs 2026: RLVR, GRPO, Inference Scaling — Sebastian Raschka

The MAD Podcast with Matt Turck

Play Episode Listen Later Jan 29, 2026 68:13


Sebastian Raschka joins the MAD Podcast for a deep, educational tour of what actually changed in LLMs in 2025 — and what matters heading into 2026.We start with the big architecture question: are transformers still the winning design, and what should we make of world models, small “recursive” reasoning models and text diffusion approaches? Then we get into the real story of the last 12 months: post-training and reasoning. Sebastian breaks down RLVR (reinforcement learning with verifiable rewards) and GRPO, why they pair so well, what makes them cheaper to scale than classic RLHF, and how they “unlock” reasoning already latent in base models.We also cover why “benchmaxxing” is warping evaluation, why Sebastian increasingly trusts real usage over benchmark scores, and why inference-time scaling and tool use may be the underappreciated drivers of progress. Finally, we zoom out: where moats live now (hint: private data), why more large companies may train models in-house, and why continual learning is still so hard.If you want the 2025–2026 LLM landscape explained like a masterclass — this is it.Sources:The State Of LLMs 2025: Progress, Problems, and Predictions - https://x.com/rasbt/status/2006015301717028989?s=20The Big LLM Architecture Comparison - https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparisonSebastian RaschkaWebsite - https://sebastianraschka.comBlog - https://magazine.sebastianraschka.comLinkedIn - https://www.linkedin.com/in/sebastianraschka/X/Twitter - https://x.com/rasbtFIRSTMARKWebsite - https://firstmark.comX/Twitter - https://twitter.com/FirstMarkCapMatt Turck (Managing Director)Blog - https://mattturck.comLinkedIn - https://www.linkedin.com/in/turck/X/Twitter - https://twitter.com/mattturck(00:00) - Intro (01:05) - Are the days of Transformers numbered?(14:05) - World models: what they are and why people care(06:01) - Small “recursive” reasoning models (ARC, iterative refinement)(09:45) - What is a diffusion model (for 
text)?(13:24) - Are we seeing real architecture breakthroughs — or just polishing?(14:04) - MoE + “efficiency tweaks” that actually move the needle(17:26) - “Pre-training isn't dead… it's just boring”(18:03) - 2025's headline shift: RLVR + GRPO (post-training for reasoning)(20:58) - Why RLHF is expensive (reward model + value model)(21:43) - Why GRPO makes RLVR cheaper and more scalable(24:54) - Process Reward Models (PRMs): why grading the steps is hard(28:20) - Can RLVR expand beyond math & coding?(30:27) - Why RL feels “finicky” at scale(32:34) - The practical “tips & tricks” that make GRPO more stable(35:29) - The meta-lesson of 2025: progress = lots of small improvements(38:41) - “Benchmaxxing”: why benchmarks are getting less trustworthy(43:10) - The other big lever: inference-time scaling(47:36) - Tool use: reducing hallucinations by calling external tools(49:57) - The “private data edge” + in-house model training(55:14) - Continual learning: why it's hard (and why it's not 2026)(59:28) - How Sebastian works: reading, coding, learning “from scratch”(01:04:55) - LLM burnout + how he uses models (without replacing himself)

unSILOed with Greg LaBlanc
615. Reclaim Your Life from Digital Overload with Paul Leonardi

unSILOed with Greg LaBlanc

Play Episode Listen Later Jan 26, 2026 60:03


What are practical strategies to avoid overload and exhaustion in today's digital world? What norms can organizations create for tool usage, and how can offline activities provide a mental contrast to digital work? Paul Leonardi is the Duca Family Professor of Technology Management at UC Santa Barbara, a consultant and speaker on digital transformation and the future of work, and an author of several works. His latest book is called Digital Exhaustion: Simple Rules for Reclaiming Your Life. Greg and Paul discuss the complementary nature of his two most recent books: the first focuses on harnessing digital tools, and the second on mitigating the overwhelm they can cause. They also explore teaching technology management, including the importance of understanding technology's impact on people and organizational processes. Paul explains the 30% rule, emphasizing the need to understand digital tools well enough to use them effectively. They also explore the concept of digital exhaustion, the subject of his most recent book, its symptoms, and how to manage it, both at work and in daily life. *unSILOed Podcast is produced by University FM.*Episode Quotes:How can we reduce exhaustion?41:29: One easy way of reducing our exhaustion is to match the sort of complexity of the task that we are trying to do with the affordances or the capabilities of the technology. And I say match, not over exceed, because we also have the problem where, like me, I am sure you have been in many, many meetings that should have just been an email, that there is not the need. And so what we have done in that situation is we have overstimulated people, right, in a setting with, you know, 15 other folks, and we have taken an hour out of their day and maybe the travel time to get there. And that has created other avenues for exhaustion when, if we had just received this information via email, we could not have had the meeting. 
So you do not want to overmatch, you just want to like match to the complexity of the task. And that is the key to reducing our exhaustion.It's not just distraction that exhausts us18:28: I think we have failed to look at how it is not just being distracted that is a problem, but it is the act of switching itself across all of these different inputs really is a significant source of our exhaustion.Inference is a big driver of exhaustion32:45: Inference is really a big driver of exhaustion. And I would say the place that it most shows up, although not exclusively, is in our social media lives. Because, of course, people are curating their lives in terms of what they post, whether that is LinkedIn or TikTok or Instagram, that does not really matter. And we are constantly not only making inferences of them, but what I find is that we are also very often making inferences about ourselves because we see a past record of all the things that we wrote and all of the things that we posted. And then we are also making inferences of what we think other people think about us based on all the things that we post.Show Links:Recommended Resources:Human MultitaskingTask SwitchingFatigueUnsiloed Podcast Episode 612: Rebecca HindsGuest Profile:Faculty Profile at UC Santa BarbaraPaulLeonardi.comWikipedia ProfileLinkedIn ProfileGuest Work:Amazon Author PageDigital Exhaustion: Simple Rules for Reclaiming Your LifeThe Digital Mindset: What It Really Takes to Thrive in the Age of Data, Algorithms, and AIExpertise, Communication, and OrganizingMateriality and Organizing: Social Interaction in a Technological WorldCar Crashes without Cars: Lessons About Simulation Technology and Organizational Change from Automotive DesignGoogle Scholar Page Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning

In this episode, we discuss Microsoft's new Maya 200 AI inference chip, highlighting its capabilities, its importance for efficient AI model deployment, and how it signifies a major shift towards custom silicon in the AI industry. We also touch upon its potential impact on cost savings and Microsoft's strategy to become a leading player in the AI hardware space.

Chapters:
00:00 Microsoft's Maya 200 AI Chip
00:29 AI Box.ai Tools
02:03 Power and Performance
04:54 Inference vs. Training
08:21 Efficiency and Competition
14:06 Internal Deployment and Future
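The inference-versus-training distinction the episode covers can be made concrete with a toy model: training repeatedly updates weights against example data, while inference is a single forward pass with the weights held fixed. This is a minimal one-weight sketch for illustration only; it has nothing to do with Microsoft's actual chip or software.

```python
# Toy illustration: training updates weights; inference only reads them.

def forward(w, x):
    # Inference: a single forward pass through the (one-weight) model.
    return w * x

def train(w, data, lr=0.1, epochs=50):
    # Training: repeatedly adjust w to reduce squared error on (x, y) pairs.
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (forward(w, x) - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x
w = train(0.0, data)
print(round(w, 3))                # learned weight, close to 2.0
print(round(forward(w, 5.0), 2))  # inference on a new input, close to 10.0
```

Training is the expensive, iterative loop; inference is the cheap per-request path — which is why deployment hardware can be specialized so differently from training hardware.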

Crazy Wisdom
Episode #525: The Billion-Dollar Architecture Problem: Why AI's Innovation Loop is Stuck

Crazy Wisdom

Play Episode Listen Later Jan 23, 2026 53:38


In this episode of the Crazy Wisdom podcast, host Stewart Alsop welcomes Roni Burd, a data and AI executive with extensive experience at Amazon and Microsoft, for a deep dive into the evolving landscape of data management and artificial intelligence in enterprise environments. Their conversation explores the longstanding challenges organizations face with knowledge management and data architecture, from the traditional bronze-silver-gold data processing pipeline to how AI agents are revolutionizing the way people interact with organizational data without needing SQL or Python expertise. Burd shares insights on the economics of AI implementation at scale, the debate between one-size-fits-all models and specialized fine-tuned solutions, and the technical constraints that prevent companies like Apple from upgrading services like Siri to modern LLM capabilities. They close by discussing the future of inference optimization and the hundreds-of-millions-of-dollars cost barrier that makes architectural experimentation in AI uniquely expensive compared to other industries.

Timestamps:
00:00 Introduction to Data and AI Challenges
03:08 The Evolution of Data Management
05:54 Understanding Data Quality and Metadata
08:57 The Role of AI in Data Cleaning
11:50 Knowledge Management in Large Organizations
14:55 The Future of AI and LLMs
17:59 Economics of AI Implementation
29:14 The Importance of LLMs for Major Tech Companies
32:00 Open Source: Opportunities and Challenges
35:19 The Future of AI Inference and Hardware
43:24 Optimizing Inference: The Next Frontier
49:23 The Commercial Viability of AI Models

Key Insights:

1. Data Architecture Evolution: The industry has evolved through bronze-silver-gold data layers, where bronze is raw data, silver is cleaned/processed data, and gold is business-ready datasets. However, this creates bottlenecks as stakeholders lose access to original data during the cleaning process, making metadata and data cataloging increasingly critical for organizations.

2. AI Democratizing Data Access: LLMs are breaking down technical barriers by allowing business users to query data in plain English without needing SQL, Python, or dashboarding skills. This represents a fundamental shift from requiring intermediaries to direct stakeholder access, though the full implications remain speculative.

3. Economics Drive AI Architecture Decisions: Token costs and latency requirements are major factors determining AI implementation. Companies like Meta likely need their own models because paying per-token for billions of social media interactions would be economically unfeasible, driving the need for self-hosted solutions.

4. One Model Won't Rule Them All: Despite initial hopes for universal models, the reality points toward specialized models for different use cases. This is driven by economics (smaller models for simple tasks), performance requirements (millisecond response times), and industry-specific needs (medical, military terminology).

5. Inference Is the Commercial Battleground: The majority of commercial AI value lies in inference rather than training. Current GPUs, while specialized for graphics and matrix operations, may still be too general for optimal inference performance, creating opportunities for even more specialized hardware.

6. Open Source vs. Open Weights Distinction: True open source in AI means access to architecture for debugging and modification, while "open weights" enables fine-tuning and customization. This distinction is crucial for enterprise adoption, as open weights provide the flexibility companies need without starting from scratch.

7. Architecture Innovation Faces Expensive Testing Loops: Unlike database optimization, where query plans can be easily modified, testing new AI architectures requires expensive retraining cycles costing hundreds of millions of dollars. This creates a potential innovation bottleneck, similar to aerospace industries where testing new designs is prohibitively expensive.
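The token-economics point in insight 3 is easy to make concrete: metered API pricing scales linearly with volume, while a self-hosted fleet is roughly a fixed daily cost up to its capacity. The sketch below compares the two; every number in it is invented for illustration and is not a quote from the episode or any vendor's price list.

```python
# Hypothetical numbers for illustration only.
API_PRICE_PER_MTOK = 2.00           # $ per million tokens via a metered API
SELF_HOST_FIXED_PER_DAY = 50_000.0  # $ per day: GPUs, power, ops for a self-hosted fleet

def api_cost(tokens_per_day):
    # Metered cost grows linearly with token volume.
    return tokens_per_day / 1e6 * API_PRICE_PER_MTOK

def self_host_cost(tokens_per_day):
    # Roughly fixed cost regardless of volume, capacity permitting.
    return SELF_HOST_FIXED_PER_DAY

for tokens in (1e9, 100e9):  # 1B vs 100B tokens/day
    cheaper = "self-host" if self_host_cost(tokens) < api_cost(tokens) else "API"
    print(f"{tokens:.0e} tokens/day: API ${api_cost(tokens):,.0f}"
          f" vs fixed ${SELF_HOST_FIXED_PER_DAY:,.0f} -> {cheaper}")
```

At low volume the metered API wins; at social-media scale the fixed cost of running your own models is orders of magnitude cheaper per token, which is the crossover driving companies like Meta toward self-hosting.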

a16z
Inferact: Building the Infrastructure That Runs Modern AI

a16z

Play Episode Listen Later Jan 22, 2026 43:37


Inferact is a new AI infrastructure company founded by the creators and core maintainers of vLLM. Its mission is to build a universal, open-source inference layer that makes large AI models faster, cheaper, and more reliable to run across any hardware, model architecture, or deployment environment. Together, they broke down how modern AI models are actually run in production, why "inference" has quietly become one of the hardest problems in AI infrastructure, and how the open-source project vLLM emerged to solve it. The conversation also looked at why the vLLM team started Inferact and their vision for a universal inference layer that can run any model, on any chip, efficiently.

Follow Matt Bornstein on X: https://twitter.com/BornsteinMatt
Follow Simon Mo on X: https://twitter.com/simon_mo_
Follow Woosuk Kwon on X: https://twitter.com/woosuk_k
Follow vLLM on X: https://twitter.com/vllm_project

Stay Updated:
Find a16z on X
Find a16z on LinkedIn
Listen to the a16z Show on Spotify
Listen to the a16z Show on Apple Podcasts
Follow our host: https://twitter.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Cloud Security Podcast
Why AI Can't Replace Detection Engineers: Build vs. Buy & The Future of SOC

Cloud Security Podcast

Play Episode Listen Later Jan 21, 2026 52:08


Is the AI SOC a reality, or just vendor hype? In this episode, Antoinette Stevens (Principal Security Engineer at Ramp) joins Ashish to dissect the true state of AI in detection engineering.

Antoinette shares her experience building a detection program from scratch, explaining why she doesn't trust AI to close alerts due to hallucinations and faulty logic. We explore the "engineering-led" approach to detection, moving beyond simple hunting to building rigorous testing suites for detection-as-code.

We discuss the shrinking entry-level job market for security roles, why software engineering skills are becoming non-negotiable, and the critical importance of treating AI as a "force multiplier, not your brain".

Guest Socials - Antoinette's Linkedin
Podcast Twitter - @CloudSecPod

If you want to watch videos of this LIVE STREAMED episode and past episodes, check out our other Cloud Security social channels:
- Cloud Security Podcast - Youtube
- Cloud Security Newsletter

If you are interested in AI Security, you can check out our sister podcast - AI Security Podcast

Questions asked:
(00:00) Introduction
(02:25) Who is Antoinette Stevens?
(04:10) What is an "Engineering-Led" Approach to Detection?
(06:00) Moving from Hunting to Automated Testing Suites
(09:30) Build vs. Buy: Is AI Making it Easier to Build Your Own Tools?
(11:30) Using AI for Documentation & Playbook Updates
(14:30) Why Software Engineers Still Need to Learn Detection Domain Knowledge
(17:50) The Problem with AI SOC: Why ChatGPT Lies During Triage
(23:30) Defining AI Concepts: Memory, Evals, and Inference
(26:30) Multi-Agent Architectures: Using Specialized "Persona" Agents
(28:40) Advice for Building a Detection Program in 2025 (Back to Basics)
(33:00) Measuring Success: Noise Reduction vs. False Positive Rates
(36:30) Building an Alerting Data Lake for Metrics
(40:00) The Disappearing Entry-Level Security Job & Career Advice
(44:20) Why Junior Roles are Becoming "Personality Hires"
(48:20) Fun Questions: Wine Certification, Side Quests, and Georgian Food
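The "testing suites for detection-as-code" idea discussed in the episode can be sketched as ordinary unit tests around a detection rule: the rule is a pure function over an event, and fixture events assert that it fires (and doesn't fire) as expected on every change. The rule, log fields, and thresholds below are hypothetical examples, not Ramp's actual detections.

```python
# Hypothetical detection rule written as code, with a small fixture suite.

def detect_impossible_travel(event):
    """Fire when one user has logins from two countries within the last hour."""
    logins = event["recent_logins"]  # list of (country, minutes_ago) pairs
    countries = {country for country, minutes_ago in logins if minutes_ago <= 60}
    return len(countries) > 1

# Fixture events: the "test suite" that runs on every rule change,
# catching regressions before the rule ships to production.
should_fire = {"recent_logins": [("US", 5), ("DE", 40)]}
same_country = {"recent_logins": [("US", 5), ("US", 40)]}
stale_login = {"recent_logins": [("US", 5), ("DE", 600)]}

assert detect_impossible_travel(should_fire)
assert not detect_impossible_travel(same_country)
assert not detect_impossible_travel(stale_login)
print("all detection fixtures passed")
```

This is the engineering-led shift in miniature: detections get version control, review, and regression tests, rather than living as hand-edited queries in a SIEM console.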

Crazy Wisdom
Episode #524: The 500-Year Prophecy: Why Buddhism and AI Are Colliding Right Now

Crazy Wisdom

Play Episode Listen Later Jan 19, 2026 60:49


In this episode of the Crazy Wisdom podcast, host Stewart Alsop sits down with Kelvin Lwin for their second conversation exploring the fascinating intersection of AI and Buddhist cosmology. Lwin brings his unique perspective as both a technologist with deep Silicon Valley experience and a serious meditation practitioner who has spent decades studying Buddhist philosophy. Together, they examine how AI development fits into ancient spiritual prophecies, discuss the dangerous allure of LLMs as potentially "asura weapons" that can mislead users, and explore verification methods for enlightenment claims in our modern digital age. The conversation ranges from technical discussions about the need for better AI compilers and world models to profound questions about humanity's role in what Lwin sees as an inevitable technological crucible that will determine our collective spiritual evolution.

For more information about Kelvin's work on attention training and AI, visit his website at alin.ai. You can also join Kelvin for live meditation sessions twice daily on Clubhouse at clubhouse.com/house/neowise.

Timestamps:
00:00 Exploring AI and Spirituality
05:56 The Quest for Enlightenment Verification
11:58 AI's Impact on Spirituality and Reality
17:51 The 500-Year Prophecy of Buddhism
23:36 The Future of AI and Business Innovation
32:15 Exploring Language and Communication
34:54 Programming Languages and Human Interaction
36:23 AI and the Crucible of Change
39:20 World Models and Physical AI
41:27 The Role of Ontologies in AI
44:25 The Asura and Deva: A Battle for Supremacy
48:15 The Future of Humanity and AI
51:08 Persuasion and the Power of LLMs
55:29 Navigating the New Age of Technology

Key Insights:

1. The Rarity of Polymath AI-Spirituality Perspectives: Kelvin argues that very few people are approaching AI through spiritual frameworks because it requires being a polymath with deep knowledge across multiple domains. Most people specialize in one field, and combining AI expertise with Buddhist cosmology requires significant time, resources, and academic background that few possess.

2. Traditional Enlightenment Verification vs. Modern Claims: There are established methods for verifying enlightenment claims in Buddhist traditions, including adherence to the five precepts and overcoming hell rebirth through karmic resolution. Many modern Western practitioners claiming enlightenment fail these traditional tests, often changing the criteria when they can't meet the original requirements.

3. The 500-Year Buddhist Prophecy and Current Timing: We are approximately 60 years into a prophesied 500-year period where enlightenment becomes possible again. This "startup phase of Buddhism revival" coincides with technological developments like the internet and AI, which are seen as integral to this spiritual renaissance rather than obstacles to it.

4. LLMs as UI Solution, Not Reasoning Engine: While LLMs have solved the user-interface problem of capturing human intent, they fundamentally cannot reason or make decisions due to their token-based architecture. The technology works well enough to create an illusion of capability, leading people down an asymptotic path away from true solutions.

5. The Need for New Programming Paradigms: Current AI development caters too much to human cognitive limitations through familiar programming structures. True advancement requires moving beyond human-readable code toward agent-generated languages that prioritize efficiency over human comprehension, similar to how compilers already translate high-level code.

6. AI as Asura Weapon in Spiritual Warfare: From a Buddhist cosmological perspective, AI represents an asura (demon-realm) tool that appears helpful but is fundamentally wasteful and disruptive to human consciousness. Humanity exists as the battleground between divine and demonic forces, with AI serving as a weapon that both sides employ in this cosmic conflict.

7. 2029 as Critical Convergence Point: Multiple technological and spiritual trends point toward 2029 as when various systems will reach breaking points, forcing humanity to either transcend current limitations or be consumed by them. This timing aligns with both technological development curves and spiritual prophecies about transformation periods.

Latin in Layman’s - A Rhetoric Revolution
The Inference Limit - Speed of Thought (A Short Sci-Fi story from me)

Latin in Layman’s - A Rhetoric Revolution

Play Episode Listen Later Jan 17, 2026 28:27


My links:
My Ko-fi: https://ko-fi.com/rhetoricrevolution
Send me a voice message!: https://podcasters.spotify.com/pod/show/liam-connerly
TikTok: https://www.tiktok.com/@mrconnerly?is_from_webapp=1&sender_device=pc
Email: rhetoricrevolution@gmail.com
Instagram: https://www.instagram.com/connerlyliam/
Podcast | Latin in Layman's - A Rhetoric Revolution: https://open.spotify.com/show/0EjiYFx1K4lwfykjf5jApM?si=b871da6367d74d92
YouTube: https://www.youtube.com/@MrConnerly

WSJ Tech News Briefing
TNB Tech Minute: Nvidia Licenses Groq's AI-Inference Technology

WSJ Tech News Briefing

Play Episode Listen Later Dec 26, 2025 2:09


Plus: China sanctions U.S. defense companies and executives including Northrop Grumman, Boeing and Palmer Luckey over Taiwan arms sale. And Google will let users change their Gmail address. Julie Chang hosts. Learn more about your ad choices. Visit megaphone.fm/adchoices