Podcasts about Cuda

  • 480 PODCASTS
  • 978 EPISODES
  • 48m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Jun 9, 2026 LATEST

POPULARITY (chart, 2019 to 2026)


Latest podcast episodes about Cuda

De Nederlandse Kubernetes Podcast
#136: vLLM, LMD, and the Quest to Build the Linux of AI Inference

Jun 9, 2026 32:21


In this episode, hosts Ronald and Jan are joined at KubeCon by two guests from Red Hat: Brian Stevens, AI CTO and one of the original architects behind the creation of Kubernetes and the CNCF, and Rob Shaw, co-lead of the vLLM project and maintainer of LMD.

Brian shares the remarkable backstory of how Kubernetes came to be open source, including how Red Hat negotiated a single committer seat before agreeing to be a launch partner, and how he later pushed Google to contribute Kubernetes to the newly formed CNCF rather than keeping it proprietary like TensorFlow.

Rob explains what an inference runtime actually is: the critical piece of software that takes an abstract AI model and runs it as efficiently as possible on a GPU or other accelerator, handling everything from CUDA-level kernel optimization to memory management and concurrent request scheduling. vLLM serves as a "Rosetta Stone" between the ever-growing zoo of models (Llama, DeepSeek, Mistral, Qwen, Nvidia Nemotron) and accelerators (Nvidia, AMD, Intel, Google TPUs).

The conversation covers model compression and quantization, including how techniques like 4-bit precision can deliver 2x hardware efficiency gains while preserving 99%+ model accuracy. Brian and Rob also address the "big model vs. many small models" debate, recommending that teams always start with the largest capable model to validate a use case before optimizing down.

Looking ahead, both guests see inference as potentially the single largest workload ever run on Kubernetes, and position LMD (now contributed to the CNCF) as the distributed inference layer that will make this possible across heterogeneous accelerator environments, preventing enterprises from ending up with 42 incompatible AI stacks.

The episode closes with a discussion on AI slop, human-in-the-loop thinking, and the future of Kubernetes as the universal platform for running AI agents at scale.
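The quantization point in the episode description can be made concrete with a small sketch. It shows symmetric 4-bit quantization of a list of weights in plain Python: each float is mapped to an integer code in [-8, 7] using a single scale factor, and two codes are packed into one byte. The function names (`quantize_int4`, `dequantize`, `pack_int4`) and the per-tensor scaling scheme are assumptions chosen for illustration; this is not how vLLM or any particular runtime implements quantization, and it does not reproduce the efficiency or accuracy figures quoted in the episode.

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantization of floats to 4-bit codes in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale so that the largest magnitude maps to code 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [c * scale for c in codes]


def pack_int4(codes):
    """Pack two 4-bit codes into each byte (low nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


if __name__ == "__main__":
    weights = [7.0, -3.0, 1.2, 0.0]  # toy values, not real model weights
    codes, scale = quantize_int4(weights)
    restored = dequantize(codes, scale)
    print(codes, scale, restored, len(pack_int4(codes)))
```

In this toy case four weights fit in two bytes once packed, compared with eight bytes at 16-bit precision. The rounding error per weight is at most half the scale factor, which is the trade-off that real schemes manage with finer-grained scales.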

東森美洲關鍵時刻 ETTV AMERICA
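Quanta's Barry Lam (林百里) made an epic gamble that built Nvidia's "strongest weapon, CUDA"! One word from Jensen Huang (黃仁勳) sends his "brothers'" share prices soaring to the daily limit?! 《寶傑點兵》 20260605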

Jun 5, 2026 13:24


- 姚惠珍 黃暐瀚 劉寶傑

Sharks Hockey Digest
Cuda Confidential: Brendan Hoffmann

Jun 4, 2026 40:00


#SJBarracuda voice Nick Nollenberger catches up with forward Brendan Hoffmann to talk about growing up in Charlotte, his path to the OHL, nearly retiring from hockey after junior, and how a breakout 2025-26 season in the ECHL helped pave the way to San Jose.

Cuda Confidential
Cuda Confidential: Brendan Hoffmann

Jun 4, 2026 40:00


#SJBarracuda voice Nick Nollenberger catches up with forward Brendan Hoffmann to talk about growing up in Charlotte, his path to the OHL, nearly retiring from hockey after junior, and how a breakout 2025-26 season in the ECHL helped pave the way to San Jose.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
⚡️Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build

Jun 3, 2026 38:58


We've informally heard that Satya has been a listener of LS for a couple of years now, but it was still absolutely surreal to meet him and do a live pod at Build, together with our friends at No Priors, the leading VC AI Podcast that we also greatly admire!

We covered the MAI model technical takeaways on yesterday's AINews, so I will focus our recap of Satya's main messages around three elements:

* Satya's adaptation of the Bill Gates Line for positioning Microsoft as the Frontier Intelligence Platform: customers must gain much more value from the Microsoft ecosystem than Microsoft itself, by building on multi-model harnesses like OpenClaw and Scout, drawing on the full enterprise context exposed by context layers like Work IQ (heavily dogfooded by his C-suite), and building up private evals and traces as a new form of Token IP
* AI ROI: On one hand, enterprises are having difficult conversations around Tokenmaxxing and Layoffs, and on the other hand, there are serious re-evaluations of the End of SaaS since the Build vs Buy equation has changed so much. Our previous SemiAnalysis guest had… interesting comments on Microsoft's position on this as the ur-SaaS titan, and Satya had great answers
* Making the Impossible Possible: Kevin Scott's inspiring framing around the most ambitious version of applying AI and technology at large to business and social problems, like education and social impact.

Enjoy!

Transcript

Voiceover: Welcome swyx, Sarah Guo, Elad Gil, and Chairman and Chief Executive Officer of Microsoft, Satya Nadella

Sarah Guo: Welcome to a crossover episode of No Priors and Latent Space with Satya Nadella. Um, congratulations on an amazing Build. No, thank you so much, and it's great to be with both of you. I listen to both of you or b- both the podcasts all the time. It's great to be on it.

Thank you so much. [00:01:00] So you're just talking about, um, these amazing, uh, announcements from across the Microsoft estate all morning for, I think, three hours.
What is the, uh, what's the most important reflection or takeaway you have?

AI as an Ecosystem Platform

Sarah Guo: I, I'd say there are, uh, perhaps the, the biggest one for me is let's sort of conceptualize this more as an ecosystem play as opposed to a single model or even a single platform, right?

Satya Nadella: I mean, you know, whatever I... At least for me, having grown up at Microsoft, having seen, whatever, four major platform shifts, uh, I sort of fall into that, um, uh, camp where a platform is defined by fundamentally its ability to create more value about the platform versus what's captured in the platform. And so if you, you view what's happening right now, I think this morning's keynote was how can any company, whether it's an AI native company or a traditional enterprise company, participate as a first-class participant where they can point to AI they created, [00:02:00] right?

It's not that they don't use other people's AI. Of course they will. But to me, what's the path? What's the recipe? How do I do it? What does a stack look like? What does the tooling look like? What is valuable? How do you do that? That's it. That's sort of our job to do. Yeah. Ecosystem strategy is, uh, very complicated, right?

Sarah Guo: Because you end up building certain components, partnering for certain components, supporting them. You just announced this big suite of models. Like, tell us a little bit about the, uh, training strategy for Microsoft now.
Yeah.

MAI Models & Training Strategy

Sarah Guo: So, so the thing that we wanted to do with the MAI models was to build, and as Mustafa talked about, first of all, a great lineage, right?

Satya Nadella: Starting with pre-training, uh, with very good data quality, uh, doing all the ablations, making sure because in, in some sense it's becoming even harder to build a clean lineage model just because there's so much stuff out there, uh, that you truly need to ablate out to be able to have a fantastic [00:03:00] pre-trained model.

In fact, that's one of the challenges of a lot of the open weight models is they look great on one benchmark or two, but they're not great in practice. So that's why, in fact, even in the RFDEs are, they, they are pretty gone really excited about these MAI models because how the heck can a small five B model hill climb?

Uh, and it goes back a little bit to what I think is ultimately the key thing to do, which is try to pursue finding that cognitive core. Uh, so to me, starting with a clean lineage- Then creating that ability for companies to be able to use this, right? Not just as a generalist, but to create their own specialist by building this hill climbing scaffold around it, right?

So it's not just the model, but you have a hill climb scaffold around it, then you will start building your RLE. You will start collecting the traces. Most importantly, you'll have private evals because we know all the evals out there are good, interesting, [00:04:00] but they're not really that critical- They're work, yeah

Swyx: at this point because they all can be maxed. And so the point is each company will have its own private eval. And so that end-to-end platform story around our models is sort of, uh, what I think is interesting. And then the one other thing, Sarah, since you brought that up, is I do feel there's a new frontier.

Satya Nadella: Like people talk about the frontier and are you operating at the frontier.
Um, interestingly enough, if you add a little temporality to it, you can use, let's say, in, in, in fact, the, the Lando Lakes demo we showed was pretty cool. We used, whatever, GPT-55, right? Then you collected a bunch of traces, and then you took a 5B reasoning model and achieved higher.

Sarah Guo: Uh, so that is another aspect of what it means to appear... uh, you know, operate at the frontier. Yeah. I, I think, uh, I first of all have to congratulate you on basically building a frontier neo lab inside of Microsoft in two years. Um, I'm wondering, you know, you have all this AI strategy that you're rolling out.

Lessons from Two Years of AI Development

Swyx: I'm wondering, what do you know now that you wish you would tell yourself two years ago where- or two or [00:05:00] three years ago? Three years for the Jensen partnership, two years for, uh, MAI. Yeah, I mean, I think the, the thing when, that I reflect quite a bit, right, which is sort of obviously I got into all this when I got excited by the, the scaling laws paper and, you know, when, you know, even the OpenAI partnership came about when those folks said, “Hey, we're gonna really throw a lot of compute at transformers.”

Satya Nadella: Uh, and they've helped. I- the thing that I always look back and say, “Wow, these things, uh, do have capability that they're climbing up.” W- I mean, this, you know, this crude way of saying it is intelligence is log of compute kind of works. Now what I think we underestimated perhaps is the real-world complexity of deploying these so that they actually deliver the value in the real world, right?

So the outcomes as measured by any benchmark is interestingly important, but the true eval is when people out there are able to do unique things that they only can value, and it's very [00:06:00] measurable, right? That I wish we had sort of even, like, had more in our consciousness, right?
Which is as an industry.

Sarah Guo: Because right now I think when people say, “Wow, I don't want a token max,” it's an artifact of us not having thought ourselves as an industry that we are using tokens to create value every step of the way. So I think that's kind of what I wish we had gotten there, but I'm glad we are here.

Real-World Value & Use Cases

Sarah Guo: What are some of the use cases that you've seen that have created the most value for your customers?

Because I know that people talk a lot about code, and I think it's pretty clear that that's something that's having very large scale impact. Are there other areas that you find in common that your customers are really benefiting from? Yeah. I think, yeah, to your point, obviously coding is now got... But it's interesting, by the way, Elijah, to even talk about the coding, right?

Satya Nadella: Which is coding has worked so well that we now have to rebuild the IDE, right? I mean, it's kind of nuts to see what we sh- launched is like, oh my God, I have these hundred agent sessions. I... The cognitive load it transfers back to me as a human is so [00:07:00] excessive that now I need a new UI. Uh, oh, by the way, I, like the, the chat as the only artifact was also impossible, so that's why we need a canvas.

So it's kind of interesting for all the things about where is software needed or where is UI needed, uh, you kind of need that even for code, right? In a fully agentic world. But that said, one of the things that we are starting to see, we started seeing with co-work, but even some of the work we, we showed with auto com- uh, um, autopilot Right on what you see with claws is a good one because if you sort of think about a lot of human capital is doing the glue work, right?

If you now can augment that with tokens/agents that are long-running, durable, right, then your ability to scale even what is still judgment and glue work gets amplified like coding does. Uh, so you can...
Like, I'm positive that six months from now we'll all be saying, “Oh, wow,” like, all through ni- the night there was a bunch of stuff that [00:08:00] all these autopilots that I have working on my behalf with my delegated authority, so to speak, right?

I can... Sort of given even my identity, did a bunch of work, then of course I'll need my new ADE to say, “Well, what did you do?” Like, I might... “Did I do this work?” And so on. So I think that that's where compressing of workflows, uh, completing of tasks, uh, that's where I think a lot of the value gets created. I think you raised a really interesting point, which is there's the actual agent that's doing the code, and then there's a harness around it, and that's the environment, that's the context, that's everything you're setting up as a developer around actually a coding agent.

The Harness Concept for Enterprise AI

Sarah Guo: What is the harness for the enterprise? Is there an equivalent concept for broader productivity work, or how do you think about that concept sort of generalized? That's right. So, so in some sense you kind of want the harness to define the models, the, the data, uh, and the tools, and so that you have a loop across those three.

Satya Nadella: And so what we are trying to, first of all, make sure is each of our products that we build, right, whether it's GitHub Copilot or the security copi- the, the [00:09:00] stuff we showed with MDASH or even the discovery for science, it doesn't matter, all of them are multi-model harnesses, um, with tools access so that you can do this progressive, uh, disclosure of tools even so that they're token efficient.

Uh, and then you're feeding it with very rich context because that's sort of the other hard lesson we have learned in the last two years is, oh my God, the amount of work you need to do to prep the context layer, uh, such that your plan can execute in the most efficient way is where the magic is.
So we have, in our case, we have the GitHub harness, which essentially we're using across all our products.

It's available in Foundry, and we are open, like you can use your Llama harness, whatever. Or you can use the, um, uh, you know, any open harness or any harness of yours and train with your tools and multiple models and your context. And so that's the pitch. Because right now a lot of dialogue is, um, “Hey, if I train the harness plus tools and the model together, you get [00:10:00] evals.”

Elad Gil: And what we are proving out is... And the best example of that is what we did with MDASH, right? Because when it launched, uh, it found bugs or vulnerabilities that were not found by Mythos. Uh, and so there is existence proof, I would claim, that you can have a multi-model harness, uh, that can in fact be more, uh, performant in the real world. So a premise behind the, uh, training at the independent frontier labs is really, you know, we're gonna have these models, and we'll have an API business, and we'll support enterprises and startups.

Platform Strategy & Developer Ecosystem

Sarah Guo: But a first-party product, be it productivity or code or search, drives the majority of revenue. That's a different value equation than you're describing, I think, with the Microsoft ecosystem. Uh, if, if that's the case, tell me if it's the case, uh, ‘cause obviously you have first-party products and you have enablement products.

Satya Nadella: Um, what is the role of the develop- Like what is gonna be hard and the set of skills and the value capture the developer has in that world? Yeah. So I think that there's always [00:11:00] gonna be the case that someone who is super successful in- as a platform builder can also have first-party products. It was true with Windows.

It is true, uh, with, uh, the, the SaaS side and the cloud side as well with us and others and so on. But the thing that is, is it should not be a limiter to other people achieving that same success, right?
That I think is the core difference, which is the, the network effects this time around, around intelligence are such because they learn from data, and not really lots of data.

It's just a few samples that you have to see to understand what's novel about something. So that's why the game becomes how to protect. So that's why I would say every company, having private evals may be the biggest IP, right? Think about it, like what's that private eval that you can then use even a frontier model to hill climb on and not leak the traces may be one of the biggest [00:12:00] drivers, uh, of IP.

Like, so in other words, another te- acid test is you have an eval that's private. You're using, uh, a g- a Model A. Can you switch it to Model B and e- you know, climb up? If you can, then you're in control. If you can't, you're not in control, and that's where even the harness decision becomes super important, right?

swyx: So therefore, having an open harness, letting all models come in, having your evals, your context, your tools help you hill climb, I think is the skills that an AI native startup needs, a SaaS company needs, or every enterprise needs. Yeah, I think in, in a very real way you are ... Microsoft historically is an operating systems company and th- then become a cloud company.

Maybe like the third act is that you're a harness or evals company. Whatever w- ... whatever the, the sort of conglomerate of concepts that you wanna put together. Um, and, and I think like enabling every company to have like frontier intelligence or what- what- Yeah ... I forget the, the [00:13:00] exact term that you used, um, is the, is the mission, right?

Satya Nadella: That's it. Like that is, that is the platform promise, that you build with us, you will get your intelligence, uh, for your data. That's it. That ... To, to me, that is the ...
Like if there was one tagline, uh, for this entire developer conference is- Can everybody operate at the frontier with their frontier intelligence, right?

To me, that is so important because otherwise it, I, I don't know how you achieve stable equilibrium, right? Which is how do I then go and say, “Well, my company is gonna have a terminal value because I now know how to continuously compound-” Yeah ... on top of what's a platform that gets better,” right? So when, like Windows obviously came out, Adobe built, Autodesk built, uh, or even like take what Jensen said.

We built DX and he built, you know, CUDA on top of it. Um, right? I mean, I always say to Jensen, “God, I got the short end of that,” right? “I wish, uh, we had recognized it.” But nevertheless, but that, that idea that you can build a platform layer [00:14:00] that someone else can then extend out, um, and build their own intelligence layer in this case, I think is everything, right?

Without it, why have a developer conference? I can just come and have you all sort of just worship at the altar of one model. Yeah. But that's not a developer conference.

IP, Evals & Company Value

swyx: Uh, backstage we, we had a discussion about what is IP or what is the, the value in a company. It used to be the length of, uh, human experience at a company, and now it's this other thing which is the evals, the, uh, experience in sort of applying agents to the company. Can you... I just want you to like flesh that out a bit more ‘cause- Yeah ... it was very insightful.

Satya Nadella: It's a great way to frame it, right?
Because yeah, at the end of the day, every company is gonna have both the human capital that is still gonna be super valuable, uh, because humans, uh, and their ability to find the gaps that exist at all times is going to be the way we all will create value, right?

I mean, so I'm definitely in the camp that this is going to be about expressing new forms of human agency and ambition even as token capital goes up, right? So let's say a cor- any corporation [00:15:00] has lots of tokens and lot of human capital. The question is how do you compound the two? So if you have a... Like if you take in Teams I have a bunch of agents doing work and a bunch of humans doing work, and the traces between those, that is really important context of how that enterprise is creating value.

Then that goes back to train not a generalist model, but to train the company veteran agent, uh, right? That is super valuable again, right? Which is when a company goes says, “It should in fact go onto the balance sheet,” is how I think about it, right? That's so... In fact, there may be... Like human capital was never possible to go put on a balance sheet, uh, because you didn't know how to capture the tacit knowledge.

swyx: Whereas now I think you can with the agents that have learned through the h- through, through time, through all the traces. Uh, so that's what at least we think will happen. I, I think the SEC is gonna have to have accounting standards- ... for token, uh, expertise. Uh, y- y- you're talking about the equilibrium [00:16:00] state, um, and a stable equilibrium where companies have this compounding value and can see terminal value for themselves.

Future of SaaS & Business Models

Sarah Guo: Another challenge to, you know, the considered equilibrium of, okay, there are applications and workflows that are sort of common to a vertical or a horizontal. Um, and this was, like, the generation of SaaS companies and, you know, Microsoft has lots of SaaS properties as well.
And then there are things that are very specific to every enterprise that they're differentiated against.

Elad Gil: Um, I'm sure you have heard much and participate in much of the debate about the end of software because all these workflows are, are cheap to generate now. Um, do you think the equilibrium looks different between what agents get built- Yeah ... in enterprises versus in their vendors in the future? Yeah. So I think what's happening there is, see, we, we had a particular way we captured, um, I would say workflow in apps, right?

Satya Nadella: Because we built a, a data model, right? We schematized some part of some business process. Mm-hmm. We then built a bunch of business logic. Yep. And then we put a bunch of UI [00:17:00] on top of it, right? So that's kind of what every SaaS company- And a little configuration. For, like, 20, 20 years that was the plan.

Right, that- Yeah ... and that was it. So interestingly enough, now you kind of get to re-litigate that vertical stacking, right? So I still think, for example, that data model that you built underneath every SaaS application is super good, right? Like, why reinvent it? Like, I, I, my general ledger better be a general ledger.

I don't need new schema creation. No. Uh, in fact, that entity relationship, uh, is actually pretty good, robust thing that I want to feed. And you want it to be stable. That's right. Yeah. Then same thing with business logic, right? If, if you look at, uh... We have this product called Power BI, right? It is like dashboards galore people created.

The beauty underneath that dashboard is a very rich semantic model, right? Someone took the pain to create a dashboard and do all the measures, and you want that. That's business logic, right? I want that to be available to me. So I think the [00:18:00] challenge of the SaaS business model is we packaged one way.
We now have to learn how to unbundle these things and rebundle in new ways and discover new business models, right?

I mean, if you look at it, d- what's happening today with Microsoft 365 is a great example, right? We have this thing called Work IQ. In fact, like, what we are realizing is, oh my God, like, you know, if you look at... In fact, there's a pa- historical parallel too, right? We sold first Exchange and SharePoint and, uh, you know, before Teams, we had a thing called Lync Server and what have you, and we thought, “Oh, that's all gonna move to the cloud.”

But little did we realize that, um, the number of people who will use servers in the cloud is 10X, 100X, right? Because people were not buying servers, they were just buying a subscription. Mm-hmm. The same thing is now happening with M365 because with Work IQ, we have exposed what is perhaps the most important database in a company that never got used as a database because it was only captive to our apps.

Mm-hmm. Right? It, it was all email operated on it, Teams operated [00:19:00] on it, Word, Excel, PowerPoint, SharePoint. But now, like this is one of the coo- coolest things I get to do with Work IQ. I go to a GitHub repo and I say, “Hey, I attended a bunch of design meetings last week related to this repo. Can you capture all that and tell me what changes I should make?”

I mean, think about that, right? It literally can go look at all those transcripts, come back with a plan to change a code base, right? Previously, you could never have thought of using M365 for something like that. So the value creation opportunity now in the agent world is in fact 10X more, but it does require us to have...

Sarah Guo: For example, there's going to be usage around M365, right? Which is going to be perhaps more than even the e- end users and we have to even re-architect. Like, in fact, like what I use to serve an inbox or a mailbox cannot be used to serve an agent.
Uh, and so that's sort of what we are doing.

Pricing Models: Per-User, Consumption & Outcomes

Sarah Guo: I don't believe in, like, permanent business models for any of these domains, but in the [00:20:00] near term, do you have a prediction between, uh, you know, outcomes-based pricing, token-based pricing?

Elad Gil: Enterprise bundles. Yeah. The way I- I think about this is always we've had... Like, let's even take the per-user pricing. Mm-hmm. The per-user pricing is really an artifact of someone creating a budget needing certainty, right? Because it's the most important thing. Like, somebody wants a budget- Mm-hmm ... they need a per user.

Satya Nadella: And, and per user is just a set of entitlements to usage, right? That's kind of what it is. And so the way is, if the first bundling will be take some usage, bundle it into per user stacks and, you know, then sell subscriptions. So subscriptions I think are gonna be there, per user is gonna be there. Then the next big thing will be consumption.

So people will say, “I want consumption.” And it's also possible that people will say, “I don't even want to pay for any of the subscriptions or the consumption's outcome.” Mm. But remember, most people love outcomes until they have an outcome, because once you have an outcome, it's like giving away royalty, [00:21:00] right?

Mm. I mean, like I, I've talked to customers who love, you know, outcome-based pricing, and I say, “I'm all in,” until they, “Oh my God,” like, “what are you talking about? You're sharing in my outcome? No, no, no. I want you to go back to per-user pricing, and I want you to consumption price,” right? So I think that debate will go on.

Uh, but and all, all, all of these business models have a particular time and a place versus one to rule them all. And if anything, if you're a SaaS vendor or you're a platform vendor, having that flexibility... And quite frankly, we face this with GitHub, right?
We just recently announced a per-user pricing on GitHub because little, you know, we- GitHub Copilot was constructed at a per-user level before we understood even, uh, the intensity of usage of agents, right?

It was an interactive way for a developer to use code complete, maybe tasks. It was not like, oh, I launched 10,000, you know, agents that are going on all day, right? So that is what the adjustment is about. So now that we really want, there will [00:22:00] always be a per user, but there will have to be a consumption meter.

Durability of SaaS & Build vs Buy

Sarah Guo: How do you think about the durability of SaaS more generally? One thing I've observed is in a lot of enterprises internally, there will be teams that almost have agent euphoria. They're so excited about the explosion of things they can build that they're trying to rebuild a lot of applications or going to their SaaS vendors and saying, “We're not gonna work with you anymore,” or, “We're considering an internal project.”

And it seems like in six to nine months, maybe some of those people will come back and say, “Actually, we, we can't rebuild everything.” How do you think about what's durable in this world and what isn't? Yeah, it's a... It... I think we have to go through one full budget cycle on this to really see the, um- Uh, the sort of the emergence of the equilibrium, because at the end of the day, there's marginal cost to even generating the app, right?

Elad Gil: In, in fact, there can be even a, a simple way to say it, like if you should always acquire something if the marginal cost of building and maintaining, uh, something on your own is higher. Uh, right? That should be like it's a quantifiable- Yeah. Right? A quantifiable thing. And [00:23:00] the maintenance part is important, right?

Even, like you got to remember like, hey, you know, all the security stuff that now AI will find, you better fix them too fast.
Uh, of course, there's a coding agent to help you with, but then that burns tokens, right? So whose responsibility is it? It's kind of like a, a cycle that you've got to think through.

And I think we have gone through the excitement that I can generate a lot of software. I think the next thing would be what software do I really want to generate? Mm-hmm. What software do I want to use from others? How do I compose these two into some agentic workflow that I have agency over, right?

Sarah Guo: Because I think there'll be very little tolerance for anybody who's inflexible, uh, at the vendor level. Uh, but at the same time, I think that anyone who has got that flexibility shows up, delivers the value, will be back at again, right? We're selling software, uh, but with just different business models, in fact. Uh, speaking about building software, um, one of my favorite moments from, I think, a previous Build maybe one or two years ago was they had a b- they, they...

Swyx: There was a section of you building your [00:24:00] own software. I'm curious if you're building anything now. Yeah. So I, I think the... You know, first of all, let's face it, right? Building software has made it possible for even the incompetence of a CEO of a company- ... like ours, uh, you can build, so thank God. But that said, I, I, I, I do feel that, you know, something like, um, GitHub Copilot to me, and especially the new Sessions app or the new app, has just made it so much more possible for you to have agency over artifacts that you felt you couldn't touch before, right?

Satya Nadella: So to, for me as a CEO, even to go to a code base, uh, to be able to learn about it, like I remember joining Microsoft long back, you know, first and then you say, man, everybody had to go in and look at, you know, whatever, Cutler's, Malik, or what have you to learn how to do good C, uh, C++ code.
Um, so now that ability to be more full stack up and down is so good, but that doesn't mean every one of us should be doing the same thing.

The question is: [00:25:00] how do you then have the ability to inspect things, learn things, see things, um, I think is just so much more. And so to me, what I'm building a lot of is these long-running Foundry agents. Uh, right? So there's autopilots. So the easiest thing is, to me, I think I just built one, uh, even last week, where the idea was, hey, can I have an agent that is continuously monitoring essentially my own chief of staff autopilot, right?

We're gonna have that obviously in, uh, Scout. That's what, uh, uh, we showed. But it is so easy and trivial to build. I took Work IQ. I said, “Take Work IQ, go, uh, and build a Foundry long-running agent.” Uh, store all the memory in, um, uh, using Ray Fin, right? Basically at my backend as a service. And lo and behold, it built it, and not only built it, I could say publish to Teams, and it published the damn thing to Teams. So the ability, uh, to have a, you know, some end-to-end project like this complete is just pretty [00:26:00] miraculous.

Future Engineering Roles

Sarah Guo: How do you think, uh, that impacts the different types of engineering roles that exist in the future? Because right now I think there's, you know, a dozen different types of engineers that you can be, from QA, front end, et cetera.

You know, there's a big swath. I've heard some people argue that in four or five years we'll basically end up with four engineering roles. It'll be people who are managing agents, it'll be forward deployed engineers or FDEs, it'll be security engineers, and then people working on large scale infrastructure for a small number of services, and then everything else just collapses into the agentic world.

Satya Nadella: Yeah, I- Do you think that's a correct view of the world? Yeah, I mean, I think, I think we'll have to experiment our way through it. 
But what you said is what... There are some very at scale things. At LinkedIn, they did structurally change- Mm-hmm ... uh, and it, you know, basically built up a new discipline called full stack builder, right?

So they went and said, “Hey, let's bring, uh, people from design and product management, front end engineering, all put them together.” Uh, but also have an edge, right? It's not like the design person still doesn't have the design edge, or the front end [00:27:00] person doesn't have the front end edge, but you can give yourself bigger scope in roles so that you're not confined to one role.

Um, and then r- equally, infrastructure has become very critical, right? So in other words, like, I mean, RLEs, I mean, one thing we've realized is even for the Excel team, for example. Mm-hmm. Building the RLE in which a reward can be learned is actually one of the hardest sort of infrastructure problems.

Mm-hmm. Uh, and so you kind of need even new talent, right? Distributed systems people even in what was considered an end user app team, uh, because it's a different skill set. So yes, infrastructure, science is the other one, obviously. Um, so I think we'll see how these evolve, right? Where's the s- real... I mean, always the world will have a bunch of specialists.

Okay. Um, you know, I think the generalist role is going to be the most exciting, right? Because the leverage of a generalist- Mm-hmm ... um, is where we are going to see the maximum returns, right? When, when you said, “Hey, are you coding?” I'm now a gen- Like, what... I've basically translated [00:28:00] knowledge work. Right?

Which I did, where I created a Word document or a spreadsheet, or even, uh... And now I can build an app, right? It's in the same sentence. Uh, right? That idea that, “Oh, wow, my generalist skills have gotten higher leverage,” I think is what we're gonna see across the board. 
Golden age for idea people

Sarah Guo: Music to the ears of CEOs and VCs that are, like, a little dangerous and a lot of- idea people. Yeah. Uh- With a lot of agency. I- if you take that idea of personal agency and you just zoom it out to the organizational context, um, uh, my partner Mike Vernal, who, uh, actually started his career at Microsoft, just wrote an essay where one of the big takeaways is i- it's an age where you can be much more ambitious, and you need to be, given the pace of the environment and how quickly, actually, users and companies are open to adopting new technologies.

Um, how do you think about... I, I feel silly asking this of somebody running a, you know, trillion-dollar-plus company already, but how do you think about how Microsoft can be more ambitious now?

Ambition & Making the Impossible Possible

Satya Nadella: It's a great question. Um, I [00:29:00] think, um- I think the, the thing in these type of transitions is to have a conceptual model of how work can change to go after outcomes that you could hardly imagine previously, right?

In fact, Kevin Scott has this nice line, right, which is, um, when you can make the impossible... Like, when you're making hard things easier, that's sort of one point of leverage. But true ambition is about making the impossible possible. So now the thing that is missing a little bit in all of our organizations is what is that new conceptual model of what can we build?

What was impossible and what can we build? And I'll give you one example of this, right, which is I take great inspiration from sort of the people who were managing the Azure net- network. And they came to the... This was from even last year. You know, we were scaling. You saw that I, I [00:30:00] talked about sort of how we built in the last 15 months more Azure capacity than we built in the first 15 years.

I mean, it's crazy. Wild. Yeah. Right? It's pretty wild. And it's the same team. 
So they saw that and they said, “Bob, this just ain't gonna work if we don't reconceptualize our work.” So they built... Essentially they said, “Our job is not to do Azure networking. Our job is to build the agentic system does, that, that does Azure networking,” right?

These are the folks managing the 500-plus fiber operators managing the WAN, right, all over. And fiber operations ultimately is a physical operation. Things get cut, things get, uh, you know, have to be repaired. You know, we have fancy words called DevOps and so on. Basically, emails are coming in and you gotta go respond to them, take care of it.

So they built this agentic system. They even have a character for it. It's called Miles, and it sort of does all this stuff, right? They started sort of screaming for more tokens and so on. And so they were saying, “Look, uh, we don't need a headcount. We need tokens in order to be able to [00:31:00] manage, uh, our operation.”

That reconceptualization- Mm-hmm ... of what their work is, right? They, they basically took their work and made it meta, right? That meta work is now their new work. Mm-hmm. Right? In the ‘80s, if somebody had come to us and said, “4 billion people are gonna get up in the morning and start typing,” my model would've been, we need 4 billion typists?

But we're not doing typing, we're doing knowledge work. So that, to me, I think is it, right, which is whether it's Microsoft or whether it's any organization, is to give ourselves permission to do new types of metacognition, meta work, using these new tools to change the outputs that matter, uh, and then really make the impossible possible.

Sarah Guo: So completing that dot or the, the connective tissue across those, I think, is where a lot of the enterprise value will get created.

Data Center Build-Out & Community Impact

Sarah Guo: Should we talk about data centers? Yeah, please ask. Oh, okay. Well, uh, uh, w- we-- this leads nicely into the data center build-up. 
I always think, I- I just-- I'm just impressed at the sheer scale of the [00:32:00] build-out from Microsoft, but also everyone else, that this is redefining what it means to be a hyperscaler.

And I just feel like that, that, that is at unprecedented scale on finances, uh, on the way you run the company, but also the communities that are, that are impacted. Um, yeah, just talk a bit more about what you're seeing on the ground, like when you visit your- Yeah, I think there are two aspects of it.

Satya Nadella: Obviously, the, the build-out is, uh, extraordinary. Um, you know, nothing like this has happened, and it's great to be, uh, one of the participants in it. Uh, but you brought up the other part, right? I think at this point it's clear that unless we as an industry, uh, are very principled about ensuring that the benefits of all the stuff we're talking about are felt in real ways, uh, at the community level, right?

Because this is not just a, a campaign, um, right? It has to be real, where people are saying, “Look, this is not ch- changing the prices on energy for me.” In fact, if anything, it's bringing down prices because long term there's going to be a better [00:33:00] grid, there is going to be more energy. Water consumption is, in fact, not sort of, uh...

In fact, water is being replenished, right? You gotta really, you know, educate folks on truly what's happening, the cl- uh, the closed loop systems we are building. We have to invest in the training, the jobs, the tax base. In fact, the least talked about stuff is the amount of jobs that get created during construction, after construction.

What's the tax base that's there in the community? And, and all this has to be real. Um, and, and if that is the case, then we will have permission. If it is not, we won't have permission. It's as simple as that, right? Which is, uh, we, we... I think we have to take it as an industry pretty seriously. 
Uh, I think it's good for communities to be skeptical, ask the hard questions, for us to do the hard work, earn that.

Um, but at the end of the day, if there's-- if we can really be the produ-- Wait. I've always felt like in human history, if you use a lot of energy but also create a lot of value for society- The story has been fantastic. If you don't [00:34:00] do that, it's not been that great. And this time around, I'm a firm believer that ultimately if you do have a token economy that drives productivity, that drives economic growth, that drives broad spread, um, you know, participation, better health outcomes, um, then I think we'll be in a great place.

Sarah Guo: Uh, and that's at least what we all have to be focused on. Yeah. It, it makes me think actually that with all these initiatives that you're doing, might be e- easier to see ROI in the communities first before in enterprise. Yeah. I, I mean, I think both sides. Yeah. In fact, it comes back together. It has to be the people in the communities are going to be employed, are going to be participants, uh, in the real economy, right?

Satya Nadella: That's I think the question is. Like, if we- if the broad economy is doing well and the communities are doing well, the dots get connected. It's sort of the market forces are such that we will connect the dots. And that I think is it. Like, you ought to be able to see the evidence. You can't be about o- any one company, uh, but it has to be broad economic growth and broad [00:35:00] ec- you know, community permission.

Societal Impact & Optimism About AI

Elad Gil: Yeah. I guess I wanna talk about what you're most optimistic about currently or what have you most updated your personal models on regarding societal impact of AI? So you're saying what's the, the, the- What have you updated most on in terms of societal impact of AI? Yeah. 
I think the, um, the p- the most, um- Critical thing is the first question we even started with, which is we need to tell the story and make it real that everybody has a real shot to participate as a first-class participant in this new economy.

Satya Nadella: Right? That's kind of, I think we- in the next 12 months, 18 months, we need a way for people to say, “Oh, wow, I get it.” Right? There's going to be tremendous capability, tremendous amount of infrastructure, but I can see what is going to happen, whether it's the benefits like health outcomes or my ability to create a startup or my ability to run my [00:36:00] local sort of, uh, store more efficiently.

It's just happening, and I see that, uh, benefit myself, right? That to me, you know, earning that permission in a path-dependent way, we can't wait. See, the one thing, Elad, that I've now learned is I think the world is gonna be very skeptical of tech and tech companies that say, “Trust us, we've got it. The g- future is gonna be glorious.”

Sarah Guo: Uh, you kind of have to deliver tangible benefits. Um, and quite frankly, politicians winning elections, uh, because they have advocated for that. That will be at least my adjustment because without it, um, thinking that somehow... Because it's too important this time around. It's too much of the economy for it not to be the case.

So one very simple framework I have for, you know, what are, what is gonna be the broad benefit of AI, um, beyond the communities just working in technology, are, are sort of wealth creation- Yep ... it's [00:37:00] gonna happen in a ton of different companies, startups and large companies. Then you have healthcare. Uh, you, you had amazing demos today. There are companies like Open Evidence. I think that is happening.

Education & Future of Learning

Sarah Guo: Um, education seems like another one that's an- Yep ... 
obvious good where we haven't seen as much impact as I'd expect.

Swyx: Do you have a hypothesis on why that might be, or if it'll come?

Satya Nadella: Yeah, I mean, I think this is where, again, how we think about education, how... You know, recently I met with, uh, the founders of Alpha School and learnt a lot about what they were going and going about, and it's fascinating to listen, uh, to how to even rethink- Mm ... uh, what does education really look like. Because I think it's actually very important. Mm. Uh, and I'm not saying anything traditionally being done is less important, right? I was even looking at the, uh... It's fascinating to see. I, I, I forget which Stanford class it was, uh, the, the Asian guidelines for CS something.

Mm. Uh, because you still need people to learn. Uh, like it was an interesting AI class that they were making sure people were learning how to apply softmax appropriately versus saying, “Hey, fix my training run.” Mm-hmm. Uh, so I think learning concepts is important. It's going to [00:38:00] be, uh, critical. But the way we create the incentives, what are the credentials, how we value those credentials, what is the employment opportunity for those credentials?

So I think that there's a complete change that has to happen, uh, given the way to get to information, way to educate yourself, way to continuously keep yourself updated has changed so much. So I think interestingly enough, maybe the next big startup and success story could be someone who builds a new university, um, or a new, um, pedagogy even of how to get someone to go through a curriculum and find economic opportunity, uh, that's highly valuable.

Well, that has felt, uh, perhaps impossible for a long time, but it's a great note to end on and something that might be possible. It's still possible. Yeah. Thank you, Satya. Thank you so much. Thank you. Yeah. I appreciate it. Thank you all.

This is a public episode. 
If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

Entre Dev y Ops Podcast
EDyO 105 - K8s and AI with Rael Garcia

Entre Dev y Ops Podcast

Play Episode Listen Later Jun 3, 2026


In episode 105 of the Entre Dev y Ops podcast, we talk about Kubernetes and AI with Rael Garcia.

Blog Entre Dev y Ops - https://www.entredevyops.es
Telegram Entre Dev y Ops - https://t.me/entredevyops
Twitter Entre Dev y Ops - https://twitter.com/entredevyops
LinkedIn Entre Dev y Ops - https://www.linkedin.com/company/entredevyops/
Patreon Entre Dev y Ops - https://www.patreon.com/edyo
Amazon Entre Dev y Ops - https://amzn.to/2HrlmRw

Links discussed:
VIA EPIA - https://en.wikipedia.org/wiki/EPIA
Visual Basic for Applications - https://en.wikipedia.org/wiki/Visual_Basic_for_Applications
Hall of Tortured Souls, “Doom” in Excel 95 - https://www.youtube.com/watch?v=JbUk_n3iOgA
Roller coaster in Excel - https://www.youtube.com/watch?v=IrVA1BBHFHw
BSC - https://www.bsc.es/
CUDA - https://en.wikipedia.org/wiki/CUDA
Kubernetes SIG Docs - https://kubernetes.io/docs/contribute/participate/
Azure Red Hat OpenShift (ARO) - https://www.redhat.com/en/technologies/cloud-computing/openshift/azure/get-started
CNCF Landscape - https://landscape.cncf.io/
Kubernetes VAP - https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/
UPC postgraduate program in cloud computing - https://upcschool.upc.edu/esp/estudis/formacio/curs/319400/postgrau-cloud-computing-architecture/
UPC postgraduate program in platform engineering - https://upcschool.upc.edu/esp/estudis/formacio/curs/305300/platform-engineering-devops-kubernetes/
Ciberado episode - https://www.entredevyops.es/podcasts/podcast-74.html
LinkedIn Rael - https://www.linkedin.com/in/rael/?locale=es
Github Rael - https://github.com/raelga

Analizy Live
Will BTC drag stocks down? Nvidia and new wonders. Wolves flee Bollywood

Analizy Live

Play Episode Listen Later Jun 3, 2026 68:11


Is bitcoin once again warning the stock market? On Wednesday morning, June 3, 2026, Rafał Bogusławski and Robert Stanilewicz examine what the latest signals from crypto, technology, oil, currencies, and central banks are really saying. On one side we have AI, Nvidia, and Microsoft, which want to change the personal computer market. On the other, another potentially gigantic stock market debut, this time Anthropic's. Add to that the Fed under Kevin Warsh, meaning risky ideas for "un-printing" money, US–Iran talks, oil, euro-area inflation at a level not seen in a long time, and capital flowing out of India. In other words, a warning for other emerging markets. Join us!

Problemy behawioralne psów
Podcast 156: A conversation with Amelia Kinkade

Problemy behawioralne psów

Play Episode Listen Later Jun 2, 2026 97:19


“Miracles are not something you believe in, but something we experience”: I never thought these words from Amelia Kinkade's book “Jak rozmawiać ze zwierzętami” (“How to Talk to Animals”) would lodge so firmly in my memory... Nor did I ever expect that I would have the chance to meet and talk with the Author herself. And yet...

I warmly invite you to listen to our conversation, which took place in Warsaw at the end of May 2026. If you have never heard of talking with animals through telepathy, of animal communicators, or of an intuitive approach to understanding animals, this conversation may surprise you. So may the book. And if you already have your own experiences of communicating with animals telepathically, or thoughts on the subject, share them. Although the topic itself may stir controversy, what I value is that raising it opens up a completely different perspective on how we perceive animals, and perhaps makes us more sensitive to their emotions, their needs, and the conditions in which they live.

Cleared Hot
The CIA Tried to Bury It | Rachel Cuda | Ep. 451

Cleared Hot

Play Episode Listen Later Jun 1, 2026 157:22


Rachel Cuda grew up the daughter of a Navy SEAL, raised on Coronado around the teams. She speaks Russian, Ukrainian, and German. She studied at the University of Tennessee, earned a master's from Georgetown, and wrote software at a startup before moving into defense contracting. At the Pentagon she led the data modeling and analytics line for the military's COVID task force. She married a SEAL officer whose grandfather gave the CIA thirty years as a case officer. In February 2022, Rachel Cuda joined the agency's Directorate of Operations. It was the job she'd wanted her whole life. Two weeks after she started, Russia invaded Ukraine, and her languages put her in the middle of it. Six months in, a colleague strangled her with a scarf in a stairwell at headquarters. Then the agency went to work on her. They told her she couldn't go to the police. They told her she couldn't tell her husband. They warned her that reporting it could put her in prison. So she went to Congress instead. We get into the assault, the run-around, the predators the agency shielded for years, and how one trainee forced the CIA to rewrite its laws in eleven months.

Sharks Hockey Digest
Cuda Confidential: Mack Oliphant

Sharks Hockey Digest

Play Episode Listen Later Jun 1, 2026 30:00


#SJBarracuda broadcaster Nick Nollenberger is joined by recently signed defenseman Mack Oliphant to discuss his path to hockey, college career at Holy Cross, beginning his pro career and more.

Cuda Confidential
Cuda Confidential: Mack Oliphant

Cuda Confidential

Play Episode Listen Later Jun 1, 2026 30:00


#SJBarracuda broadcaster Nick Nollenberger is joined by recently signed defenseman Mack Oliphant to discuss his path to hockey, college career at Holy Cross, beginning his pro career and more.

Startup Island TAIWAN Podcast
EP3-40 | 【AI News】Computex / GTC Taipei Open This Week in Taiwan !

Startup Island TAIWAN Podcast

Play Episode Listen Later Jun 1, 2026 41:08


Welcome to SIT Podcast. Just a few hours ago, the eyes of the global tech world turned to the Taipei Music Center, where NVIDIA CEO Jensen Huang delivered a GTC Taipei keynote that sent a jolt through the industry. As we speak, the doors of Computex 2026 have yet to officially open, but NVIDIA has already seized the moment, declaring the arrival of a "new era of PC."

In this episode, we take a close look at three defining trends:

1. NVIDIA moves into laptop silicon. After more than a decade away, NVIDIA returns to the consumer CPU arena with the N1 and N1X chips. According to supply-chain reports, the high-performance N1X is said to feature a 20-core Arm CPU and Blackwell-architecture graphics, with performance reportedly compared to the desktop-class RTX 5070. More significantly, this could mean the CUDA ecosystem running natively on a Windows-on-Arm laptop for the first time.

2. Taiwan: the center of global AI. In his keynote, Huang revealed that NVIDIA's annual spending in Taiwan has grown to roughly $100 billion. The company is also planning an overseas headquarters called "Constellation," reportedly slated to open around 2030 and house some 4,000 employees. From TSMC's manufacturing to Foxconn's assembly, Taiwan has become the heart of what Huang envisions as the AI factory producing computational tokens.

3. The rivals respond, and an industry test. Faced with NVIDIA's momentum, Intel has rolled out its Arc G3 chips built for handheld gaming devices, while Qualcomm defends its ground with a $300 entry-level Windows laptop platform. 
With DRAM and SSD costs climbing, Gartner projects PC prices will rise a notable 17% in 2026, a real test of what every maker can deliver.

The 10Min Trader con Marco Casario
The AI Prophet Gets FIRED: He Bets Billions AGAINST the Sector

The 10Min Trader con Marco Casario

Play Episode Listen Later May 26, 2026 15:10


An official SEC filing reveals a move that chills the markets' enthusiasm: former OpenAI researcher Leopold Aschenbrenner has bet 8 billion dollars against the chips of Nvidia, AMD, and ASML. In this video we analyze the accounting and geopolitical data behind this massive short. Through his essay Situational Awareness, we will find out why the Silicon Valley prophet is betting not on the end of AI but on the next great structural bottleneck: energy and infrastructure. We look at what changes for your investments and how to manage your PAC (recurring investment plan) with strategic prudence. What you'll discover in this video:

MIASTO Podcast
Pentecost 2026 | Alek Konieczny (24.05.2026)

MIASTO Podcast

Play Episode Listen Later May 25, 2026 24:19


Pentecost 2026. Visit us: www.SpolecznoscMIASTO.pl Follow us on:

The 10Min Trader con Marco Casario
[Live] The AI Prophet Gets FIRED: He Bets Billions AGAINST the Sector

The 10Min Trader con Marco Casario

Play Episode Listen Later May 25, 2026 21:31


An official SEC filing reveals a move that chills the markets' enthusiasm: former OpenAI researcher Leopold Aschenbrenner has bet 8 billion dollars against the chips of Nvidia, AMD, and ASML. In this video we analyze the accounting and geopolitical data behind this massive short. Through his essay Situational Awareness, we will find out why the Silicon Valley prophet is betting not on the end of AI but on the next great structural bottleneck: energy and infrastructure. We look at what changes for your investments and how to manage your PAC (recurring investment plan) with strategic prudence. What you'll discover in this video:

TD Ameritrade Network
Nvidia (NVDA) Eyes Next Phase of AI Leadership

TD Ameritrade Network

Play Episode Listen Later May 20, 2026 7:43


Shaon Baqui says Nvidia (NVDA) faces a key test as competitors like Amazon (AMZN) and Alphabet (GOOGL) push custom silicon. He highlights strong demand, a growing product roadmap, and Nvidia's CUDA moat as key advantages. He adds that supply constraints and cost efficiency will shape the next phase of competition.

How I Built This with Guy Raz
NVIDIA: Jensen Huang. From near collapse to becoming the world's biggest company

How I Built This with Guy Raz

Play Episode Listen Later May 18, 2026 67:18


NVIDIA is one of the most valuable companies in human history. Its chips run the AI systems transforming everything from entertainment to warfare. But for years, almost nobody believed in co-founder Jensen Huang's vision. Jensen spent nearly a decade pouring billions into a technology called CUDA, long before AI made it profitable.

In this deeply personal conversation, Jensen tells Guy why NVIDIA's very first chip was a catastrophic failure … and how at one point, the company was 30 days away from going out of business. Jensen also explains why he thinks fears about AI are overblown, and why he believes the next generation will have more opportunity, not less, because of AI.

What You'll Learn:
* Why NVIDIA nearly collapsed before becoming an AI giant
* How researchers sparked the AI boom using NVIDIA gaming chips
* How to lead through uncertainty when a huge bet hasn't yet paid off
* How Jensen approaches hard decisions like an engineer
* We're “doing ourselves a disservice” by being afraid: Jensen on AI and job loss
* How Jensen defends his demanding management style
* Why past failures still haunt him

Key Moments From the Interview:
* 00:07:51 - Jensen Huang's childhood at an unusual Kentucky boarding school
* 00:14:50 - Why Jensen left a stable career to help start NVIDIA
* 00:17:14 - NVIDIA's first failure: the NV1 disaster
* 00:19:51 - The desperate trip to Japan that gave the company a lifeline
* 00:23:11 - “The only idea we had” for prototyping: the emulator Hail Mary
* 00:30:53 - The book that shaped Jensen's thinking about innovation
* 00:35:04 - Why NVIDIA kept investing in CUDA while Wall Street lost faith
* 00:41:38 - The moment AI researchers discovered the power of NVIDIA's chips
* 00:53:17 - Jensen on fear of job loss from AI, and why America risks falling behind
* 01:01:56 - Knowing what he knows now, would he do it again? Yes, and no

This episode was researched and produced by Alex Cheng with music by Ramtin Arablouei. It was edited by Neva Grant. Our engineers were Patrick Murray and Robert Rodriguez.

Follow How I Built This: Instagram → @howibuiltthis, X → @HowIBuiltThis, Facebook → How I Built This
Follow Guy Raz: Instagram → @guy.raz, Youtube → guy_raz, X → @guyraz, Substack → guyraz.substack.com, Website → guyraz.com
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Hacker News Recap
May 11th, 2026 | I'm going back to writing code by hand

Hacker News Recap

Play Episode Listen Later May 12, 2026 15:21


This is a recap of the top 10 posts on Hacker News on May 11, 2026. This podcast was generated by wondercraft.ai

(00:30): I'm going back to writing code by hand
Original post: https://news.ycombinator.com/item?id=48090029&utm_source=wondercraft_ai
(01:57): Postmortem: TanStack npm supply-chain compromise
Original post: https://news.ycombinator.com/item?id=48100706&utm_source=wondercraft_ai
(03:25): Mythos Finds a Curl Vulnerability
Original post: https://news.ycombinator.com/item?id=48091737&utm_source=wondercraft_ai
(04:52): Ratty – A terminal emulator with inline 3D graphics
Original post: https://news.ycombinator.com/item?id=48093100&utm_source=wondercraft_ai
(06:20): Gmail registration now requires scanning a QR code and sending a text message
Original post: https://news.ycombinator.com/item?id=48092028&utm_source=wondercraft_ai
(07:48): GitLab announces workforce reduction and end of their CREDIT values
Original post: https://news.ycombinator.com/item?id=48100500&utm_source=wondercraft_ai
(09:15): Software engineering may no longer be a lifetime career
Original post: https://news.ycombinator.com/item?id=48095550&utm_source=wondercraft_ai
(10:43): CUDA-oxide: Nvidia's official Rust to CUDA compiler
Original post: https://news.ycombinator.com/item?id=48096692&utm_source=wondercraft_ai
(12:10): The greatest shot in television: James Burke had one chance to nail this scene (2024)
Original post: https://news.ycombinator.com/item?id=48090521&utm_source=wondercraft_ai
(13:38): If AI writes your code, why use Python?
Original post: https://news.ycombinator.com/item?id=48100433&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

LINUX Unplugged
665: Patch Me If You Can

LINUX Unplugged

Play Episode Listen Later May 4, 2026 80:41 Transcription Available


We dig into the Copy Fail vulnerability and test a proof-of-concept against our own box. Plus, Jon Seager, VP of Engineering at Canonical joins us, and we kick off the BSD Challenge!

PULS BIZNESU do słuchania
Miracles and the Constitution of 1791. PB BRIEF

May 3, 2026 21:48


This will be a special episode of PB Brief. Instead of describing current data, we will go back to the moment when the state was disappearing from the map, while at the same time its citizens were doing extraordinary things. The end of the First Polish Republic is not only a story of partitions and betrayal, but also of an effort that made it possible to preserve something far more important than borders: memory, language, and the idea of the state. Today I will talk about people such as Tadeusz Czacki, Hugo Kołłątaj, and King Stanisław August, and about their decisions, which at the moment of collapse had a meaning that outlasted a single generation. It is a story about an end that turned out to be a beginning.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Physical AI that Moves the World — Qasar Younis & Peter Ludwig, Applied Intuition

Apr 27, 2026 72:21


From building Applied Intuition from YC-era autonomy tooling into a $15B physical AI company, Qasar Younis and Peter Ludwig have spent the last decade living through the full arc of autonomy: from simulation and data infrastructure for robotaxi companies, to operating systems for safety-critical machines, to deploying AI onto cars, trucks, mining equipment, construction vehicles, agriculture, defense systems, and driverless L4 trucks running in Japan today. They join us to explain why “physical AI” is not just LLMs on wheels, why the real bottleneck is no longer model intelligence but deployment onto constrained hardware, and why the future of autonomy may look less like one-off demos and more like Android for every moving machine.

We discuss:

* Applied Intuition's mission: building physical AI for a safer, more prosperous world, powering cars, trucks, construction and mining equipment, agriculture, defense, and other moving machines
* Why physical AI is different from screen-based AI: learned systems can make mistakes in chat or coding, but safety-critical machines like driverless trucks, autonomous vehicles, and robots need much higher reliability
* The evolution from autonomy tooling to a broad physical AI platform: starting with simulation and data infrastructure for robotaxi companies, then expanding into 30+ products across simulation, operating systems, autonomy, and AI models
* Why tooling companies came back into fashion: Qasar on why developer tooling looked unfashionable in 2016, why Applied Intuition still bet on it, and how the AI boom made workflows and tools central again
* The three core buckets of Applied Intuition's technology: simulation and RL infrastructure, true operating systems for vehicles and machines, and fundamental AI models for autonomy and world understanding
* Why vehicles need a real AI operating system: real-time control, sensor streaming, latency, memory management, fail-safes, reliable updates, and why “bricking a car” is much worse than bricking an iPad
* Physical machines as “phones before Android and iOS”: Peter explains why today's vehicle and machine software stack is fragmented across many operating systems, and why Applied Intuition wants to consolidate the platform layer
* Coding agents inside Applied Intuition: Cursor, Claude Code, internal adoption leaderboards, and how AI tools are changing engineering workflows even in embedded systems and safety-critical software
* Verification and validation for physical AI: why evals get harder as models improve, how end-to-end autonomy changes simulation requirements, and why neural simulation has to be fast and cheap enough to make RL practical
* From deterministic tests to statistical safety: why autonomy validation is shifting from binary pass/fail requirements toward “how many nines” of reliability and mean time between failures
* Cruise, Waymo, and public trust: Qasar and Peter discuss why autonomy failures are not just technical issues, how companies interact with regulators, and why Waymo is setting a high bar for the industry
* Simulation vs. reality: why no simulator perfectly represents the real world, how sim-to-real validation works, and why real-world testing will never disappear
* World models for physical AI: hydroplaning, construction equipment, visual cues, cause-and-effect learning, and where world models help versus where they are not enough
* Onboard vs. offboard AI: why data-center models can be huge and slow, but onboard vehicle models need millisecond-level latency, low power, small size, and distillation-like efficiency
* Why physical AI is not constrained by model intelligence alone: the hard part is deploying models onto real hardware, under safety, latency, power, cost, and reliability constraints
* Legacy autonomy vs. intelligent autonomy: RTK GPS in mining and agriculture, why hand-coded path-following worked for decades, and why modern systems need perception and dynamic intelligence
* Planning for physical systems: how “plan mode” applies to robotaxis, mining, defense, and multi-step physical tasks where actions change the state of the world
* Why robotics demos are not production: the brittle last 1%, humanoid reliability, DARPA Grand Challenge-style prize policy, and the advanced engineering gap between research and deployment
* Applied Intuition's hard-earned lessons: after nearly a decade, Peter says they can look at a robotics demo and predict the next 20 problems the company will hit
* Qasar's advice to founders: constrain the commercial problem, avoid copying mature-company strategies too early, and remember that compounding technology only matters if you survive long enough to see it compound
* Why 2014 YC advice may not apply in 2026: capital markets, AI company dynamics, and the difference between building in stealth with a deep network versus building as a new founder today
* What Applied is hiring for: operating systems, autonomy, dev tooling, model performance, evals, safety-critical systems, hardware/software boundaries, and engineers with deep curiosity about how things work

Applied Intuition:

* YouTube: https://www.youtube.com/@AppliedIntuitionInc
* X: https://x.com/AppliedInt
* LinkedIn: https://www.linkedin.com/company/applied-intuition-inc

Qasar Younis:

* X: https://x.com/qasar
* LinkedIn: https://www.linkedin.com/in/qasar/

Peter Ludwig:

* LinkedIn: https://www.linkedin.com/in/peterwludwig/

Timestamps

00:00:00 Introduction: Applied Intuition, Physical AI, and 10 Years of Building
00:01:37 Physical AI vs. Screen AI: Why Safety-Critical Changes Everything
00:02:51 The Origin Story: Tooling, YC, and the Scale AI Comparison
00:05:41 The Three Buckets: Simulation, Operating Systems, and Autonomy Models
00:11:10 Hardware, Sensors, and the LiDAR Question
00:14:26 The Operating System Layer: Why Vehicles Are Like Pre-Android Phones
00:19:13 Customers, Licensing, and the Better-Together Stack
00:21:19 AI Coding Adoption: Cursor, Claude Code, and the Bimodal Engineer
00:26:41 Verifiable Rewards, Evals, and Neural Simulation
00:31:04 Statistical Validation, Regulators, and the Cruise Lesson
00:40:25 World Models, Hydroplaning, and Cause-Effect Learning
00:43:34 Onboard vs. Offboard: Latency, Embedded ML, and Distillation
00:50:57 Plan Mode for Physical Systems and Next-Token Prediction Universally
00:53:04 Productionization: The 20 Problems Every Robotics Demo Will Hit
00:58:00 Founder Advice: Constraints, Compounding Tech, and Mature-Company Mimicry
01:05:41 Hiring Philosophy: Hardware/Software Boundary and Engineering Mindset
01:08:50 General Motors Institute, Education, and the Curiosity Mindset

Transcript

Introduction: Applied Intuition, Physical AI, and 10 Years of Building

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I'm joined by Swyx, editor of Latent Space.
Swyx [00:00:10]: And today we're very honored to have the founders of Applied Intuition, Qasar and Peter. Welcome.
Qasar [00:00:17]: You guys really know how to turn it on to podcast mode. That was, you guys are real pros at this.
Qasar [00:00:23]: They were just joking around right before this, and then they flipped it pretty quick.
Alessio [00:00:29]: Oh, yeah, it's good to have you guys. Maybe you just wanna introduce yourself so people know the voice on the mic and they'll know what they're hearing.
Peter [00:00:33]: Oh, sure. Yeah, I'm Peter Ludwig. I'm the co-founder and CTO of Applied Intuition.
Qasar [00:00:38]: And my name is Qasar Younis.
I am the CEO and co-founder with Peter.
Alessio [00:00:42]: Nice. Can you guys give the high-level overview of what Applied Intuition is? I was reading through some of the Congress files from when you went out there, Peter, and eighteen of the top twenty global non-Chinese automakers; you have customers in agriculture, defense, construction. I think most people have heard of Applied Intuition tied to YC when it was first started, and then you were kinda in stealth for a long time, so maybe just give people the high-level overview of what it is today, and then we'll dive into the different pieces.
Peter [00:01:10]: Yeah. So at Applied Intuition, our mission is to build physical AI for a safer, more prosperous world. And so we work on physical AI for all different types of moving systems, everything from cars to trucks to construction and mining equipment, to defense technologies. And we're a true technology company, so we build and sell the technology, and we sell it to the companies that make the machines. We sell it to the government, really anyone that wants to buy a technology to make machines smart.

Physical AI vs. Screen AI: Why Safety-Critical Changes Everything

Qasar [00:01:38]: Yeah. And I think in the broader AI landscape, a lot of the focus, rightfully so, in the last three years has been on large language models, and so everything fits in a screen, whether it's code-completion products or things like that. And what's different about us is we're deploying intelligence onto a lot of things that don't have screens. They're physical machines. There are sometimes screens within the cabin of, for example, a car or a truck or something like that, but most of the value we provide is putting intelligence into safety-critical environments.
Those two words are really important, because learned systems can make mistakes if you're asking for something like, “Tell me about these podcast hosts that I'm about to go meet.” But you obviously can't do that when, as an example, we run driverless trucks in Japan right now, as we speak. We can't have errors. Those are L4 trucks. Yeah.
Alessio [00:02:40]: Yeah. Was that always the mission? I remember initially, I think people put you and Scale AI very similarly for some things, about being kinda like on the data infrastructure side of things. What was the evolution of the company?

The Origin Story: Tooling, YC, and the Scale AI Comparison

Peter [00:02:51]: Well, from the very beginning, we always wanted to really be a technology company that helped generally push forward the industrial sector. And so we started off working in autonomy. Our very first customers were robotaxi companies. And we started off doing a lot of work in simulation and data infrastructure. And then over the years, we've expanded our portfolio. Now we have over thirty products, and it's a pretty broad technology play within the landscape of physical AI.
Qasar [00:03:19]: Yeah, I think the Scale reason is because we're all YC Universe companies. But it was a very different company. Scale is more of a services company, a data labeling company fundamentally. We started with, and still do, a lot of tooling. So, like, you think developer tooling is now in vogue again, thanks to the AI boom. But honestly, ten years ago, it was out of vogue. Doing a tooling company in 2016, 2017 was not, like, the thing to do because, I don't know if you remember, the VCs' view generally was that tools are just workflows, and workflows ultimately are not really interesting. And we've come full circle with that. But when we started the company, it was kind of in the periphery of what the company wants to be.
It was like, from our earliest days, like, we wanna deploy software on physical machines, like on cars and on trucks and things like that. And obviously, we didn't know that the transformer boom was gonna happen. We didn't know that autonomy systems would become end-to-end. Those things we didn't know. And why that's important when autonomy systems become end-to-end, it is just now those models can be generalized to, multiple form factors. And so back nine, ten years ago, tooling was a great way, and still is a great way to, build the technology and sell technology to our end customers, a lot of them who wanna build this stuff themselves. And so we just offer like a spectrum of solutions from you can just use like one part of a development suite of tools all the way to buying the full thing. The way to think about the company, or at least the way we think about the company is, as Peter said, a technology provider. It's kinda like, what NVIDIA does or what an AMD, but we just don't do chips.Qasar [00:05:06]: We don't do silicon. But we're a technology provider fundamentally. And I think even, we used to joke when we started the company, like, we're not the guys to build, like, Instagram. Like that was just towards That's not our That's just not us in a most fundamental way. IAlessio [00:05:20]: You have thoughts.Qasar [00:05:21]: Yes.Qasar [00:05:22]: Well, it's, it's I mean, I think it's just like what And I mean, we worked on Maps and stuff, Google Maps. Consumer products are extremely difficult for a lot of different reasons. It just, I think doesn't scratch the itch. I think we're like Michigan guys who are kind of more of that traditional engineering kind of a realm, or lineage. 
we used to jokeThe Three Buckets: Simulation, Operating Systems, and Autonomy ModelsPeter [00:05:41]: I gotta say, though, what was clear ten years ago was that there was so much more that was possible with software and AI in vehiclesPeter [00:05:47]: and that was generally the space that we started in ten years ago.Peter [00:05:51]: And the precise path that we've taken over the years, I think we've been strategic, and we've adjusted to make sure that we're actually building stuff that's valuable to the market. And like, the technology has changed so much. Like our own technology stack has completely changed, I would say, roughly every two years. And so now we've probably done, let's say, four complete evolutions of our own technology stack. And I sort of see that cadence roughly keeping up.Peter [00:06:13]: And so the way even we think about engineering is almost on this two-year horizon, we're preparing ourselves that, hey, like, we wanna invest the appropriate amount, but then also be very dynamic as the research gets published and as our research team figures out new advancements and adapting to that.Qasar [00:06:27]: Yeah. One thing that has been consistent is the type of people we've, we've recruited. It's engineers who are fall into the sometimes very traditional, like, GoogleQasar [00:06:38]: -gen suite, but way different from, other companies. We are hiring folks who really know the intersection of hardware and software, who know really low-level systems. Obviously, traditional ML researchers and folks who've, actually, put ML systems into production. That's been pretty consistent. I think that, like, you look at the mix of our engineering, eighty-three percent of the company is engineering, so it's, like, a giant list.Qasar [00:07:05]: A lot of engineers.Alessio [00:07:06]: Which, by the way, a thousand engineersQasar [00:07:07]: Yeah. 
A thousand engineers.
Alessio [00:07:08]: That's on your website, so I imagine it's up to date.
Qasar [00:07:11]: It is up to date, yes. Yes.
Alessio [00:07:12]: Okay. And then forty-plus founders.
Qasar [00:07:15]: Yeah. This was more luck than strategy, but we've recruited a lot of ex-founders. It's been a great place for founders, YC and non, ‘cause obviously I know a lot of the YC folks. It's kind of like how we recruit a lot of Google people.
Qasar [00:07:33]: For them to exercise both their technical and non-technical skills, because we're on the applied side. We have a research team that does fundamental research, we publish, and we've had great traction there. But fundamentally, the business wants to take this intelligence and deploy it into production, and there's, like, a certain type of person that's more interested in that.
Alessio [00:07:54]: Yeah. You mentioned the tech stack, Peter, so I just wanted to give you some rein to just go into it. I'm interested in where Applied Intuition starts and ends. In some sense, what won't you do? What do you do that's common among all the verticals that you cover?
Peter [00:08:10]: There's a few buckets of work that we do, and we've been at this for almost ten years now, so the technology's pretty broad. But we got started
Qasar [00:08:17]: Yeah, with a thousand engineers, like, you could work on lots of things.
Peter [00:08:19]: There's lots of stuff, yeah, especially with AI tools to help.
Peter [00:08:22]: So we got our start in simulation and simulation tooling and infrastructure.
And so generally, if you're trying to build a very complex software system that involves moving machines, you need to test that, and the best way to test it is it's a combination of virtual developments, a simulation, and then also obviously real world testing.Peter [00:08:39]: And then there's a very careful process of that correlation between the simulation results and the real world results and ensuring that the simulator is in fact accurate to that. Simulation's a very deep topic.Peter [00:08:49]: We have a whole suite of products in that, and we could talk for many hours about that specifically. But that is one part of what we do as a company. Reinforcement learning as a subpart of that is also super critical. I think a lot of the a lot of the best advancements happening in a lot of these AI systems right now in some way relate to reinforcement learning, and with now we have lots of compute, and you can do tons of interesting things for reinforcement learning. The second bucket of work that we do is on operating systems technology. true operating systems. Like, think about, schedulers and memory management and middleware and message passing and highly reliable networking and data links. Like, the reality is, if you want to deploy AI onto vehicles, you need a really good operating system. And when we were getting deeper into that space, there wasn't really anything that we were happy with.Peter [00:09:39]: Like, things existed, absolutely, and we were using what was available in the market, and as an engineering organization, we roughly realized these things aren't great. We think we can do this better, and so let's, let's build something. And that was then the that was the moment of inspiration that started our operating systems business, which is now a very real business for us. And in order to write and run great AI, you need a great operating system, and so that-that's what got us into that. 
And then the third bucket that we work on, it's, it's true fundamental AI technology. Models, we do a lot of work in, as mentioned, the foundational research, but then the also the world models and the actual autonomy models that are running on these physical machines, and that's across cars, trucks, mining, construction, agriculture, and defense, and so that's both land, air, and sea.Qasar [00:10:31]: And also, a smaller subsector of that third bucket is the interaction of humans with those machines.Qasar [00:10:38]: So that's a multimodal, experience. Historically, if you're moving a dirt mover or any of these machines, there are, like, buttons you press, whether they're actual physical tactile buttons or something like a touch screen. That's just That fundamentally is changing to where you're just talking to the machine and the machine and you're teaming with the machine.Alessio [00:10:58]: Voice?Qasar [00:10:59]: Yeah, voice, absolutely, yeah.Alessio [00:11:00]: Oh.Qasar [00:11:00]: And also the machine just being aware of who is in the cabin, what their state is. you can think from a safety systems perspective, the most simple version of this is, like, the driver is tired, right? They're, they're if you get those alerts when you're driving your car and saysHardware, Sensors, and the LiDAR QuestionQasar [00:11:15]: -maybe take a coffee break, that take that times, a couple of order of magnitudes up. But this concept of teaming man and machine is important. When you think about running agents or just running, different instances of, Claude and doing work for you in the background, you can take that analogy out, almost copy and paste and put it into, like, a farm, where you have a farmer who's running a number of machines. 
So where they interact with the machine is where there's maybe a critical decision or a disengagement or something like that, but generally speaking, the agent on the physical machine is running and making decisions on the behalf of the farmer until there's something maybe critical. And that's also what we work on. So that's not pure autonomy. It's a little bit of a mix, but it falls under, autonomy. In the automotive sense, that's typically defined in SAE levels as an L2++ systemQasar [00:12:05]: -with a human in the loop. But just take that idea, to other verticals.Alessio [00:12:09]: Yeah. You've not mentioned hardware at all, like sensors or obviously we you mentioned you don't do chips. I think even in AV there's, like, a big, cameras versus lidars. Like, what are, like, in your space maybe some of those design decisions that you made, and are they driven by the OEM's ability to put things on the machinery? And like, how much influence do you guys have on co-designing those?Peter [00:12:32]: Yeah. So we don't make sensors. Like, we're, we're not a manufacturer. Obviously, we use a lot of sensors in our autonomy products. in terms of what actually goes on the vehicles, we have a preferred set of sensors that we, let's say fully support, and then our customers, they can sort of choose from those. And obviously if there's a very strong opinion on supporting something else, we'll add that to the platform as well. And the lidar question is at this point sort of the age-old,Peter [00:12:59]: topic in autonomy, and the state of the industry right now is lidar is hands down a useful sensor, specifically for data collection and the R&D phase of autonomy development. if you see, for example, a Tesla R&D vehicle, it actually has lidar on itPeter [00:13:17]: to this day, right? In the Bay Area we see these. you'll see, like, Model Ys or Cybercab that have lidars on them just driving around. So it's, it's useful because it gives you per pixel depth information. 
So if you can pair a lidar with a camera, and you can say that, well, this camera's looking this direction, this lidar's looking this direction, then for each pixel of the camera I can see how far away that pixel is. You can actually then use that as a part of your model training, and that depth information then becomes a learned state of the camera data. And then when you're doing the production system, you can now remove the lidar
Peter [00:13:52]: and now you can actually get depth with just the camera. And so that difference between, like, a highly sensored R&D vehicle and then the down-costed production vehicle, we use that across our whole portfolio of products. And of course the end goal is you want super low cost and super reliable.
Peter [00:14:08]: And then in certain use cases you have some more bespoke things. Like in defense as an example, you do things at night oftentimes, and so you care more about sensors like infrared. And you don't wanna be putting energy out, so you don't wanna use lidar or radar,
Peter [00:14:23]: but you still need to be able to see at nighttime. So yeah, we work the whole gamut.

The Operating System Layer: Why Vehicles Are Like Pre-Android Phones

Alessio [00:14:27]: Cool. So that's kinda like on the hardware level. Then on the OS level, what does that look like? What is, like, unique? I drive a Tesla. Whenever I drive some other car that has a screen, it always sucks.
Alessio [00:14:38]: It's on, like, a cheap Android tablet. It's laggy and all of that. What does the OS of, like, the autonomy future look like?
Peter [00:14:46]: For most people, it's really what you just described. When you think about an operating system in a vehicle, you're thinking about the HMI, right? The human-machine interface, and absolutely that's an important part of it, but that's actually only one thin layer on top.
So when we talk about operating systems for, like, AI in vehicles, there's many layers that go deep into the CPU critical realm and embedded systems, and you're talking about the real time control ofPeter [00:15:13]: let's say the electric motors or the engine and the actuators, and you have different redundancies for different, let's say, the steering actuation in the vehicle. And all of these things, need very core support in the in the operating system. And then of course for autonomy you have real time sensor data that's streaming in, and the latencies there are really important, right? If you try to Imagine you try to run Microsoft WindowsPeter [00:15:35]: like streaming your sensor data in or controlling the vehicle. Like, the latencies are gonna be absurd. Like, you can never do that. And so what's special about what we do is we really have this system level thinking, right? So we're looking at, we care about every performance characteristics of the entire system, and then we also, because we're doing a lot of the software or all of that software, we can fine-tune and control all of those things. So we can very carefully tune in the latencies for every aspect of the system. We can carefully tune in the memory management. We can have the right, fail-safes and fallbacks, for different things. ‘Cause you have to account for what if, what if there is a critical failure? What if there's a cosmic ray that flipsPeter [00:16:14]: a bit in the middle of the processor that causes some, malfunction? And you have to have a fail-safe to all of that, and so the core operating system is a part of that. And then the one last thing, which is a lot less exciting but is, actually a very big topic, is reliability of updates.Peter [00:16:30]: so the I have a Tesla and you get updates fairly frequently, right?Peter [00:16:36]: Once a month. 
Most companies that are making vehiclesPeter [00:16:40]: are basically never doing updates, and they're And even if they are doing updates, they're usually only updating maybe one module. Maybe they're updating the HMI module. But they're not able to update, let's say, the CPU critical parts of the system.Peter [00:16:51]: You have to go into the dealer for that. And so with our operating system now we can actually enable highly reliable updates of any system in the vehicle, and that's way easier said than done. Like, there's lots of technical, technically deep stuff, in the tech stack to do that in a way that you're not going to accidentally brick a vehicle.Peter [00:17:08]: And right? If, imagine yourAlessio [00:17:10]: That would be bad.Alessio [00:17:11]: Bad.Peter [00:17:11]: Bricking a car is a very expensivePeter [00:17:13]: and honestly, like across the industry maybe one of the most just pure impactful things that we've done is we've just, we're, we're now enabling the industry to actually do software updates.Alessio [00:17:22]: Just to clarify as well, who is the customer for this? Like, I assume a lot of hardware manufacturers have their own firmware, and I'm sure some of them would just have you write it for them because you're experts. And others would have their own. Like, who pays for this? Who invites you into the house? Is it, is it the end user, or is it, is it the manufacturer?Peter [00:17:41]: Yeah. So let me make an analogy firstly on the on the fragmentation of software. So physical machines today are more akin to the state of the phone market before Android and iOS existed, right? So I worked on Android at Google by the way many years ago, and part of the reason that Larry at Google decided to get into Android was they wanted to run Google products on a bunch of phones, and they bought all of these phones from the industry, and it turned out they had like 50 different operating systems on these phones. 
And it was virtually impossible
Peter [00:18:17]: for Google to make their app run on all 50 devices equally well. And so the solution was, well, what if they created a really great operating system and made it attractive to all of these phone makers? That was sort of the genesis for what Android was and why Android existed. It was a way for Google to get their products onto a really wide diversity of devices. The state of the physical industry right now is a little bit like that. Yes, these companies have firmware, but they have so many different operating systems, it's so fragmented, and to actually get a modern AI application to run on these vehicles, you first have to consolidate the operating system, and so that's why we've done that. And then, your specific question was who are our customers? Generally it's the companies that are making these machines.
Peter [00:19:06]: And we're selling our technology to them to really simplify the architecture and then enable these AI applications to run on them.

Customers, Licensing, and the Better-Together Stack

Swyx [00:19:13]: How much is reusable across? Like, do you have one OS that is just configured for everything, or is there some more customization that is needed?
Peter [00:19:22]: Yeah, highly reusable. So the fundamental technology is quite universal, right? Things that we do have to think about, though, are, like, chipset support. If you're coding, let's say, an LLM and you start with an assumption that, “Hey, I'm gonna use CUDA, and I'm gonna run this on an NVIDIA chip,” then you don't really have to think about the hardware in that sense. You're just, “Okay, I'm in the CUDA/NVIDIA ecosystem, and I'm going to use that.” But the hardware, especially in safety-critical systems, is a lot more diverse. There's not one or two players.
There's a bunch of different chipsets that we have to support. And so our operating system doesn't just run on, like, the equivalent of X86. It has to, it has to run on a number of different architectures from chips from a bunch of different companies. But again, we've been working on this for a long time now, so we have, we have support for all of those chipsets. And then when you want to then run the AI applications, we can then do that reliably across now a variety of providers.Qasar [00:20:19]: And I think that is, like, heavily inspired by Android, right? Android has a huge suite of testing and it's a reliable operating system that runs on thousands of devices. And we think we can, we can do the same in all these physical moving machines, with the difference that we're really in a safety critical realm. Android isn't.Alessio [00:20:40]: So on Android, I don't need to use Gmail, I can use Superhuman. Like, what about your machinery? Like, can people bring somebody else's automation to it, or is it kinda like all-in-one?Qasar [00:20:50]: You have to use us. No. Yeah. we're If, Yeah. Yeah, it's totally open. Yeah.Peter [00:20:56]: Yeah. our philosophy is that we are a technology company, and so we license our technology to customers to use how they want. And so if a customer wants to If they wanna license our autonomy tech and our operating system, then great, we'll license those. If they just wanna license the operating system and then use different autonomy tech, that's fine also, and we have great documentation andSwyx [00:21:17]: Or if they wanna use developer tooling.Peter [00:21:18]: Yeah, exactly.AI Coding Adoption: Cursor, Claude Code, and the Bimodal EngineerSwyx [00:21:19]: It's, like, a better together if, obviously, if you, if they work together. 
Is it all C++, I assume, with different compile targets?

Peter [00:21:27]: We use a lot of C++. Rust is sort of the new hot kid on the block for a bunch of things as well. But yeah, the lower level you get, especially when you get to real-time constraints, you hit C++ at some point, and at some point maybe you work your way into assembly when needed.

Swyx [00:21:44]: Oh, damn.

Alessio [00:21:46]: I'm curious about the coding agent adoption, since you're mentioning more esoteric languages. What's the adoption internally? What have you learned?

Peter [00:21:55]: Yeah. We use everything. So Cursor was, I think, the hottest tool in the company for a good while. Now Claude Code, I think, has taken the reins on that. We have an internal leaderboard that we use just to sort of encourage adoption within the company. And yeah, they're phenomenally useful. Honestly, we take inspiration from some of those tools also in how we're adapting some of that mindset to the physical realm. If it's so easy to build an app for this or that thing that lives just on a screen, we're now taking a lot of the same ideas and applying them to, “Okay, well, if you wanted a physical machine to do something, how easy can we make that, using our own tooling and platform as well?”

Alessio [00:22:40]: Are you changing any of the OS architecture, like the way you expose services, to be more AI friendly?

Peter [00:22:48]: Yeah, absolutely. In the early days of our tools infrastructure work, you had engineers that were experts in certain topics, but the things that you're dealing with are oftentimes more mathematical or more abstract, where actually GUI tools are very useful for certain things.
Like as an example, we have a product we call Sensor Studio, which helps you design the sensor suite for your autonomous vehicle, whether, again, it could be a car, it could be a drone, could be mining equipment, could be a robot. You place sensors in different places. There's a library. You can understand what trade-offs you're making in the design of that system, and that was a very GUI-intensive thing, ‘cause it's a little more like a CAD tool in that sense, if you've seen CAD tools.

Swyx [00:23:37]: Yep.

Peter [00:23:37]: Nowadays, though, we expose all of the underlying APIs for that, and now, using AI agents, you can actually configure a sensor suite with just text and likely reach a better result than you could've through the GUI in the past, and we're taking that thinking now through the whole product portfolio.

Swyx [00:23:57]: Another thing I was thinking about is just in terms of AI adoption, does it change your hiring at least a little bit, or how do you manage engineers differently?

Peter [00:24:08]: Yeah, absolutely, it does. We, I think like every company in the Valley right now, are evolving our hiring practices because the skills required to be effective are changing so fast, right? You used to really select for just rote implementation ability, and now it is more the AI engineer skill set, right? Where it's like, yeah, how to implement, but actually just banging out code is no longer the core job. It's actually knowing what questions to ask, knowing how to tie together these different AI tools. And so the interviews that we give now, I think, are way harder than they've ever been. But we also allow selective use of AI tools to solve the problems. And I think in that you start to see more of a bimodal distribution of engineers, right?
You start to see, like, wow, there's this subset of people that really get it. They're all in and they've clearly invested the hours needed to learn these tools and how to be effective. And then there's sort of the group of people that haven't done that, and the productivity gap is just enormous. And so we're trying to obviously select for the people that are really into this.

Swyx [00:25:20]: I first wrote my AI engineer piece three years ago, and when I first wrote about it, I was like, “Actually, not everyone should be an AI engineer,” ‘cause I think there's an extremist stance where, well, every software engineer is an AI engineer. And my actual example of people who should not be adopting AI was embedded systems and operating systems and database people. Are they adopting AI?

Peter [00:25:41]: I think it's the classic bitter lesson topic. Six months ago I would've said the same thing, but it's becoming super useful for every domain.

Qasar [00:25:53]: I'm sure.

Peter [00:25:54]: Right? I think six months ago, or maybe a year ago, if you tried to use, let's say, the latest Claude model for writing shaders, GPU shaders, the results were probably underwhelming. And if you use the latest model now to do that kind of task, you're a little bit blown away, like, “Wow, that actually worked. That's amazing.” And we see the same thing in the embedded realm. No question though, especially when you get into safety-critical systems, the human validation is 100% key. You're not gonna trust your life to AI-written software that's not been very carefully checked by humans.
And so I think now the challenge is really about the appropriate level of human validation for these safety-critical systems.

Verifiable Rewards, Evals, and Neural Simulation

Alessio [00:26:41]: How do you think about, touching on the simulation side, I think verifiable rewards and reinforcement learning is, like, the hottest thing. What have you done internally to build around that? And what makes you sleep at night? Like, if somebody's just vibe coding something or wants to try something new, you have a good enough system. Because I think the opposite is also true: if it's super easy to write anything, then it puts a lot of work on the verifiable side of it. What does that look like for people?

Peter [00:27:10]: Yeah. So verifiability is part of a broader bucket of evaluations, right? How do you evaluate the results that you're getting? I think this is probably the hardest problem right now, because as the models get better, it can be harder and harder to find the faults in the system. And so the problem of doing proper eval to find those faults also keeps getting harder as the models get better. But it's no less important than it's ever been, right? There are still going to be edge cases that are not met and whatnot. And so it's a big area of investment for us. On the reinforcement learning topic, the key thing is there are all these new requirements that come with the latest generation of these technologies. So for example, end-to-end is the big thing right now in autonomy and physical AI, which is that you can now train these models that effectively take sensor data in and put control signals out, and get really good results out of that. But the way that you train and improve those models is really different from the previous generations.
And so to do reinforcement learning on an end-to-end model, you now need to actually simulate all the sensor data, right? We call our work in this neural simulation, but think of it like a hybrid of Gaussian splatting and diffusion methods, where you really care about performance. Performance is everything. If you can't do enough simulation fast enough and cheap enough, you actually can't get results that are worthwhile in the end. It also gets to a lot of our work in embedded systems, which is performance-critical work, and that performance optimization, that performance criticality, carries over to a lot of the model training work, because the only way to make it affordable is it has to be really fast.

Qasar [00:28:58]: I think it's worth a few minutes talking about our own evolving thoughts on verification and validation, from kind of traditional simulators, which you can think of like vehicle dynamics or something like that, where you're just taking textbooks and taking those formulas and putting them into software, to now this neural sim/world model universe. I think that's an interesting topic.

Peter [00:29:20]: Yeah. So in more traditional development, you oftentimes would have more black-and-white answers to questions. In Europe, as an example, there's a regulatory system called Euro NCAP. It's the European New Car Assessment Program, and as part of that, the vehicles have to pass a bunch of tests, and those tests actually include safety systems. So automatic emergency braking for a child that runs in front of a car, or let's say an occluded child that runs out and you hit it. And so you end up with sort of these binary answers of, well, did the car under test pass this specific test?
And there's a very well-known set of test cases that the vehicle has to pass. And that was how the industry worked, let's say, until 10-ish years ago. But what's changed now is with these models, everything is statistics, right? You no longer have a black-and-white answer, but it's like, well, how many orders of magnitude, or how many nines of reliability, can I get in the system, and how can I prove that to be true? And the big unlock, honestly, for physical AI as an industry is that these models are just becoming much more reliable. Things actually work a lot better. The number of nines you can get out of these systems is now good enough that it actually becomes cost effective to really deploy these things. And so the big shift in verification and validation has been this: in the past it was strictly requirements, and are you meeting them or not? And now it's more of a statistical verification and validation case, where it's all about how many nines of reliability and mean time between failures, that sort of thing.

Statistical Validation, Regulators, and the Cruise Lesson

Swyx [00:31:04]: And is the target audience regulators or even the customers? I imagine the customers are bought in, and it's mostly regulators that need to be satisfied.

Peter [00:31:15]: We do work with the US government, we do work of course with the European governments and the government of Japan, and the government is not like an AI lab by any means.

Swyx [00:31:26]: They just care about the outcome.

Peter [00:31:27]: They care about the outcome. And so we do education in that regard, sort of teaching about, “Hey, this is how we think validation should be done, and this is an approach that we think is reasonable,” and how to think about when a driverless system is actually safe enough to go on the roads and that sort of thing.
But I wouldn't say that the government is asking for it. We're more teaching the government in that sense. Honestly, it's more so for our own comfort, right? We want to build very safe systems, and then of course our customers care deeply about that as well. But in that context we're also typically educating our customers.

Qasar [00:32:01]: Yeah. Our first core value is around safety. So I think we can't underline enough that us also verifying and validating that the systems we're deploying are safe is probably as important to us as some regulator or a customer saying...

Swyx [00:32:19]: Of course. Okay. Yeah. You have to satisfy yourselves.

Peter [00:32:22]: As I say, as a whole across the world, regulation oftentimes is almost a lowest common denominator. You really have to substantially exceed what the regulators are expecting to make good products.

Swyx [00:32:33]: Yeah. One thing I often talk about, and I try to make this relatable to the audience also, is Cruise, where they had an accident that basically ended the company. I wonder if people overreact to single incidents, because incidents are going to happen regardless, right? ‘Cause it's a statistical thing. I don't know if regulators understand that you cannot extrapolate from a single incident, but we do because that's all we have to go on. And your sample sizes are necessarily gonna be lower than, I don't know, consumer driving.

Qasar [00:33:01]: Yeah. I think the Cruise example wasn't a technology failure. The real compounding issue there was just how the company talked to the regulators and what their behavior was, and I think that became more of the issue. If you look...

Peter [00:33:19]: It definitely was a technology failure, but it was made much worse by the...

Swyx [00:33:23]: Put the car back on the woman.

Qasar [00:33:25]: Yeah.
And let me put it another way. There is a version where Cruise still exists.

Swyx [00:33:29]: Right. Right. It was like the last straw in a long chain of issues.

Qasar [00:33:33]: So do you feel like, ATG had that horrific accident of someone actually dying, because that was a homeless person crossing the street? So yeah, I think we can't understate enough that ultimately, statistical validation of something, that's one part of it, but it's not the only part of it. Consumer and, let's say, mainstream adoption of these technologies is also gonna be part of that conversation. I think companies like Waymo are doing a lot of positive service to the industry, in the sense that they're setting a high benchmark and they're showing in a very responsible way how to deal with these. There have been Waymo incidents as well. They've just not been as significant as the Cruise one that you mentioned. But yeah, I think you'll just continue to see that. I think probably the long-term question is really gonna be, again, around... Like, it is very clear humans are way worse drivers statistically. There's no debate. And so at what point... But we're emotional animals.

Swyx [00:34:34]: Yeah. So my thing is, we have to get to a point as a society where we accept horrific accidents that would never happen with a human, because statistically we understand that it is safer overall. In the same way that planes, I think they're the safest mode of transport that we have.

Qasar [00:34:50]: Yeah.
It's more dangerous to drive to the airport than it is to get on a flight. So if you're ever getting nervous about getting on a plane, just think, “I just gotta get to the airport.”

Swyx [00:34:58]: Yes, we're flying.

Qasar [00:34:59]: If I get to the airport, I'll be good.

Swyx [00:35:00]: But then planes also concentrate the tail risk, if planes...

Qasar [00:35:03]: Yeah. And...

Peter [00:35:04]: I don't think we honestly have to worry about there ever being accidents from these systems that are much worse than what humans would cause, ‘cause humans do terrible things. People fall asleep at the wheel all the time.

Swyx [00:35:16]: I have. I've been a drowsy driver.

Peter [00:35:19]: Kinda drunk drivers, and that's the extreme end of the example. But with these AI systems, you have redundancies, you have fallbacks. Many things have to go wrong for there to actually be something catastrophic, because there are so many fallbacks that these systems have.

Alessio [00:35:36]: Your simulation is so vast because there are so many use cases. What are maybe things that worked in a simulation and then you put it out and it's like, “F**k, this just did not work at all”?

Peter [00:35:47]: Yes. That's maybe a bit of a misconception about simulation there. So let me go a little bit more technical on this. At first go, no simulation is going to represent the real world.
There's always a process of this sim-to-real matching, where you need the real-world feedback to basically feed into the parameters that are being used in the simulator, and you have to do that, it's like this validation flow, a number of times until you can get some confidence that, like, I think the simulator is now accurately representing what's gonna happen in the real world. Now, if you have a situation where you've done that full validation and you thought that it was accurate and then there's something different, those are much trickier cases, and that absolutely can happen, but really I think the validation process is a really important part. You can never skip the simulation validation process, where you're actually ensuring that, hey, my sim-to-real gap here is small enough that I can trust these simulation results. And there are so many fun things that you can do when you get into it. I'll give one fun example that came up recently: in these humanoid robotics systems, overheating actuators is a real problem, right? So obviously phenomenal demos.

Alessio [00:37:02]: For 10 minutes.

Peter [00:37:03]: The most amazing I can get. I love watching robots do acrobatics like everybody, but these systems actually overheat, right? And one of the ways you can use simulation is you can actually have the temperature of those actuators be one of the parameters that's represented in the simulation. And if you're doing reinforcement learning over a certain task, then the robot can actually adjust its motions in the simulation to account for the fact that, oh, it knows that as it's moving, it's actually beginning to overheat this motor.
But if you didn't have that parameter of, let's say, the heat of that motor represented in the simulation initially, then your RL policy will disregard that. And now you run that on the robot and the robot will overheat and fail.

Alessio [00:37:43]: I guess the question is, how do you have all of these parameters taken care of while also understanding the deployment environment? Temperature is a great example, right? Well, why did you make my robot worse when it runs in, like, a freezer? So it actually shouldn't worry about that. So, yeah, how do you design these simulations?

Peter [00:38:02]: This is honestly what makes simulation so hard, right? Simulation is fundamentally about trying to optimize the development of a system. How can I build this system faster and better and cheaper, and what are all the levers that I have to actually accomplish that? And because simulation's just a software program, you can change it a lot more easily than you can hardware systems. And then what's particularly awesome about, let's say, world models and using them as a part of simulation is that now the simulation doesn't just scale with adding new math equations in, but we can actually scale the simulation environment with additional real-world data, and that also unlocks a whole new field of robotics.

Qasar [00:38:46]: There is a meniscus line you cross where still doing real-world testing is better. In this sim-to-real gap, you can reproduce reality at exceedingly expensive costs. So nothing is free. So really you're finding that line where you're getting great performance, you're getting great feedback, whether it's on the training side or on the eval side, but it's way cheaper than doing it in the real world. At some point, that doesn't make sense.
And so even from our earliest days in autonomy, our view was you're still gonna do real-world testing. There's not this magical land where you're not gonna do that. And maybe an even more nuanced version of this, in traditional software development: most of your testing for software in a vehicle, 95% of that can be traditional CI/CD kind of flows that you would have in traditional web development. But now, let's say you have a truck. Well, you can do like 4% of those in a rig which has all the components, the electrical and electronics of a truck, but doesn't have the tires, and it doesn't have the... And then you have the 1%, which is actually the vehicle. There's a similar analogy in terms of using simulation for intelligent systems. You can do a lot in a simulator, using world models, but ultimately it's physical AI. So you're gonna deploy it on physical machines, and the freezer example comes to light.

Alessio [00:40:20]: The world model thing has been to me the hardest thing to wrap my head around. Like, we have Fei-Fei Li on the podcast.

World Models, Hydroplaning, and Cause-Effect Learning

Qasar [00:40:25]: We've been doing a small series with, like, another Intuition company, General Intuition, as well. Yeah, and I mean, lots of coverage on NeRFs and, yes.

Alessio [00:40:34]: Yeah. It feels like what we talk about with the heliocentric system, right? In a world model, if you just feed visual data, the model might learn that the sun spins around the Earth. It makes sense, right? And it's like, well, not really. And I think about some of these other things; hydroplaning is one thing I think about. Can a world model understand hydroplaning and what amount of water causes it to happen?
And it's like, yeah, to me, I don't understand how you guys do it. I guess the real thing is when you're doing both cars on the highway in Japan versus the excavator in a mine in...

Qasar [00:41:13]: Arizona.

Alessio [00:41:13]: Arizona, wherever you're deploying them. How much of it are you relying on the world models to generate the simulations for you and then trying to close the gap after, versus giving the world models as a tool to your engineers to curate the simulations, if that makes sense?

Peter [00:41:28]: Yeah, totally. I can say at a pure engineering level, if you're hoping to do real-world deploys and you're purely relying on a world model approach, you probably won't get to something that works before you go bankrupt. So there is just a very practical mindset of, world models are amazing and they're extremely useful for a lot of use cases, but there are a lot of other things that you need to do to actually get something started and something deployed and working. Most fundamentally, world models are all about understanding the world, but also understanding what's going to happen. It's the cause-effect relationship. Right? If you take some sort of construction tool, and that construction tool is gonna be doing some work on the earth in some way, it's gonna be moving earth, the world model needs to understand that cause-effect relationship. Like, okay, when I take this material from here and put it over there, now I have things that are over here and not over there anymore. Data obviously is a big problem. The hydroplaning one is actually a really great example because it's actually quite non-obvious sometimes. Right?
It's like, well, it's raining, and this road has, let's say, the appropriate curvature to it, so the water is running off the road and cars are driving faster here. And then you approach a road that's very flat and water is now puddling on that road, and all of a sudden cars are driving slower, because when they were driving faster they were starting to lose control. And there are a lot of very nuanced visual cues in the scene, and so I do think in the world model concept there's a good chance that the model actually would learn that you should just drive slower when these visual cues exist, and that's obviously the beauty of these kinds of models, where they just learn these non-obvious things.

Swyx [00:43:14]: It doesn't need to know about hydroplaning to know that it needs to drive slower.

Peter [00:43:17]: Yes.

Swyx [00:43:17]: Yeah. I wanna ask questions also about deploying models. I presume you use a lot of these world models for training data and simulation, but what about deploying them onto the systems in production? Presumably you have, like, GPUs on device...

Onboard vs. Offboard: Latency, Embedded ML, and Distillation

Swyx [00:43:36]: I keep saying on device. What's the right term for that?

Peter [00:43:40]: On machine.

Swyx [00:43:41]: On machine.

Peter [00:43:41]: Or embedded, yeah.

Swyx [00:43:42]: Yeah. What is the embedded world like? Because for people who are not used to that world, this is very alien.

Peter [00:43:49]: Yeah. So we call it onboard and offboard. So like, onboard software and offboard software. And the great thing about offboard software is you don't have to care about time, and you can run really large models, right?
So you can say, “Well, this model, I don't care if it takes one second for it to give me a result or 10 seconds for it to give me a result, because we have time.” And the models can be really big, and they can run in a data center or on a huge GPU, and you can obviously have distributed compute, et cetera. But onboard you don't have any of those benefits. You're like, “Well, I have this many milliseconds where I need an answer from this model.” And so a lot more of the energy then is about, think of it more like distillation, and it's truly efficiency, and literally every fraction of a millisecond counts. And you can't have a situation where the model takes too long, because then the vehicle can't actually function. And so you can still use a lot of the same techniques, and the models themselves you can think of as a derivative of larger models that you can run offline, and then you're trying to just get a model that still performs really well but is a small enough version that you can run on this embedded system, where you care about latency and power.

Qasar [00:45:03]: Yeah. And I think the broader point, which maybe is not obvious but is worth saying, is in the physical AI world, we're not really constrained right now by the intelligence of the models. It's actually what Peter's talking about, it's actually deploying them in...

Swyx [00:45:19]: The hardware they give you.

Qasar [00:45:21]: Yeah. On the hardware they give you. And there's just the reality of safety-critical systems.
So those end up being your limiting factors, rather than, let's say, a limiting factor for a foundation model company, which is gonna be just capital maybe, or researchers. So we're in that way dealing with, for us as people who kind of come in that realm, a very interesting... Those constraints force creativity.

Swyx [00:45:47]: And I imagine nobody was deploying or giving you the hardware for transformers back in 2018, whatever, but now they are. What's the evolution like? Just peel back the curtains a little bit.

Peter [00:45:59]: Yeah. Transformers, first off, I think the paper was originally published in 2017.

Swyx [00:46:02]: 2017. So there's no time. But I guess I'm saying, embedded ML systems were usually a lot fewer parameters, a lot less compute, and now, like, orders of magnitude more.

Peter [00:46:14]: Yeah, absolutely. What I was gonna say though was, I think in the original paper in 2017, maybe it's in the last paragraph, somewhere in the paper they talk about, like, “Oh, by the way, this technique might be useful for images and videos as well.” These last subjects. And it took a few years for that impact to really hit. But now we're seeing transformers are everywhere.

Swyx [00:46:39]: Yeah. Vision transformers.

Peter [00:46:40]: And then the compute just keeps getting better and better. But you do have this fundamental trade-off, right? You have power, you have cost, and performance, and getting the right mix of those things in an embedded package that can also be, like, shaken and baked in all the conditions that these things have to operate in.
But yeah, I think that they're only going to keep getting better, and so we also try to plan our strategy understanding the rate of improvement of these systems.

Swyx [00:47:11]: Yeah. So Google just released the Gemma 2B model, that effective 2B model. Is that useful to you guys or is that too big?

Peter [00:47:18]: You can run that model on an embedded system, definitely. So yes, it's useful in that regard. The bigger question is, what do you use it for in an embedded system? You actually need to customize it quite a bit to make it useful for something. But yeah, you could run a two-billion-parameter model, definitely.

Swyx [00:47:35]: It's also interesting, like, what percent is a custom ML model that only does that thing versus a generalist LLM, which probably is not that useful actually for your context.

Peter [00:47:46]: You can imagine different use cases, right? So the...

Swyx [00:47:49]: The voice stuff, yes.

Peter [00:47:49]: Yeah, the voice stuff. Totally, yes. So for the actual autonomy elements, that's 100% in-house. We do every bit of that: the data simulation, the model, everything. But when you get into the more generic use cases like voice or a voice assistant kind of thing, that's where these more generalist models like Gemma actually can be quite useful.

Swyx [00:48:09]: Yeah. And then there's also obviously a trade-off between what percent you must do on machine versus just call home.

Peter [00:48:16]: Yeah. It's all about latency.

Swyx [00:48:17]: Latency.

Peter [00:48:17]: It's all about latency. Yeah.

Swyx [00:48:18]: Yeah. Well, I think actually in a lot of contexts, especially in the US, you can just have a connection to the web.

Qasar [00:48:26]: Yeah.
I think though most of our universe is, everything has to be fairly embedded and local, because just the nature of... Even in the US there's a lot of, like...

Swyx [00:48:39]: Patchiness.

Qasar [00:48:40]: ...don't have coverage, right? And if you look at the old world of autonomy within mining, which is long before transformers and kind of neural networks, in the, like, CNN kind of universe, they were really just hand-coded systems. They were just like, this machine is gonna run to that place with this...

Peter [00:49:03]: That was our GPS, like very accurate GPS.

Qasar [00:49:05]: Yeah. And so that worked, and that worked for 20 years, so why would we actually need to use transformers or more modern end-to-end systems? Mainly because you can only really run a path and run backwards. That provided a lot of value, but not as much as you get when the machine is actually intelligent. It's seeing, it's perceiving, it's acting in a dynamic world.

Alessio [00:49:28]: I looked up RTK, real-time kinematic: one to two-centimeter accuracy.

Qasar [00:49:32]: Yeah. Fantastic. But fantastic in faraway lands where there's not gonna be cell phone coverage.

Peter [00:49:39]: Yeah, so it's widely used on the legacy mining and agricultural autonomy systems today. So, for example, a combine that can be precise within one or two centimeters as it's driving down the field, they use RTK.

Qasar [00:49:53]: Yes.

Peter [00:49:53]: But it's expensive.

Qasar [00:49:54]: Yeah. And it's autonomy, but it's not intelligent in the way that I think all of us in twenty-six would be talking about intelligence.

Alessio [00:50:00]: In one of your blog posts, you mentioned research on large-scale transformers that are similar to those doing modern generative AI. What are the big differences other than, “You're absolutely right.
I should steer the car, so you probably wanna remove that?”Peter [00:50:14]: We have a diversified bet strategy internally, and the reason we've done that is because we operate in now a bunch of industries, a bunch of geographies, and each of the approaches has, obviously a different risk to them.Peter [00:50:27]: And so like, we're not going to put all of our eggs in a single basket for a single approach because that approach may no

Sharks Hockey Digest
Cuda Closed + Offseason Begins

Sharks Hockey Digest

Play Episode Listen Later Apr 25, 2026 30:00


On the latest episode of Morning Tide, Ted goes over The Barracuda's season coming to a close against Henderson.

Teal Town USA
Game 1: Barracuda vs Silver Knights - 4/22/2026 - Barracuda After Dark on Teal Town USA

Teal Town USA

Play Episode Listen Later Apr 23, 2026 28:06


The San Jose Barracuda began their Calder Cup journey and blew a 3-1 lead, losing 5-4 in overtime to the Henderson Silver Knights. Puckguy & Tyler break down a frustrating loss which suddenly puts the Cuda on the brink of elimination and chat about the Stanley Cup Playoffs as well. Teal Town USA - A San Jose Sharks' post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists!
 Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!


Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Shopify's AI Phase Transition: 2026 Usage Explosion, Unlimited Opus-4.6 Token Budget, Tangle, Tangent, SimGym — with Mikhail Parakhin, Shopify CTO

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Apr 22, 2026 72:25


Early bird discounts for the San Francisco World's Fair, the biggest AIE gathering of the year, end today - prices will go up by ~$500 tonight so do please lock in ASAP!

From near-universal AI tool adoption inside Shopify to internal systems for ML experimentation, auto-research, customer simulation, and ultra-low-latency search, Mikhail Parakhin joins us for a deep dive into what it actually looks like when a 20-year-old, $200B software company goes all-in on AI. We cover why Shopify has become much more vocal about its internal stack, what changed after the December model-quality inflection, and why the real bottleneck in AI coding is no longer generation, but review, CI/CD, and deployment stability.

We also go inside Tangle, Tangent, and SimGym, three major AI initiatives that Shopify is running to make experimentation reproducible, optimization automatic, customer behavior simulatable, and search and catalog intelligence faster and cheaper at scale. Along the way, Mikhail explains UCP, Liquid AI, and why token budgets are directionally right but often measured badly, why AI-written code can still increase bugs in production, what makes Shopify's customer simulation defensible, and what he learned from the Sydney era at Bing.

We discuss:

* Mikhail's path from running a major Microsoft business unit spanning Windows, Edge, Bing, and ads to becoming CTO of Shopify
* Why Shopify is talking more publicly about AI now, and why staying at the frontier has become necessary for the company
* Shopify's internal AI adoption curve, the December inflection, and why CLI-style tools are rising faster than traditional IDE-based tools
* Why Jensen Huang is directionally right on token budgets, but raw token count is still the wrong way to evaluate engineering output
* Why the real unlock is not more agents in parallel, but better critique loops, stronger models, and spending more on review than generation
* Why AI coding can still lead to more bugs in production even if models write cleaner code on average than humans
* Why Shopify built its own PR review flow, and why Mikhail thinks most off-the-shelf review tools miss the point
* How PR volume, test failures, and deployment rollback are becoming the real bottlenecks in the agent era
* Why Git, pull requests, and CI/CD may need a new metaphor once code is written at machine speed
* What Tangle is, and how Shopify uses it to make ML and data workflows reproducible, collaborative, and production-ready from the start
* Why Tangle is different from Airflow, and why content-addressed caching creates network effects across teams
* What Tangent is, and how Shopify is using auto-research loops to optimize search, themes, prompt compression, storage, and more
* Why Tangent is becoming a democratizing tool for PMs and domain experts, not just ML engineers
* Why AutoML finally feels real in the LLM era, and where auto-research still falls short today
* Why Tangle, Tangent, and SimGym become much more powerful when combined into one system
* What SimGym is, why simulated customers only work if you have real historical behavior, and why Shopify's data gives it a moat
* How SimGym evolved from comparing A/B variants to telling merchants what to change on a single live storefront to raise conversions
* Why customer simulation is so expensive, from multimodal models to browser farms to serving and distillation costs
* How Shopify models merchant and buyer trajectories, runs counterfactuals, and thinks about interventions like discounts, campaigns, and notifications
* Why category-level behavior is so different across commerce, and why ideas like Chinese Restaurant Processes are showing up again in practice
* Shopify's new UCP and catalog work, including runtime product search, bulk lookups, and identity linking
* Why Shopify is using Liquid AI, and why Mikhail sees it as the first genuinely competitive non-transformer architecture he has used in practice
* Where Liquid already works inside Shopify today, from low-latency query understanding to large-scale catalog and Sidekick Pulse workloads
* Whether Liquid could become frontier-scale with enough compute, and why Shopify remains pragmatic and merit-based about model choice
* Who Shopify is hiring right now across ML, data science, and distributed databases
* The Sydney story at Bing, why its personality was not an accident, and what Mikhail learned from deliberately shaping AI character early on

Mikhail Parakhin

* LinkedIn: https://www.linkedin.com/in/mikhail-parakhin/
* X: https://x.com/MParakhin

Timestamps

00:00:00 Introduction: Mikhail Parakhin, Microsoft, and Shopify
00:01:16 Why Shopify Is Talking More About AI
00:02:29 Internal AI Adoption at Shopify and the December Inflection
00:06:54 Token Budgets, Jensen Huang, and Why Usage Metrics Can Mislead
00:10:55 Why Shopify Built Its Own AI PR Review System
00:12:38 AI Coding, More Bugs, and the Real Deployment Bottleneck
00:14:11 Why Git, PRs, and CI/CD May Need to Change for Agents
00:18:24 Tangle: Shopify's Reproducible ML and Data Workflow Engine
00:21:19 Why Tangle Is Different from Airflow
00:26:14 Tangent: Auto Research for Optimization and Experimentation
00:30:07 How Tangent Democratizes Experimentation Beyond ML Engineers
00:33:06 The Limits of Auto Research
00:36:36 Why Tangle, Tangent, and SimGym Compound Together
00:37:20 SimGym: Simulating Customers with Shopify's Historical Data
00:42:47 The Infra Behind SimGym
00:46:00 Why SimGym Gets Better with Real Customer History
00:47:30 Counterfactuals, HSTU, and Modeling Merchant Trajectories
00:51:55 CRPs, Clustering, and Category-Level Customer Behavior
00:53:30 UCP, Shopify Catalog, and Identity Linking
00:55:07 Liquid AI: Why Shopify Uses Non-Transformer Models
00:59:13 Real Shopify Use Cases for Liquid
01:03:00 Can Liquid Scale into a Frontier Model?
01:09:49 Hiring at Shopify: ML, Data Science, and Databases
01:10:43 Sydney at Bing: Personality Shaping and AI Character
01:13:32 Closing Thoughts

Transcript

[00:00:00] swyx: Okay. 
We're here in the studio, a remote studio, with Mikhail Parakhin, CTO of Shopify. Welcome.

[00:00:08] Mikhail Parakhin: Thank you. Welcome.

[00:00:10] swyx: I don't even know if I should introduce you as CTO of Shopify. I feel like you have many identities. You led sort of the Bing ML team, I guess, or the ads team. I don't know, people variously refer to you as, like, CEO, or... I don't know what that previous role at Microsoft was.

[00:00:29] Mikhail Parakhin: Yeah, in my previous role at Microsoft I actually was the CEO of one of Microsoft's business units, which included, as we discussed, all the things that people like to laugh about, including Windows and Edge and Bing and ads and everything.

[00:00:47] swyx: Yeah, yeah. What a wild time. You've obviously done a lot since you landed at Shopify. One of the reasons I reached out was because you started promoting more sort of internal tooling, primarily Tangle, but also a lot of people have seen and adopted Tobi's QMD, and obviously, I think, Shopify has always been sort of leading in terms of engineering. It's just more recent that you guys have been more vocal about your AI adoption. Is that true?

[00:01:16] Mikhail Parakhin: Well, I think AI tools in general are a fairly recent development, and Shopify, at this stage of its development: we're developing AI in-house, building tools that use AI, and interfacing with the wider AI community, and all of that is on a sort of runaway trajectory. So it's just a natural byproduct. We talk about it more also. 
Just yesterday, Andrej Karpathy famously tweeted about, oh, are there some ways that you can organize your agents to store the data and then look up the data, so that you don't have to research or lose context every time. And a little bit tongue in cheek, I tweeted that, “Hey, we've done it much earlier, and we even have different approaches, Tobi and I.” Tobi, of course, is a big fan of QMD, and I'm more of a SQL, SQLite fan. But yeah, very similar things that we've already done here. The point is, we're a very dynamic, explosively growing company, and we have to be at the forefront of AI adoption, obviously.

[00:02:29] swyx: Yeah. Your team kindly prepared some slides, actually, that we were gonna bring up onto the screen. I think I can screen share, and then we can kind of go through some of the shocking stats that maybe put some numbers to what exactly is going on. So here we have an internal AI tool adoption chart. What are we looking at here?

[00:02:54] Mikhail Parakhin: Yeah, these are very interesting statistics. This is the number of daily active workers, think of DAU, basically the active users of AI tools as a percentage of all the people in the company, right? And then the different AI tools. You can see two things here. One is that green is the total, and you can see that it is approaching nearly everyone by now. It's hard to do your job now without interacting deeply with at least one tool. 
Another interesting thing you can see, just as many people have commented, is that December was the phase transition, when suddenly models got good enough that everything took off and started growing. Many people noticed that small improvements accumulated into this big change in roughly the December timeframe.

[00:03:52] swyx: Yeah.

[00:03:52] Mikhail Parakhin: The other thing I would claim you can see is that CLI-based tools, and tools that don't require you to look at the code, are becoming more popular: you can see various versions of Claude Code and Codex and Pi and internal development tools taking off. Blue is our River, just an internal agent for coding. Tools that require IDEs, such as GitHub Copilot or Cursor, are not exactly shrinking, but they're not growing as fast. The red line is the IDE kind of tools, and you can see that they're not experiencing as fast a growth.

[00:04:37] swyx: As I understand it, basically, every employee has their choice, right? Choose whatever tool you use, and then you're just kind of doing a daily survey or something.

[00:04:47] Mikhail Parakhin: Exactly. The push is to get your job done: you can use any tool, and we effectively fund unlimited tokens for everybody. We do try to control the models that people use, but from the bottom, not from the top. We basically say, “Hey, please don't use anything less than Opus four point six.”

[00:05:09] swyx: Oh.

[00:05:10] Mikhail Parakhin: Some people end up using GPT five point four extra high. Some people use Opus four point six. There are pluses and minuses in going for the full one-million-token context window versus not. But we try to discourage people from using anything less than that.

[00:05:28] swyx: Yeah, yeah. Got it, got it. 
The next chart here really shows the expansion and the sort of December twenty twenty-five inflection, right? People are using a lot of tokens. I think it's also really interesting that no one was kind of abusing it in twenty twenty-five. Compared to this year, there was almost no growth. I mean, it's still, like, probably fifty percent.

[00:05:56] Mikhail Parakhin: Yeah. This is just a different scale. It's still exponential growth, just at a different rate of expansion. There was an inflection point, and Shawn, I would claim the super interesting part here is that you can see the distribution becoming more and more skewed. The top percentiles grow faster. That means the people in the top ten percent, their consumption grows faster than the seventy-fifth percentile, and so forth. So the distribution skews more and more towards the highest users, which is... I don't know what it tells me. It feels not ideal, to be honest. Or maybe it's okay. We'll see.

[00:06:36] swyx: Why does it feel not ideal? Is it because of quantity over quality, or what's the concern?

[00:06:42] Mikhail Parakhin: Because take it to the limit. That means if this rate of separation continued for a year, there would be one person consuming all the tokens. So it's kinda strange.

[00:06:54] swyx: Yeah, I mean, I think internal teaching and all that will help distribute things more widely. But in the early days, of course, the people who are more AI-pilled will obviously find more ways to use it than the people who are less AI-pilled. Let's call it that. I'll just kinda quickly pause from the... 
You know, we will go back to the rest of the slides, but I just wanna review: there are a lot of CTOs of large companies like yourself who are all considering some kind of token budget, right? I think it's something that Jensen Huang has been talking about, where if your 200K engineer is not using 100K of tokens every year, they're underutilizing coding agents. Of course Jensen Huang would say that, but it seems a very quantity-over-quality approach, and some people are basically saying, well, is this comparable to judging engineer quality by lines of code? Which we also know is kind of flawed, but better than nothing. So I don't know if you have a management take here on how to view this kind of metric.

[00:08:02] Mikhail Parakhin: Well, you're baiting me. This is my favorite topic. If you let me, I'll probably talk for two hours on just this. I have a lot of things to say. I do think Jensen has gotten a lot of bad press saying, “Oh, of course, the cake seller says you don't eat enough cakes.” You know? But I actually think that's undeserved. I think he's actually right.

[00:08:33] swyx: He's directionally correct.

[00:08:35] Mikhail Parakhin: Yeah. He's directionally correct, for sure.

[00:08:37] swyx: Who knows what the right number is? Yeah.

[00:08:39] Mikhail Parakhin: The thing that I do want to say, and this is something that we learned through trial and error and is very important, is two things. One is that it's not about just consuming tokens. You can consume tokens, and in fact the anti-pattern is running too many agents in parallel that don't communicate with each other. That's almost useless compared to just fewer agents, and it burns tokens very efficiently. 
Setting up the right critique loop, especially with the high-quality models, is what matters: one agent does something, the other one, ideally with a different model, critiques it and suggests ways to improve it, the first agent redoes it with this critique, and so it takes much longer. People don't like it because latency goes up. They have to wait while this debate is happening. But the quality of the code is much higher. And another thing, since you mentioned that the overall budget is just like lines of code: lines of code are exploding for everybody right now, partially because AI is really more verbose, but partially just because AI can write a lot more code and doesn't get tired. And so you have to have a very strong narrow waist during PR review. Otherwise the number of bugs will go through the roof. It's this unexpected consequence of just volume trumping everything. I would claim that by now a good model writes code with fewer bugs, on average, than the average human. But since they write so much more of it, more of it will make it into production. So you have to...

[00:10:26] swyx: You still have more bugs.

[00:10:26] Mikhail Parakhin: Yeah. You have to have very rigorous PR reviews, also automated, of course. But you have to spend a lot of budget there. For me, actually, the important metric is the ratio of budget spent during code generation versus budget spent on expensive tokens, like GPT five point four Pro or Deep Think from Gemini, checking PRs in review.

[00:10:55] swyx: Yeah, totally. I noticed in your chart you didn't have any review tools. Do you just use, let's say, Claude Code to review? Or do you have another set of review tools, like the Greptiles, the CodeRabbits? Devin has a review tool. 
I don't know if you've used those specialist review tools.

[00:11:13] Mikhail Parakhin: You're jumping on my sore toe a little bit right now, because in the graphs I was only showing public tools. I haven't found a good PR review tool that does what I think should be done. And partially my thinking is that it just goes against both what people feel emotionally they prefer and, frankly, even some of the business models that the companies run. At PR review time, you want to run the largest models. That means, I don't know, Codex or Claude Code is not gonna cut it. You need pro-level models if you really want to stem the tide of bugs going into production. And you need to spend a lot of time, with the models taking turns, but you don't want a big swarm of agents. So in fact you end up in a different, dualistic world where you generate not that many tokens. You in fact generate few tokens, but it takes a long time, because these are expensive models taking turns rather than many, many agents trying to do many things in parallel. That's why I feel like I haven't found good tools, so we are using our own for PR review for now.

[00:12:33] swyx: Yeah. I mean, I think a lot of companies are building their own, especially for their needs, right?

[00:12:38] Mikhail Parakhin: Mm-hmm.

[00:12:38] swyx: You also have a chart here, going back to the slides, on PR merge growth, where we're now at thirty percent month on month rather than ten percent. And also the estimated complexity is going up. You know, this is productivity, right? Because presumably there's more stuff going into the code base and more features getting worked on. I'm curious about the backlog, right? 
I actually don't mind a pro-level model taking an hour or two hours to review my PR, because I've dealt with humans who take a week to review my PR, right? And I keep pinging them on Slack, “Hey, hey, review my PR.” So, you know, I think there's some trade-off here where, like, it still doesn't make sense.

[00:13:18] Mikhail Parakhin: Exactly. That's exactly my point. On one hand, you can tolerate longer latencies at PR time. On the other hand, right now the real problem is not the time spent waiting for a PR review. The real problem is that, since there's so much more code, the probability of at least some tests failing goes up, and then you keep failing, then you have to find the offending PR, evict it, and retest without that PR, and so the deployment cycle becomes much longer. So in terms of the overall time to deploy, it's a net time saving if you spend more time on a longer model run, like thinking for an hour, because then you don't have to spend all that time during testing and rolling back the deployment.

[00:14:03] swyx: Yeah, totally. That's still worth it. You don't look at the individual, you look at the aggregate, and at the change in the aggregate system.

[00:14:11] Mikhail Parakhin: Exactly.

[00:14:11] swyx: I'm kind of curious if this PR mentality and the CI/CD paradigm will be changed eventually. Obviously a lot of people want a new GitHub, but I even wonder if Git is the problem, right? Is that the bottleneck? Is the concept of a PR a bottleneck? Do you guys use stacked diffs? I don't know if that's, like, a merge queue, stacked diff type of thing.

[00:14:34] Mikhail Parakhin: We use stacks, we use Graphite. We've worked with Graphite a lot. So we use stacked PRs. 
I think clearly CI/CD in general, and the interaction with the code repository, is right now the main issue and the bottleneck for us, and the highest top of mind. I would say we probably need a different metaphor, or a whole different design, for how to process it in the new agentic world. I haven't seen anything dramatically better yet. I think everybody right now is just trying to keep their head above water, because there are so many PRs, and then everybody's CI/CD pipelines start creaking, the times are increasing, the number of bugs slipping by is increasing, and you have to clamp down. So we are a little bit in this situation where we need to first stabilize that story and then start thinking about what a completely different, new world could be. I know some people are working on it. I haven't seen anything super compelling yet, but clearly the old things that were designed for humans will need to be morphed into something new.

[00:15:53] swyx: One of the things that I think about is that the merge conflict is basically a global mutex on the whole system, right? In human organizations, we do have something like that. It's the company standup. But other than that, it's actually fitting for us to be somewhat decentralized, somewhat plugged into one stream of information, but somewhat lossy. It's okay that not every delivery has, like, atomic consistency. We're not dealing with a database sometimes.

[00:16:27] Mikhail Parakhin: This is a very good point, because since humans don't write code too fast, that global mutex is not too bad. Once you start writing code at the speed of a machine, it becomes the bottleneck. Then what do you do? 
Maybe, and I can't believe I'm saying this, because I'm a lifelong opponent of microservices and I always thought that was a really bad idea. But now that you're saying it, maybe in a new guise microservices will make a comeback, because then you can ship things independently in tiny pieces, and managing all that complexity automatically will be much easier. I don't know. We'll have to see.

[00:17:10] swyx: Yeah. I mean, I don't know what the Microsoft or Shopify thing is, but I read this paper from Google where they have a monorepo that deploys into microservices, right? And then the other concept that I think about a lot is the Chaos Monkey concept from Netflix: being able to create this robust system where you have the service discovery, you have the independent microservices, and there's probably going to be a fair amount of duplication. That's how an organic system sort of scales, that you have that... I don't know what you call it. Slack? Robustness? Duplication. Those are not exactly the terms I'm looking for, but I can't really think of the words. Okay. I was gonna go into Tangent and Tangle. So we sort of discussed the overall stats that Shopify has. But I think some pretty cool stuff that you guys are working on is your ML experimentation and your sort of auto-research training pipeline. Presumably you're much closer to this one because it's a sort of personal hobby of yours. How would you explain them together? I thought we had a slide that has the system diagram.

[00:18:24] Mikhail Parakhin: Yeah. Tangle first, and then Tangent as a thing on top of Tangle. 
And Tangle is the third generation, I claim, of systems for running any data processing, with a bit of a skew towards ML experiments, but not necessarily: any sort of data processing task where you need to iterate and share, and where you have scale, so you want maximum efficiency. You know how you would normally work: imagine you're a data scientist or an ML practitioner. You would get Jupyter notebooks, or maybe your Python scripts, and you would manage the data, and you'd produce those TSV files and put them in some JFS or something. Then you would notice that, oh, it has these weird missing values. You go and write another script that goes and replaces them with dashes. Then, “Oh, I need to filter bots.” So you run some LightGBM model that removes the bots. And then you kind of get into shape, and you start experimenting, and you run multiple experiments, and then you're like, “Oh my God, this experiment is worse.” You undo, and you cannot get back to the previous result. And it's like, “Ah, what did I do?” Then you finally get everything working. Then you start throwing it over the fence to production. You replicate it, those things don't work, and sometimes you don't notice that you forgot some feature naming and the features don't match. But then imagine you did everything, and six months later you have to repeat it, because now there's more data, or you wanted to do another pass, and you're like, “What did I do?” Or, “This script crashes now,” or, “The path has changed.” And then you spend another month just doing digital archeology on your own history, right? Now multiply that by many, many teams. 
Now imagine you got an intern that you wanna ramp up. Now you have to show that intern, “Oh, you know, look, here's the folder, there's the scripts, ask your cloud agent to figure it out.” And then the cloud agent does something, and then you're like, “Ah, yeah, right, it was the wrong folder. I forgot to tell you, I actually have this other thing I forgot myself.” And that's the daily life we all know, if you're a data scientist, a machine learning practitioner, or really any person managing data.

[00:21:00] swyx: Yeah. So I used to do this on the quant finance side, in my hedge fund. We did this before Airflow, and then obviously Airflow came along, and then more recently Dagster, I would say, is in my mind what I would use for that shape of problem, where you have to materialize assets and create a pipeline.

[00:21:19] Mikhail Parakhin: And that's a very good segue, because Airflow is great, but Airflow is more about: you have something and you wanna repeatedly run it in production on a schedule. It's less about you as a team developing things and being able to share, and you grabbing the standard pipeline and saying, “Hey, I wanna change this tiny little component in the huge sea of data processing, and I wanna run ten experiments on this, and I wanna do hyperparameter optimization.” All that is very hard to do with Airflow. It's very easy to do with Tangle. Tangle is all about a group of people, and it might be agents too nowadays, running experiments cheaply, collaborating, sharing results. You don't need to understand everything fully. You clone somebody else's experiment or somebody else's pipeline, change a small piece, run it, get it to a production state, and then ship in one click. So then the... 
You don't have to port it into any other system to run in production. You can just run the same experiment. It's fully production ready. And again, as I said, it's a third-generation system. The original one, I would claim, was Ether, at least in my career; Ether was the first that pioneered this type of approach. And then there was Nirvana at Yandex, which was kind of a second take on this. And now this one aggregates the learnings from all of those, and from Airflow as well, to get to the state where, when you try it, it feels kind of magical. Because now everything is based on content hashes. So even if the version changed, if the output didn't change, nothing is rerun. It's very efficient. If multiple people start experiments that need the same sort of data preprocessing, it's not repeated multiple times. It's automatically done only once. If you start ten experiments that all require some data preparation as the first step, you don't have to coordinate for that. You don't have to know that other people are starting it. Also, it has very easy composability, with any language you wanna use, and it's very visual. So you can see it immediately, you can edit it easily, you can assemble small things with just mouse clicks if you want to, and share, and clone. And it's also fully static, in the sense that if you rerun it a second time, it will have exactly the same results. You will never have to do digital archeology. So full versioning and everything is also there.

[00:24:06] swyx: So people can... It's open source. Go to the GitHub repo and check it out. And there is also a really good blog post about it. I think all of this is, like, really appealing. 
The thing that I think sells me the most about it is that development-to-production transition, right? I think a lot of people haven't really solved that strictly, right? We develop really well in Python notebooks, but that's obviously not a production-ready process. Any way in which that is solved, I think, is very appealing. Then the other thing that you mentioned, which also raised my eyebrows, was content-based caching, which is very much an efficiency measure: recalculation happens only on content addressing, which I think makes sense. It surprised me that the savings could be this much, but maybe I just haven't worked at your scale, where there's so much duplication that people just rerun because they change a single ID upstream.

[00:25:10] Mikhail Parakhin: It does, yeah. But it's not only that you rerun. The main savings come from the fact that you ran it, you got your job done, and you moved on. Then somebody else, in some department you didn't know existed, runs the same task, but on a newer version.

[00:25:27] swyx: Yeah.

[00:25:27] Mikhail Parakhin: Right now, in most organizations, you can't even find out about it, so you can't even measure that you're spending that time twice, right? Here, if everybody's on Tangle, that's detected automatically, and it's detected that the output is the same. And then for that person, all it looks like is that the experiment just suddenly jumped forward, right? So that's because there's a network effect of multiple people helping each other.

[00:25:51] swyx: Yeah.
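The content-hash caching discussed above can be sketched in a few lines. This is only an illustration of the general idea, not how the orchestrator is implemented: `ContentCache`, `preprocess`, `preprocess_v2`, and `count_rows` are hypothetical names, and the cache is an in-memory dictionary rather than a shared store.

```python
import hashlib
import json


class ContentCache:
    """Toy content-addressed cache: results are keyed by a hash of (step name, input content)."""

    def __init__(self):
        self.store = {}      # cache key -> step output
        self.executions = 0  # how many times a step actually ran

    @staticmethod
    def key_for(step_name, inputs):
        # Hash the content of the inputs, not who asked or when they asked.
        payload = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step_name, fn, inputs):
        key = self.key_for(step_name, inputs)
        if key not in self.store:
            self.store[key] = fn(inputs)
            self.executions += 1
        return self.store[key]


def preprocess(rows):
    # Stand-in for an expensive data-preparation step.
    return sorted(r.strip().lower() for r in rows)


def preprocess_v2(rows):
    # A "newer version" of the step that happens to produce identical output.
    return sorted(r.lower().strip() for r in rows)


def count_rows(rows):
    # Stand-in for a downstream step such as training.
    return len(rows)


cache = ContentCache()
raw = [" B", "a ", "C"]

alice = cache.run("preprocess", preprocess, raw)
bob = cache.run("preprocess", preprocess, list(raw))   # second person, same content: cache hit
n1 = cache.run("count", count_rows, alice)
out_v2 = cache.run("preprocess_v2", preprocess_v2, raw)
n2 = cache.run("count", count_rows, out_v2)            # upstream version changed, output identical: cache hit

print(cache.executions)  # 3 executions for 5 calls
```

Because the downstream key depends only on the content it receives, a changed upstream version that yields the same output does not trigger a rerun, which is the behavior described in the conversation.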
This is one of those things where it's designed to be a platform from the beginning, rather than an individual developer's tool from the beginning, right? And everything streams down from there. That is the Tangle orchestrator, and it manages jobs. We've seen a few versions of this, and this is obviously the sort of unique approach that you guys have figured out. And then there's Tangent.

[00:26:14] Mikhail Parakhin: Yeah. And Tangent is basically an automatic auto research loop that can help and kind of do your work for you. Effectively, Andrej Karpathy recently popularized it with auto research. Remember he said he was speed running this... yeah, you know the story. Here we're basically bringing the same capability into Tangle. Tangent is just an agent that can run multiple experiments, figure out what can be changed, and keep on rerunning and modifying until it optimizes some goal, some loss function, whatever you need to achieve.

And in general, I would say if you're not using an auto-research-like approach in whatever you do, like literally whatever you do, then you're missing out. We saw it at Shopify taking off like wildfire: anything where you can put measurements can be done dramatically better. Our...

[00:27:19] swyx: Mm-hmm.

[00:27:20] Mikhail Parakhin: ...speed of HTML templatization, completely new UX templatization, reducing latency for Liquid themes. Our search: recently we moved (it's hard to even quote) from eight hundred QPS to forty-two hundred QPS with the same quality, just by pure optimizations and an auto research loop that kept running and changing code in our index server, on the same number of machines, just increasing the throughput.

We managed to improve the quality of gisting and machine learning process.
You know, gisting is the prompt compression technique that allows for lower latency and actually slightly higher quality. So literally whatever, different walks of life, and it doesn't have to be AI related.

We had a reduction in storage, because the agents would go and find data sets that are clearly derivative, and then you don't need to store things twice. We found, somewhat embarrassingly, that one of the largest tables was hashing random IDs into another random ID, and we literally... put only one. So it was translating, yeah, two random IDs hashed into each other.

[00:28:37] swyx: So it has access to the code as well, so it can check, like, what the hell is it doing?

[00:28:42] Mikhail Parakhin: It can be run at two levels. At the superficial level, it can just use existing components and reshuffle them. You can grab XGBoost, and you can grab some PyTorch module, and then grab some other tools and combine them. At a deeper level, since Tangle is all CLI-based underneath (every component is really a wrapped CLI call and a YAML file), it can analyze code and create new components and keep on iterating as well.

So you can both have quick modifications of existing pipelines with the components that are already there, pre-baked, or you can create new components and...

[00:29:29] swyx: Yeah.

[00:29:29] Mikhail Parakhin: ...keep iterating on those. So auto research is, again, probably the thing I was most excited about happening in the last two months, and we see it taking off totally like wildfire. Everybody, every day, every...
well, every day, every minute, I would have somebody Slack-message me saying, “Oh, look how much better I made it.” And it's all through auto research.

[00:29:53] swyx: Is this democratized in some way? In the sense that, is it your ML engineers and researchers doing this, or do your regular PMs and software engineers also have the ability to use Tangent?

[00:30:07] Mikhail Parakhin: This is an awesome question. Tangle in general and Tangent in particular are extremely democratizing. They are the main tools for...

[00:30:15] swyx: 'Cause I don't need the details.

[00:30:16] Mikhail Parakhin: Yeah. Exactly. It was initially used by ML and AI engineers, but then literally, as you said, PMs... the highest user right now is one of the PMs in our org, Sartak. He was number one by usage of this, 'cause they're just energetic and knowledgeable, and now it unlocks a lot of capability where you don't have to change code manually.

[00:30:39] swyx: I mean, because it kind of cuts out the ML engineer from the process, because the PMs have the domain knowledge and the ability to think from first principles about, okay, what results do I want? And they even have access to the data that needs to go in. So in some ways, this is the magic black box that we've always wanted for training and for, I guess, hill climbing, whatever.

[00:31:04] Mikhail Parakhin: It's basically Claude Code for your AI development situation, right? Now you don't have to know exactly how algorithms work. You can just bring your domain knowledge and expertise and product knowledge and iterate within Tangent until you've gotten the results that you need.

[00:31:21] swyx: In my previous roles, every time that someone has pitched AutoML, I've always been like, “This is not gonna work.
It's always gonna be a flop.” Somehow it's working now. I mean, presumably the answer is that now we have LLMs and they're good enough, right? It's an emergent property that we can do auto research, but it doesn't feel that satisfying. How come we didn't do this before, right? We just did parameter search and, I don't know, maybe that's it.

[00:31:48] Mikhail Parakhin: Yeah. Bayesian optimization and hyperparameter optimization was the one facet of AutoML that was used very actively, which incidentally is also built into Tangle. But, you know, I know Patrice Simard very well, and he was such a proponent of AutoML, and he literally spent his career trying to democratize it. Without LLMs, it just turned out to be very hard. You would have flexibility within a certain narrow domain, but it was hard at a wider scale, and now with LLMs suddenly it's like a magic wand, and suddenly everybody is an AutoML expert.

[00:32:28] swyx: Yeah, I think it's multiple things, right? I'm just gonna bring up the chart again, right? LLMs can do the monitoring very well, which is potentially unbounded and super unstructured. They can do the analysis very well. Basically, there is much more intelligence poured into every single step. There's maybe nothing structurally changed about AutoML, but this is just more intelligent and more unstructured.

[00:32:53] Mikhail Parakhin: Exactly.

[00:32:54] swyx: Any flaws that you've run into? Everyone is drinking the Kool-Aid: oh my God, time savings, performance improvements. What issues have come up?

[00:33:06] Mikhail Parakhin: This is really cool. It's not a solution to all the world's problems, for sure.
The limitations are usually the ones... and this is where we get into a bit of subjective territory. I can only share what I've seen so far, and I'm sure the situation is changing, and maybe after I say it, many people will reach out and say, “Hey, what about this? You don't know that,” and then they'll probably be right. But what I've seen is that auto research is very good at doing kind of obvious things that you don't have the bandwidth to do, or that you didn't notice, or maybe you're not aware of some standard practices.

It is not good at doing something completely out of distribution, something that you have to think about for multiple days. None of that. So I set up an experiment once on my sort of hobby thing, and I let it run for what ended up being several weeks. It's full production scale, so slow runs, and in the end it performed over four hundred experiments, and only one was successful. I'm like, “Okay, that's good.” But...

[00:34:18] swyx: But it saved time.

[00:34:19] Mikhail Parakhin: Yeah, it saved time. It was that thing. If I were doing four hundred experiments myself, my batting average would have been much higher, I'm sure. But first of all, it would take me like three years to do four hundred experiments. And I didn't have to do them. The machines just did that for the price of electricity. And I got one improvement. Honestly, when I was starting that experiment, my thinking was to go and show that, “Hey, Andrej, maybe you just don't know how to optimize.” And I felt super smart, because my problem had been optimized for many years, and it was, like, fully improved. I didn't expect auto research to find anything at all. Yet it did.
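The propose, measure, keep-if-better loop described in this part of the conversation can be sketched as a greedy search. This is a minimal illustration under stated assumptions, not Tangent's implementation: `auto_research`, `evaluate`, and `propose` are hypothetical names, the metric is a made-up function, and a random perturbation stands in for the agent that would normally suggest the change.

```python
import random


def auto_research(evaluate, propose, baseline, budget):
    """Greedy experiment loop: keep a candidate only if its measured score improves."""
    best, best_score = baseline, evaluate(baseline)
    accepted = 0
    for _ in range(budget):
        candidate = propose(best)
        score = evaluate(candidate)
        if score > best_score:  # most experiments fail this check and are discarded
            best, best_score = candidate, score
            accepted += 1
    return best, best_score, accepted


rng = random.Random(0)


def evaluate(config):
    # Hypothetical metric that peaks at batch_size == 64 (higher is better).
    return -abs(config["batch_size"] - 64)


def propose(config):
    # Stand-in for the agent: nudge one setting by a random step.
    step = rng.choice([-8, -4, 4, 8])
    return {**config, "batch_size": max(1, config["batch_size"] + step)}


start = {"batch_size": 16}
best, score, accepted = auto_research(evaluate, propose, start, budget=200)
print(best, score, f"{accepted} of 200 experiments kept")
```

The ratio of kept to attempted experiments is the "batting average" mentioned above; the loop is worthwhile when running an experiment is cheap compared with a person's time.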
So instead of making fun of Andrej, I ended up a big, big supporter. Yeah, that's exactly the tweet. Yes.

[00:35:10] swyx: You and Toby really go back and forth online a lot, which is really funny. Think of it as an eval for the optimality of the code it's running on. It almost reminds me of a Kolmogorov complexity thing: I guess there's some optimal thing that you're trying to reduce down to. And so you should congratulate yourself that you had, you know, ninety-nine percent optimality.

[00:35:36] Mikhail Parakhin: Exactly, yeah. I think Andrej really deserves a lot of credit for popularizing this approach. This is, I think, incredibly powerful and cool, and even him just mentioning it led to a lot of gains in a lot of places in the industry, so we should be thankful.

[00:35:56] swyx: Yeah. I think he also just has... I don't know what it is. It is a simple, self-contained project that people can take and apply to other things, which is one thing, but also just the name. Somehow no one managed to call their thing auto research. Naming things is very important. I think that is mostly our coverage of Tangle and Tangent. Obviously there's a lot of ML infra at Shopify that people can dive into. We're about to go into SimGym, but before I do that, any other broader comments around this whole effort? Where is it leading to?

[00:36:36] Mikhail Parakhin: As a segue to SimGym: all those things start composing strongly. You can see a huge unlock when you look at each one of the tools, and you see they're extremely useful. Tangle is useful by itself. Auto research is useful by itself. SimGym is useful by itself.
If you combine all three, you create a synergetic effect. I think that's why we even wanted to cover them today: this is something that, if you go back even five years, would've been unthinkable. Replicating that would be either incredibly costly or impossible, right? Probably thousands of people would be required.

[00:37:20] swyx: Well, we have serverless intelligence, right? So yes, you do have thousands of intelligences, just not humans. And that's close enough, right? Even if they're not AGI, they're close enough to do the task that you need them to do. And that's plenty for a lot of routine knowledge work. Okay, let's get into SimGym. This is one of those things I was surprised to see; it's apparently one of your most popular launches. I think Sim AI, I think Joon Sung Park, who did the Smallville thing... there's a very small cottage industry of people trying to do the simulated customer thing. I think a lot of people maybe don't super trust this yet, because they're like, well, obviously they would just do what you prompt them to do, right? But maybe tell us about the inspiration or origin story.

[00:38:10] Mikhail Parakhin: That's exactly the thing I wanted to cover, because if you don't have the historical data, all you can do is prompt agents in a vacuum, and they will do exactly what you prompt them to do. In fact, when I first proposed it (and this is a bit of my brainchild initially, if I can boast), even Toby said, “But wouldn't they just repeat what you tell them?” And I'm like, “Yes, except Shopify has decades of history of how people made changes and what that resulted in, in terms of sales.”

So now what we can do is... we have this...
It's noisy data. These are usually small websites, and things are never in isolation. It's almost never an AB experiment. It's always an AA experiment (that term has two meanings), but basically, at different times you run two different things. But if you aggregate everything together, and you apply a denoising, collaborative-filtering-like approach, you can extract a very clear signal. And then you can optimize your agents. And that's why it took so long. It took almost a year of that optimization, of just us sitting and fiddling. Our internal goal was to hit zero point seven correlation with add-to-cart events, for example. If we run a real AB test experiment, the simulation should replicate the same sort of success that humans had, or lack thereof. And it took forever, and I don't think that's easily replicable, because who else would have that data? You have to have decades' worth of historic data.

And the other thing you need is infrastructure and scale, right? Because, again, what we found is that for stat-sig results you need to run a lot of simulations, a lot of agents, and those are expensive things. You're taking actions in the browser, because you want real friction. You want to be able to get the image of what humans will see, because you wanna detect effects like, “Hey, if I make my images larger, will I have more sales or fewer sales?” And usually people's intuition here, by the way, is that if I increase my images, I will have more, because they look nicer. You know, designers all like sparse layouts and big images. Usually your sales tank, right? But from the HTML, all the characters look the same; only the size tag looks different, right? So it's very hard.
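The zero point seven correlation goal mentioned above can be made concrete with a small Pearson correlation check between simulated and observed outcomes. This is a sketch with made-up numbers: the `simulated` and `observed` lists are invented add-to-cart lifts for six hypothetical store changes, and `pearson` and `TARGET` are illustrative names rather than anything from SimGym.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented data: change in add-to-cart rate for six store changes,
# as predicted by simulated shoppers and as measured on real traffic.
simulated = [0.02, -0.01, 0.05, 0.00, -0.03, 0.04]
observed = [0.01, -0.02, 0.04, 0.01, -0.01, 0.03]

TARGET = 0.7  # the internal goal quoted in the conversation
r = pearson(simulated, observed)
print(f"correlation = {r:.2f}, meets target: {r >= TARGET}")
```

With real data the per-change estimates would be far noisier, which is why the aggregation and denoising steps described above come before any comparison like this one.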
So you have to take visual information, you have to run this in a simulated browser environment on a big farm, and of course you have to have a very expensive, good multimodal model. All this is what's taken so long. And to share my personal fail a little bit there, Shawn: we always had this large-company bias. Whenever we do something, we're like, “Hey, we'll run an experiment,” right? We make a change, and we run an experiment and then see which one's better, or, “No, this is worse,” and most of them are worse, so you discard it and keep iterating, hill climbing.

And we're like, “Oh, smaller merchants cannot get stat-sig results. They cannot really run experiments, simply because in a week there would not be enough data for them.” So we thought from this perspective. What we didn't realize is that most people don't have A and B; they just have one thing, and they need suggestions of what A and B should be.

So we first built this: hey, we run a simulation on two separate themes and say, “Hey, which one is better?” We then morphed it (and very recently released it) into this: when you have just your site, your theme, we run over it and we say, “Hey, here's what the predicted values of conversions are, and here's how we think you should modify it to increase your conversions.”

And then, circling back to what you started with, the proof is in the pudding. If we are not correlating with reality, people will not be using it. And thankfully, we see literally every day more users than the previous day. So right now, it's working. Yeah.
Right now my problem is how to pay for it all, because our major thing is how to optimize the LLMs, do distillation, and run the headless browsers, and headful browsers, cheaper, so that we can accommodate the increase in traffic.

[00:42:47] swyx: Yeah. I understand that you published a lot of technical detail at GTC, so I was just gonna bring it up a little bit. Was this in conjunction with some kind of GTC presentation? Or something like that, right?

[00:42:59] Mikhail Parakhin: Well, yeah, we did it in several places, but yeah, we had the engineering blog as well. Yeah.

[00:43:05] swyx: Yeah. So you're running GPT OSS.

[00:43:08] Mikhail Parakhin: This is an older version. Now we run a multimodal model. But yeah, we still run GPT OSS as well for...

[00:43:15] swyx: And then you have the VMs, and you also have Browserbase. I really like this one, where you said, “It violates almost every assumption that standard LLM serving is designed for.” And then you had basically orders-of-magnitude differences between everything.

[00:43:29] Mikhail Parakhin: Exactly. Which was a bit of a challenge to implement, even simple things. Since it violates all the assumptions, for example, multi-instance GPUs, MIGs, don't work as well. But we needed to get MIG to work, 'cause otherwise it's way too expensive. And so we had to deal with lots of infrastructure and work with Fireworks and CentML to help with optimizations, and Browserbase, as you mentioned. Yeah, it takes a village.

[00:44:04] swyx: Okay. So there's a lot of, I guess, experimentation in the infrastructure so far, and you've published more or less what you have here. I guess I'm less familiar with CentML.
I don't do that much work in this part of the stack. But why was it the preferred inference platform?

[00:44:22] Mikhail Parakhin: There are really... there used to be three top companies, at least that I was aware of, that did LLM optimization: Together, Fireworks, and CentML, not necessarily in that order. CentML recently got acquired by NVIDIA. What they did is, if you have a model and you want to optimize it for a specific profile of usage, they would go and do it. And we work with those companies; this work in particular was with CentML and NVIDIA, to get the best possible results out of it. And sometimes you have to retune, depending: sometimes you want the maximum throughput, sometimes you want minimal latency, sometimes you want the cheapest, right? Or some combination. And so these are people who would come and help you.

[00:45:14] swyx: I see. Yeah, I'm familiar with these people for the LLM, you know, autoregressive stack. But the other interesting category of these optimizers is the diffusion people, like fal, and Pruna recently has come up a lot as well, which I think is really underappreciated, at least by myself, because I thought all the workload would be LLMs, but actually there's a lot of diffusion as well.

[00:45:38] Mikhail Parakhin: Exactly.

[00:45:38] swyx: There's a lot here, so it's hard to cover. But I do think people underappreciate the importance of customer simulation, basically. I think this is something that I'm candidly still coming to terms with.
Your team also prepared this really nice diagram. I assume this is AI generated.

[00:46:00] Mikhail Parakhin: Yeah, it looks...

[00:46:01] swyx: Maybe it's not.

[00:46:01] Mikhail Parakhin: Yeah, it looks Gemini-ish. But honestly, I don't know where the hell they generated it. It looks like it's Google. But the interesting part, Shawn, that we haven't covered but I wanted to mention: if your store had previous customers, rather than being a new store where you're a new merchant just launching things, it helps tremendously in just correlation and forecast. We take your previous customers' behavior, and we create agents that replicate the specific distribution of customers that you get, and then we apply those to your changes, and that raises the raw correlation with add-to-cart events, or with conversion, or whatever it may be, quite dramatically. So replicating humans in general seems like an interesting, cool challenge.

[00:46:58] swyx: As a shareholder, I think if people are Shopify shareholders, they should really deeply understand this, because this is basically the moat. The more you use Shopify, the more it will just automatically improve, right? You're doing the job for them.

[00:47:13] Mikhail Parakhin: Yeah, that's what we started with. Otherwise, if you're just a startup... I wouldn't do it if it was my startup, because without the data, as you said, it's exactly the case that whatever you say in the prompt, that's what the agents will be doing.

[00:47:30] swyx: The statistician in me wants to really satisfy the statistical intuition, I guess. To me, the word that comes to mind is ergodicity.
So let's say a customer takes this path, a customer takes this path, a customer takes this path, right? In my mind, the way I explain it is, okay, here's the ninety-fifth percentile, here's the fifth percentile, and here's the median, right? But to me, what SimGym is potentially doing is that it can model the in-between journeys as well, which maybe are dependent on the previous states. This may be a very RL-type conclusion, where, if you only did naive AB testing, you only have the statistics at a certain point, and you only judge based on the overall summary statistics. But here you can actually model trajectories. Does that make sense? Or...

[00:48:31] Mikhail Parakhin: That makes total sense. That makes even more sense than maybe even you realize, because...

[00:48:38] swyx: Okay. Please, please.

[00:48:38] Mikhail Parakhin: Yes. So internally we have this system; we talked about it briefly once at NeurIPS. We have a huge HSTU-based system that models whole companies and their possible paths. And with what you are showing, at any point in time, you can either model the user's behavior, or you can also think about the whole merchant as a company, as an entity that acts in the world. You can model that as well. And then you can do counterfactuals. In your graph, your blue graph, imagine that somewhere in the middle you have an intervention. I give that person a coupon, or, I don't know, I send a personal thank-you card, or give a discount somewhere. And then you can do forward rollouts from that counterfactual. So what would have happened with that intervention or without the intervention?
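The idea of forward rollouts from a counterfactual can be illustrated with a toy shopper model. This is a deliberately simple sketch, not the HSTU-based system described above: the journey is a small Markov chain, the transition probabilities in `BASE` and `WITH_COUPON` are invented, and `rollout` and `conversion_rate` are hypothetical helper names.

```python
import random

# Invented per-step transition probabilities for a shopper's journey.
BASE = {
    "browse": {"browse": 0.5, "cart": 0.2, "leave": 0.3},
    "cart": {"cart": 0.4, "purchase": 0.3, "leave": 0.3},
}
# Assumed effect of a coupon: shoppers with a cart become more likely to buy.
WITH_COUPON = {
    "browse": BASE["browse"],
    "cart": {"cart": 0.3, "purchase": 0.5, "leave": 0.2},
}


def rollout(rng, steps, coupon_at=None):
    """Simulate one journey; the coupon, if any, applies from step `coupon_at` onward."""
    state = "browse"
    for t in range(steps):
        if state in ("purchase", "leave"):
            break
        table = WITH_COUPON if coupon_at is not None and t >= coupon_at else BASE
        next_states, weights = zip(*table[state].items())
        state = rng.choices(next_states, weights=weights)[0]
    return state == "purchase"


def conversion_rate(n, steps, coupon_at, seed):
    """Fraction of n simulated journeys that end in a purchase."""
    rng = random.Random(seed)
    return sum(rollout(rng, steps, coupon_at) for _ in range(n)) / n


baseline = conversion_rate(20000, 10, None, seed=1)  # no intervention
treated = conversion_rate(20000, 10, 2, seed=1)      # coupon issued at step 2
print(f"without coupon: {baseline:.3f}, with coupon at step 2: {treated:.3f}")
```

Changing `coupon_at` moves the intervention earlier or later in the journey, which corresponds to the question of when to intervene.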
And you can even change where in time that intervention happens, right? Somewhere in this journey. So we do this at Shopify scale for our merchants, and then if we notice something that they could be fixing, where there's a strong counterfactual, like we have Shopify policy, they basically get a notification like, “Hey, we think something is wrong with your...” I don't know, Canadian sales. “It looks like it's misconfigured. Here's what you need to do.” Or, “We think you should set up this campaign with these parameters.” And we do that at the buyer level too, to literally offer discounts or cashback or other things to buyers.

So I'm getting very excited. This is my area of interest, I guess, and hobby. But being able to model something as complex as human beings or companies, and model counterfactuals on it, where you can have interventions in the future and optimize when to make an intervention and what kind of intervention to make... it's such an unlock, one that was previously completely impossible. It was always dreamed of, but never... how would you even simulate it without LLMs or HSTUs? I think these are very, very exciting times.

[00:50:59] swyx: I just wanted to maybe illustrate this. I'm not the best illustrator, but I am a conceptual statistics guy. And you know, you cannot just do this. This is a dimensionality that an AB test doesn't capture, right? Because it doesn't have the change-over-time, stochastic nature, and it doesn't have the contextual part: here's all the context up to this point. Okay, cool. That's SimGym. You're gonna burn a lot of tokens on this thing. But you're one of the only scale platforms in the world that can do this across a huge variety of workloads, right?
I'm even curious, on a human research level: does retail behave differently from, like, clothing sales? Does that behave differently from electronics sales? I don't know what else you guys... The Kardashian shoppers, do they differ from people who buy, I don't know, cars and whatever?

[00:51:55] Mikhail Parakhin: Well, very different, with different sensitivities and different modes of shopping and different levels of what's important. And totally, you can do aggregations at a store level. You can do aggregations at a category level. For the statisticians among us: I couldn't believe it, but recently we were looking at this, and we had to bring back CRPs, you know, the Chinese restaurant process. It's a way of aggregating and naturally growing clusters. Specifically to answer questions like the ones you were just posing, on how buyers behave across different categories. And I'm like, “I haven't seen CRP since two thousand and one.” It's so...

[00:52:37] swyx: What is... No, I haven't seen this. No. This is not in my training.

[00:52:44] Mikhail Parakhin: But yeah, it was a very popular kind of theory, popular in NeurIPS and ICML circles in the early two thousands. Kind of nice. And now it has practical applications that we are resurrecting.

[00:53:03] swyx: Yeah, amazing. I can see how this is a fun job for you, where you get to apply all these things. So, super cool. Okay, so anyone who knows what CRPs are and has always wanted to use them at work, they should definitely join Shopify. Okay, so we have a lot, but I'm being mindful of the time.
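For readers who have not met the Chinese restaurant process mentioned above, here is a minimal sampler. It shows only the textbook process, not how Shopify applies it: each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter, so the number of clusters grows with the data. The function name and the `alpha` and `seed` parameters are illustrative choices.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table assignments from a CRP with concentration parameter alpha."""
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers seated at table k
    assignments = []  # assignments[i] = table index of customer i
    for i in range(n_customers):
        # With i customers already seated: join table k with probability
        # tables[k] / (i + alpha), open a new table with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                assignments.append(k)
                break
        else:
            tables.append(1)
            assignments.append(len(tables) - 1)
    return assignments, tables


assignments, tables = chinese_restaurant_process(n_customers=1000, alpha=2.0)
print(f"{len(tables)} clusters, largest has {max(tables)} members")
```

Larger tables attract more customers, so the process yields a few big clusters and a long tail of small ones without fixing the number of clusters in advance.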
I did want to cover some other things. I'll give you a choice: UCP or Liquid?

[00:53:30] Mikhail Parakhin: Liquid. I think UCP is very important for us, and on UCP we have structured discussions, and you can read about them, and we have blog posts, and we have a big release this week, in fact, with our catalog.

[00:53:46] swyx: Oh, okay. I mean, we can discuss the release briefly, because we'll release this after it's already announced, so whatever. There's a catalog that you guys are doing?

[00:53:55] Mikhail Parakhin: Yeah. So we are bringing in the capabilities of the whole Shopify catalog. Basically, you can now search for products, you can do lookups by specific ID, you can do bulk lookups when you need to bring in multiple products. You don't need to know in advance what you're trying to show or sell or check out. You can now have this decided at runtime, and this is a big area of investment for us, for both non-personalized and personalized searches, trying to provide basically a window into the whole universe of products that are being sold everywhere in the world. And Shopify is really, not exactly, but almost a superset of anything being sold. Now we are bringing it into UCP, and identity linking is another big thing for us, so that you can use Google or whatever identity you have, thereby minimizing friction.

[00:54:56] swyx: Yeah.

[00:54:57] Mikhail Parakhin: So yeah, big release for us. But Liquid AI, of course, we never talked about, and it might be more aligned with what we discussed previously in this chat.

[00:55:07] swyx: Sure. The main thing that everyone understands about Liquid is that it is inspired by a worm, and I still don't know why.
I'm curious about your explanation. I think you can make things very approachable. And also, what is the potential, the level of efficiency, that you get out of Liquid?

[00:55:23] Mikhail Parakhin: We're all familiar with transformer architectures. And for the longest time there was a competing architecture called state space models, SSMs: you know, Chris Ré, one of the pioneers, and lots of startups trying to make those a reality. They have significant benefits, the main ones being that they are much faster, have a lower footprint, and are not quadratic in length: sort of linear in your context length. But state space models never quite made it. They have certain niches where they thrive, and hybrid architectures are useful, but they never quite made it. And liquid neural networks, you can think of them as a next step, sort of state space models squared. It's a non-transformer architecture that's more complicated than state space and really difficult to code, if I'm being honest. But it's very efficient. It's subquadratic in the length of your context. It's a very compact way to represent things, and that's the Liquid AI company: their goal is to productize it. Very often you have this need where you need long context and a small model, and you want low latency. In general, it's basically on par with transformers, and if you do hybrids with transformers, it's even better. That's why at Shopify, when we tried multiple models from multiple companies, and we constantly do, we found that for small models, particularly in low-latency applications and/or when you need longer context lengths, Liquid was the best.
And so we still use the whole zoo and always like obviously test and use everything, uh, every open source model and, you know, it feels l

Teal Town USA
San Jose Sharks @ Winnipeg Jets - 4/16/2026 - Teal Town USA After Dark (Postgame)

Teal Town USA

Play Episode Listen Later Apr 17, 2026 88:01


The 35th season for the San Jose Sharks ends in Winnipeg in a dominant fashion. Team Teal won 6-1 as Macklin Celebrini breaks Joe Thornton's single season point record with 115. Erik Kuhre & Dana Meyerson discuss Macklin's incredible year, the turnaround the Sharks had this season, the playoffs and dream of a draft lottery. Thanks for watching all season and stay tuned in the offseason for continued coverage on the Sharks and Cuda. Teal Town USA - A San Jose Sharks' post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists!
 Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!


Atareao con Linux
ATA 788 Four AI tools for Spotify and YouTube

Atareao con Linux

Play Episode Listen Later Apr 16, 2026 18:46


Hello! How are you? I'm Lorenzo, and welcome to a new episode of Atareao con Linux. Today I want to open the doors of my personal lab and tell you about something I'm excited about: how I've turned artificial intelligence and automation into my best allies for keeping this project moving.

The tools of the revolution

To help you understand my current workflow, I'll break down the four tools that have become essential in my setup:

1. Whisper (from OpenAI): This is the starting point. This marvel of technology can listen to my audio and transcribe it to text with scary precision. Because I use an Nvidia graphics card with CUDA support, the process is very fast. Whisper not only saves me from writing notes by hand, it also gives me the basis for everything that comes next.
2. Google AI Studio and the power of prompts: Once I have the transcript, the next step is to pass that text to Google AI Studio. I've designed a very detailed prompt (a set of instructions) that tells the AI exactly what I need: to extract the timestamps of the topics covered, write an engaging description for YouTube and Spotify, and prepare the SEO metadata for the website.
3. Nano Banana (Gemini) and image generation: For the cover art you see on the platforms, I now fully trust Google's image generation model. Although it is sometimes a bit stubborn about dimensions (I ask for one size and it gives me another), the visual quality is impressive. To tame this AI, I've written my own Fish Shell scripts that check whether the image is square or rectangular and automatically adjust it to what I need for each platform.
4. Real-ESRGAN and smart upscaling: Sometimes the image the AI generates is too small for current quality standards. This is where Real-ESRGAN's neural networks come into play. This tool can "invent" the missing details to enlarge an image without losing sharpness.
5. ImageMagick (or "Magic"): We couldn't forget the classics. ImageMagick is the Swiss Army knife I use for final conversions, to optimize image size before uploading to the website, and to make sure everything complies with standard formats. It's a terminal tool every Linux lover should know.

Episode chapters:
00:00:00 The best investment: Atareao.es
00:01:38 My technical evolution: from hosting to VPS and Docker
00:02:17 Language models come into play
00:03:00 Tremendous results with less effort
00:04:20 Tool 1: Whisper, the art of transcribing audio
00:05:11 Fish Shell: the soul of my automations
00:07:04 Tool 2: Google AI Studio and the magic of prompts
00:08:41 My workflow: from script to timestamps
00:09:30 Tool 3: Nano Banana (Gemini) for creating cover art
00:10:50 Automating image formatting with Fish
00:12:00 Real-ESRGAN: upscaling images with neural networks
00:13:50 Tool 4: ImageMagick (Magic), the Swiss Army knife
00:15:41 Audio processing: normalization and filters
00:16:45 Conclusions: automate to enjoy more
00:18:04 Farewell and podcast network

As I always say, life is two days long and one has already passed, so enjoy it as if there were no tomorrow, and if that can be with Linux and tinkering with these tools, even better! Best wishes, and we'll hear from each other soon.

More information and links in the episode notes

Citadel Dispatch
CD199: CRAIG RAW - SILENT PAYMENTS AND SPARROW WALLET

Citadel Dispatch

Play Episode Listen Later Apr 13, 2026 71:40 Transcription Available


Craig Raw, creator of Sparrow Wallet, joins to discuss silent payments, a new bitcoin address system that eliminates address reuse, removes the gap limit, and aligns privacy with convenience. Craig walks us through the history of bitcoin address derivation from single key to hd wallets to bip 47, then explains how silent payments optimizes everything except scanning cost and how his new server implementation, Frigate, uses gpu acceleration to mitigate that. We discuss the path to adoption including hardware wallet support, public server infrastructure, bip 353 human readable addresses, and the overall vision of upgrading from hd wallets to sp wallets.

Craig on Nostr: https://primal.net/craigraw
Craig on X: https://x.com/craigraw
Sparrow Wallet: https://sparrowwallet.com
Frigate Repo: https://github.com/sparrowwallet/frigate

EPISODE: 199
BLOCK: 944916
PRICE: 1384 sats per dollar

(00:03:09) Craig Raw of Sparrow Wallet
(00:03:27) Silent Payments: what they are and why they matter
(00:06:01) From single keys to HD wallets: history and limits
(00:11:41) Address reuse in the wild and UX realities
(00:11:50) BIP47 review: pros, cons, and hardware wallet hurdles
(00:15:18) Enter Silent Payments: design tradeoffs and hardware support
(00:19:01) Key benefits: static codes, enforced freshness, no gap limit
(00:21:02) The scanning-cost problem and early client approaches
(00:25:27) Server-side strategy: database tweaks and GPUs
(00:29:15) Why public servers matter and performance breakthroughs
(00:33:37) Frigate with Electrum backends: deployment paths
(00:37:20) Risks with public servers and practical mitigations
(00:43:10) Uncle Jim model and GPU-ready home servers
(00:46:21) GPU backends: CUDA, OpenCL, Metal and real-world nodes
(00:47:34) Running everything on a laptop and pruning considerations
(00:49:11) Human-readable addresses: DNSSEC and BIP353
(00:55:14) What's needed next: hardware, node vendors, and runners
(00:58:13) PSBT details, DLEQ proofs, and multisig caveats
(01:02:13) Timeline to usable SP wallets and public servers
(01:06:10) Reframing SP as UX: contacts and everyday payments
(01:07:22) Ecosystem fit: who could ship this first
(01:08:31) Wrapping up: calls to action and outlook
(01:10:03) Closing notes: upcoming guests and events

more info on the show: https://citadeldispatch.com
learn more about me: https://odell.xyz
monitor the situation: https://citadelwire.com

The Lunar Society
Michael Nielsen – How science actually progresses

The Lunar Society

Play Episode Listen Later Apr 7, 2026 123:03


Really enjoyed chatting with Michael Nielsen about how we recognize scientific progress.

It's especially relevant for closing the RL verification loop for scientific discovery.

But it's also a surprisingly mysterious and elusive question when you look at the history of human science.

We approach this question through stories like Einstein (who claimed that he hadn't even heard of the famous Michelson-Morley experiment, which is supposed to have motivated special relativity, until after he had come up with the theory), Darwin (why did it take till 1859 to lay out an idea whose essence every farmer since antiquity must have observed?), Prout (how do you recognize that isotopes exist if you cannot chemically separate them?), and many others.

The verification loop on scientific ideas is often extremely long and weirdly hostile. Ancient Athenians dismissed Aristarchus's heliocentrism in the 3rd century BC because it would imply that the stars should shift in the sky as the Earth orbits the sun. The first successful measurement of stellar parallax was in 1838. That's a 2,000-year verification loop.

But clearly human science is able to make progress faster than raw experimental falsification/verification would imply, and in cases where experiments are very ambiguous. How?

Michael has some very deep and provocative hypotheses about the nature of progress. One I found especially thought-provoking is that aliens will likely have a VERY different science + tech stack than us. That contradicts the common-sense picture of a linear tech tree that I was assuming, and it has some interesting implications about how future civilizations might trade and cooperate with each other.

Watch on Youtube; read the transcript.

Sponsors
* Labelbox researchers built a new safety benchmark. Why? Well, current safety benchmarks claim that attacks on top models are successful only a few percent of the time, but the prompts in those benchmarks don't reflect how real bad actors actually write. You can read Labelbox's research here. If this could be useful for your work, reach out at labelbox.com/dwarkesh
* Mercury has an MCP that lets you give an LLM access to your full transaction history, including things like attached receipts and internal notes. I just used it to categorize my 2025 transactions, and it worked shockingly well. Modern functionality like this is exactly why I use Mercury. Learn more at mercury.com
* Jane Street's ML engineers presented some of their GPU optimization workflows at GTC, showing how they use CUDA graphs, streams, and custom kernels to shave real time off their training runs. You can watch the full talk here. And they open-sourced all the relevant code here. If this kind of stuff excites you, Jane Street is hiring; learn more at janestreet.com/dwarkesh

Timestamps
(00:00:00) – How scientific progress outpaces its verification loops
(00:17:51) – Newton was the last of the magicians
(00:23:26) – Why wasn't natural selection obvious much earlier?
(00:29:52) – Could gradient descent have discovered general relativity?
(00:50:54) – Why aliens will have a different tech stack than us
(01:15:26) – Are there infinitely many deep scientific principles left to discover?
(01:26:25) – What drew Michael to quantum computing so early?
(01:35:29) – Does science need a new way to assign credit?
(01:43:57) – Prolificness versus depth
(01:49:17) – What it takes to actually internalize what you learn

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

BSD Now
657: Hibernation is a long sleep

BSD Now

Play Episode Listen Later Apr 2, 2026 50:57


The Real Cost of Technology Dependence, FreeBSD 15 Linuxulator with CUDA, Bidirectional OPNsense/pfSense, Netbase, a SYN attack, and more...

NOTES
This episode of BSDNow is brought to you by Tarsnap and the BSDNow Patreon

Headlines
The Real Cost of Technology Dependence: Building Independence with Open-Source Storage

News Roundup
Building Hierarchical Jails (Podman x Native Jail) on FreeBSD 15
FreeBSD 15.0 Linuxulator with CUDA Setup
Bidirectional OPNsense/pfSense Firewall Configuration Migration/Conversion CLI
SYN attack
Syn attack follow up
Netbase is Port of NetBSD Utilities to Another UNIX Like Operating Systems

Beastie Bits
OpenBSD -current moves to 7.9-beta - Delayed hibernation comes to OpenBSD/amd64 laptops

Tarsnap
This week's episode of BSDNow was sponsored by our friends at Tarsnap, the only secure online backup you can trust your data to. Even paranoids need backups.

Feedback/Questions
Send questions, comments, show ideas/topics, or stories you want mentioned on the show to feedback@bsdnow.tv
Join us and other BSD Fans in our BSD Now Telegram channel

The 7investing Podcast
Mar 9, 2026: Micron & Infleqtion - Two AI Stocks to Watch in 2026

The 7investing Podcast

Play Episode Listen Later Apr 2, 2026 34:05


Simon Erickson of 7investing breaks down two high-potential watchlist stocks: Micron Technology (NASDAQ:MU), a memory giant riding the AI boom with explosive margin expansion, and Infleqtion (NYSE:INFQ), a newly public quantum computing company using groundbreaking neutral atom technology. Micron's high bandwidth memory (HBM) is completely sold out through 2026, with gross margins expanding 16 percentage points year-over-year on $13.6B in quarterly revenue. This is one of the most compelling AI infrastructure plays in the semiconductor space right now.

Infleqtion just hit public markets via SPAC in February 2026 and is already partnered with NVIDIA (NASDAQ:NVDA) through a CUDA integration called Qlink. With quantum computing threatening RSA encryption and unlocking solutions classical computers can't touch, government contracts, defense spending, and research grants are flooding the space, justifying premium valuations for early-stage leaders.

This Week in Startups
$2.5B Chip Heist, The Future of American AI, and Purpose-Built Robots | This Week in AI Ep 6

This Week in Startups

Play Episode Listen Later Mar 25, 2026 75:24


This Week in AI sneak peek! If you enjoy the episode, find us on Spotify, Apple Podcasts and YouTube by looking up "This Week in AI" or by going to thisweekinai.ai

This week Jason sat down with Jake Loosararian and Chris Lattner on Episode 6 of This Week in AI. Jake is the CEO and co-founder of Gecko Robotics, a company deploying purpose-built robots and AI for mission-critical infrastructure inspection across energy, defense, and manufacturing. Chris is the CEO and co-founder of Modular, building a universal software layer that lets developers run AI models across Nvidia, AMD, and Apple silicon without being locked into any single hardware vendor.

We explore the GPU shortage, why China's chip smuggling reveals the stakes of the AI cold war, how purpose-built robotics are beating humanoids on ROI, the case for American reindustrialization, and why the next decade could be the best ever for private equity in capital-intensive industries.

Purpose-Built Robots vs. Humanoids: Jake has been building mission-critical robots for 13 years. He explains why general-purpose humanoids still have too little ROI for industrial use, and why specialized robots that find and fix problems are winning in the field.
The GPU Shortage Is Real: Chris breaks down why you can't just go buy 100 Blackwell chips today, why Nvidia's CUDA creates massive lock-in, and how Modular is building a unified software layer across all major chip architectures.
Google TPUs Are the Sleeper: Chris ranks Google as the number one threat to Nvidia's dominance, ahead of Amazon's Trainium and AMD.
China's Chip Smuggling & the AI Cold War: A Supermicro co-founder allegedly smuggled $2.5B in Nvidia chips to China using fake serial numbers and a hairdryer.
The Best Decade for Private Equity: Jake makes the case that capital-intensive, commoditized infrastructure assets (waste-to-energy, water treatment, old power plants) will all generate incredible returns.
Self-Driving State of Play: Chris, a former Tesla Autopilot lead, gives his read on Waymo's lead, Tesla's small Austin pilot, and why the real signal is when Tesla starts filing for fully autonomous permits in California.

Learn more about Gecko Robotics: https://www.geckorobotics.com
Learn more about Modular: https://www.modular.com/

This Week In AI is made possible by:
*PayPalOpen* - One Platform for all Business: paypalopen.com

*Timestamps:*
00:00 Welcome & intro to Jake Loosararian (Gecko Robotics) and Chris Lattner (Modular)
01:34 Gecko's 13-year journey & the Cantilever platform
05:15 Chris Lattner on Modular: replacing CUDA & unifying AI hardware
11:10 Nvidia lock-in, AMD's ROCm & why the software stack is broken
19:49 The GPU shortage: how real is it?
22:13 Who challenges Nvidia? Google TPUs, Amazon Trainium & AMD ranked
28:17 China chip smuggling: $2.5B in Nvidia GPUs & the AI cold war
37:43 Self-driving update: Waymo, Tesla's Austin pilot & Chris's Tesla history
42:20 Figure's humanoid package sorting: real or demo magic?
43:47 The best decade for private equity in capital-intensive assets
51:04 Reindustrialization, the trades boom & making manufacturing cool
58:39 Building tech companies outside Silicon Valley
1:06:46 Breaking news: Brett Adcock launches Hark from Figure
1:10:15 Closing thoughts: grit over hype, customers over valuations

Subscribe to This Week in AI on Apple: https://thisweekinai.ai/spotify
Subscribe to This Week in AI on Spotify: https://thisweekinai.ai/apple
Thanks for watching!

Let's Talk AI
#237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!!!

Let's Talk AI

Play Episode Listen Later Mar 16, 2026 147:19


Our 237th episode with a summary and discussion of last week's big AI news!

Recorded on 03/13/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Check out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* Perplexity announced “Personal Computer,” a local Mac-based AI agent positioned as a safer alternative to OpenAI's computer-use agents, while Anthropic added GitHub PR code review, pricing reviews at $15–$25, and Cursor launched trigger-based “Automations” for always-on coding agents.
* ChatGPT introduced interactive math/science visuals and Anthropic added in-chat interactive charts/diagrams; Nvidia released open weights for its 120B-parameter Nemotron 3 Super hybrid Transformer–Mamba latent-MoE model trained natively at 4-bit for Blackwell GPUs.
* Nvidia halted H200 production for China amid customs blocks and domestic chip pressure; xAI saw major co-founder departures; Anthropic previewed a Claude Marketplace for enterprise procurement; Yann LeCun's AMI Labs raised $1.03B; humanoid robot maker Sunday reached a $1.15B valuation.
* Anthropic sued the Pentagon over a “supply chain risk” designation as memos ordered removal within 180 days; research covered models resisting activation steering, limits of chain-of-thought control, inference-scaling boosting cyber-task success, low-probability risky actions, weaknesses in SWE-bench, multimodal pretraining, long-context RNN memory caching, context-parallel training efficiency, RL for CUDA kernel optimization, and latent introspection detecting concept injection.

A thank you to our current sponsors:
Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

Timestamps:
(00:00:10) Intro / Banter
(00:01:23) Response to listener comments

Tools & Apps
(00:02:06) Perplexity's Personal Computer turns your spare Mac into an AI agent | The Verge
(00:04:22) Anthropic launches code review tool to check flood of AI-generated code | TechCrunch
(00:08:08) Cursor is rolling out a new kind of agentic coding tool | TechCrunch
(00:11:14) ChatGPT can now create interactive visuals to help you understand math and science concepts | TechCrunch
(00:11:56) Anthropic's Claude AI can respond with charts, diagrams, and other visuals now | The Verge

Projects & Open Source
(00:13:54) Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

Applications & Business
(00:21:22) Nvidia halts H200 production as China backs Huawei AI chips
(00:28:33) Another XAI Cofounder Has Left, and Another Says He's Leaving. - Business Insider
(00:34:04) Anthropic's Claude Marketplace allows customers to buy third-party cloud services | TechRadar
(00:37:57) Yann LeCun's AMI Labs raises $1.03 billion to build world models | TechCrunch
(00:44:52) Humanoid robotics maker Sunday reaches $1.15B valuation to build household robots | TechCrunch

Policy & Safety
(00:46:09) Anthropic Sues Department of Defense Over ‘Supply Chain Risk' Label - The New York Times + Google and OpenAI Just Filed a Legal Brief in Support of Anthropic
(00:53:24) Internal Pentagon memo orders military commanders to remove Anthropic AI technology from key systems - CBS News
(00:58:15) Endogenous Resistance to Activation Steering in Language Models
(01:06:27) Reasoning Models Struggle to Control their Chains of Thought
(01:09:52) ‘It means missile defence on datacentres': drone strikes raise doubts over Gulf as AI superpower
(01:14:57) Evidence for inference scaling in AI cyber tasks: Increased evaluation budgets reveal higher success rates
(01:18:24) Frontier Models Can Take Actions at Low Probabilities

Research & Advancements
(01:24:20) Research note: Many SWE-bench-Passing PRs Would Not Be Merged into Main
(01:28:26) [2603.03276] Beyond Language Modeling: An Exploration of Multimodal Pretraining
(01:40:09) Memory Caching: RNNs with Growing Memory
(01:48:47) Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
(01:58:41) CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
(02:08:57) Latent Introspection: Models Can Detect Prior Concept Injections
(02:16:45) Physics of RL: Toy scaling laws for the emergence of reward-seeking

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Mar 10, 2026 83:37


Join Kyle, Nader, Vibhu, and swyx live at NVIDIA GTC next week!

Now that AIE Europe tix are ~sold out, our attention turns to Miami and World's Fair!

The definitive AI accelerator chip company has more than 10xed this AI Summer, and is now a $4.4 trillion megacorp… that is somehow still moving like a startup. We are blessed to have a unique relationship with our first ever NVIDIA guests: Kyle Kranen, who gave a great inference keynote at the first World's Fair and is one of the leading architects of NVIDIA Dynamo (a datacenter-scale inference framework supporting SGLang, TRT-LLM, vLLM), and Nader Khalil, a friend of swyx from our days in Celo in The Arena, who has been drawing developers at GTC since before they were even a glimmer in the eye of NVIDIA.

Nader discusses how NVIDIA Brev has drastically reduced the barriers to entry for developers to get a top-of-the-line GPU up and running, and Kyle explains NVIDIA Dynamo as a datacenter-scale inference engine that optimizes serving by scaling out, leveraging techniques like prefill/decode disaggregation, scheduling, and Kubernetes-based orchestration, framed around cost, latency, and quality tradeoffs.
We also dive into Jensen's “SOL” (Speed of Light) first-principles urgency concept, long-context limits and model/hardware co-design, internal model APIs (https://build.nvidia.com), and upcoming Dynamo and agent sessions at GTC.

Full Video pod on YouTube

Timestamps
00:00 Agent Security Basics
00:39 Podcast Welcome and Guests
07:19 Acquisition and DevEx Shift
13:48 SOL Culture and Dynamo Setup
27:38 Why Scale Out Wins
29:02 Scale Up Limits Explained
30:24 From Laptop to Multi Node
33:07 Cost Quality Latency Tradeoffs
38:42 Disaggregation Prefill vs Decode
41:05 Kubernetes Scaling with Grove
43:20 Context Length and Co Design
57:34 Security Meets Agents
58:01 Agent Permissions Model
59:10 Build Nvidia Inference Gateway
01:01:52 Hackathons And Autonomy Dreams
01:10:26 Local GPUs And Scaling Inference
01:15:31 Long Running Agents And SF Reflections

Transcript

Agent Security Basics

Nader: Agents can do three things. They can access your files, they can access the internet, and now they can write custom code and execute it. You literally only let an agent do two of those three things. If you can access your files and you can write custom code, you don't want internet access because that's one to see full vulnerability, right? If you have access to internet and your file system, you should know the full scope of what that agent's capable of doing. Otherwise, now we can get injected or something that can happen. And so that's a lot of what we've been thinking about: how do we both enable this, because it's clearly the future, but then also, what are these enforcement points that we can start to protect?

swyx: All right.

Podcast Welcome and Guests

swyx: Welcome to the Latent Space podcast in the Chromo studio. Welcome to all the guests here. We are back with our guest host Vibhu. Welcome. Good to have you back. And our friends Nader and Kyle from Nvidia. Welcome.

Kyle: Yeah, thanks for having us.

swyx: Yeah, thank you.
Actually, I don't even know your titles. I know you're, like, architect something of Dynamo.

Kyle: Yeah. I'm one of the engineering leaders [00:01:00] and architects of Dynamo.

swyx: And you're director of something and developers, developer tech.

Nader: Yeah.

swyx: You're the developers, developers, developers guy at Nvidia.

Nader: Open source, agent marketing, Brev...

swyx: And like...

Nader: DevRel tools and stuff.

swyx: Yeah.

Nader: Been the focus.

swyx: And we're kind of recording this ahead of Nvidia GTC, which is coming to town again, or taking over town, which we'll all be at. And we'll talk a little bit about your sessions and stuff. Yeah.

Nader: We're super excited for it.

GTC Booth Stunt Stories

swyx: One of my favorite memories of Nader: you always do marketing stunts, and while you were at Brev, you had this surfboard that you went down to GTC with, and Nvidia apparently liked it so much that they bought you. What was that like?

Nader: Yeah. Our logo was a shaka. We were always just trying to keep true to who we were. I think, you know, at some startups you're trying to pretend that you're a bigger, more mature company than you are. And it was actually Evan Conrad from SF Compute who was just like...

swyx: Previous guest. Yeah.

Nader: Amazing. Oh, really? Amazing. Yeah. He was just like, guys, you're two dudes in the room. Why are you [00:02:00] pretending that you're not? And so then we were like, okay, let's make the logo a shaka. We brought surfboards to our booth at GTC and the energy was great. Yeah. Some palm trees too.

Kyle: They actually poked out over the walls, so you could see the Brev booth. Oh, that's so funny. And...

Nader: No one else...

Kyle: Just from very far away.

Nader: Oh, so you remember it back then?

Kyle: Yeah, I remember it pre-acquisition.
I was like, oh, those guys look cool.

Nader: Dude, that makes sense. 'Cause we signed up really last minute, and so we had the last booth. It was all the way in the corner. And so I was worried that no one was gonna come. So that's why we had the palm trees. We really came in with the surfboards. We even had one of our investors bring her dog, and then she was just walking the dog around to try to bring energy towards our booth. Yeah.

swyx: Steph.

Kyle: Yeah. Yeah, she's the best.

swyx: You know, as a conference organizer, I love that. Right? Everyone who sponsors a conference comes, does their booth. They're like, we are changing the future of AI or something, some generic b******t, and like, no, actually try to stand out, make it fun, right? And people still remember it after three years.

Nader: Yeah. Yeah. You know what's so funny? I'll give you this clip if you wanna add it [00:03:00] in, but my wife, at the time my fiancée, she was in medical school and she came to help us, 'cause it was a big moment for us. And so we bought this Cricut, it's like a vinyl printer. 'Cause how else are we gonna label the surfboard? So we got a surfboard, luckily was able to purchase that on the company card. We got a Cricut, and it was just like "fine tuning for enterprises" or something like that that we put on the surfboard. And it's 1:00 AM the day before we go to GTC. She's helping me put these vinyl stickers on. And she's like, "If you pull this off, you son of a b***h." And so, pretty much right after the acquisition, I stitched that with the mag music acquisition. I sent it to our family group chat.

swyx: Oh yeah. No, well, she made a good choice there. Was that basically the origin story for Launchable? And maybe we should explain what Brev is and...

Nader: Yeah. Yeah.
Uh, I mean, brev is just, it's a developer tool that makes it really easy to get a GPU. So we connect a bunch of different GPU sources. So the basics of it is like, how quickly can we SSH you into a G, into a GPU and whenever we would talk to users, they wanted A GPU. They wanted an A 100. And if you go to like any cloud [00:04:00] provisioning page, usually it's like three pages of forms or in the forms somewhere there's a dropdown.And in the dropdown there's some weird code that you know to translate to an A 100. And I remember just thinking like. Every time someone says they want an A 100, like the piece of text that they're telling me that they want is like, stuffed away in the corner. Yeah. And so we were like, what if the biggest piece of text was what the user's asking for?And so when you go to Brev, it's just big GPU chips with the type that you want withswyx: beautiful animations that you worked on pre, like pre you can, like, now you can just prompt it. But back in the day. Yeah. Yeah. Those were handcraft, handcrafted artisanal code.Nader: Yeah. I was actually really proud of that because, uh, it was an, i I made it in Figma.Yeah. And then I found, I was like really struggling to figure out how to turn it from like Figma to react. So what it actually is, is just an SVG and I, I have all the styles and so when you change the chip, whether it's like active or not it changes the SVG code and that somehow like renders like, looks like it's animating, but it, we just had the transition slow, but it's just like the, a JavaScript function to change the like underlying SVG.Yeah. And that was how I ended up like figuring out how to move it from from Figma. But yeah, that's Art Artisan. [00:05:00]Kyle: Speaking of marketing stunts though, he actually used those SVGs. Or kind of use those SVGs to make these cards.Nader: Oh yeah. LikeKyle: a GPU gift card Yes. That he handed out everywhere. 
That was actually my first impression of that one.
Nader: Yeah.
swyx: Yeah, yeah. I think I still have one of them.
Nader: They look great.
Kyle: Yeah.
Nader: I have a ton of them still, actually, in our garage; they just don't have labels. We should honestly bring them back. But I found this old printing press here, actually just around the corner on Van Ness, and it's a third-generation San Francisco shop. And so I come in, an excited startup founder, and they just have this crazy old machinery and I'm in awe, 'cause the whole building is so physical. You're seeing these machines, they have pedals to move these saws and whatever. I don't know what this machinery is, but I saw all three generations. There's the grandpa, the father, and the son, and the son was around my age.
swyx: Well, it's like a holy trinity.
Nader: It's funny because I just took the same SVG and we just printed it, and it's foil printing, so they make a mold that's like an inverse of the A100, and then they put the foil on it [00:06:00] and then they press it into the paper. And I remember once we got them, he was like, "Hey, don't forget about us." I guess early Apple and Cisco's first business cards were all made there. And so he was like, "Yeah, we get the startup businesses, but then as they mature, they kind of go somewhere else." And so I think we were talking with marketing about using them for something. We should go back and make some cards.
swyx: Yeah, yeah. You know, I remember, as a very, very small Brev investor, I was like, why are we spending time doing these stunts for GPUs?
Like, you know, I think, as a typical cloud hardware person, you go into AWS, you pick like a T5 XL, whatever, and it's just from a list and you look at the specs. Like, why animate this GPU? And I do think it just shows the level of care that goes throughout Brev, and now also
Nader: And Nvidia. I think the thing that struck me most when we first came in was the amount of passion that everyone has. You talk to Kyle, you talk to, like, every VP that I've met at Nvidia goes so close to the metal. I remember it was almost a year ago, and my VP asked me, he's like, "Hey, [00:07:00] what's Cursor? And are you using it? And if so, why?" I was surprised at this, and he downloaded Cursor and he was asking me to help him use it, or just show him why we were using it. And so the amount of care that I think everyone has, and the passion and appreciation for the moment, right? This is a very unique time. So it's really cool to see everyone really appreciate that.
swyx: Yeah.
Acquisition and DevEx Shift
swyx: One thing I wanted to do before we move over to research topics and the stuff that Kyle's working on is just tell the story of the acquisition, right? Not many people have been through an acquisition with Nvidia. What's it like? Just anything you'd like to say.
Nader: It's a crazy experience. I think the thing that was the most exciting for us was, our goal was just to make it easier for developers. We wanted to find access to GPUs and make it easier to do that. And then, oh, actually, your question about Launchable. So Launchable was just one-click deploys for any software on top of the GPU.
And so what we really liked about Nvidia was that it felt like we just got a lot more resources to do all of that. I think, [00:08:00] you know, Nvidia's goal is to make things as easy for developers as possible, so there was a really nice synergy there. When it comes to an acquisition, I think the amount that the soul of the products align is going to speak to the success of the acquisition. And so in many ways it feels like we're home. This is a really great outcome for us. I love brev.nvidia.com. You should use it. It's the
Kyle: front page for GPUs.
Nader: Yeah. If you want GPUs,
Kyle: you go there,
swyx: get it there. And internally it's growing very quickly. I don't remember, you said some stats there.
Nader: Yeah. I wish I had the exact numbers, but internally and externally it's been growing really quickly. We've been working with a bunch of partners, with a bunch of different customers and ISVs. If you have a solution that runs on the GPU and you want people to use it quickly, we can bundle it up in a Launchable and make it a one-click run. If you're doing things and you just want a sandbox or something to run on, right? Like OpenClaw. Huge moment. Super exciting. And we'll get into it more, but internally, people wanna run this, and we know we have to be really careful about the security implications. Do we let this run on the corporate network? Security's guidance was, "Hey, [00:09:00] run this on Brev. It's a VM, it's sitting in the cloud, it's off the corporate network. It's isolated."
And so that's been our stance internally and externally about how to even run something like OpenClaw while we figure out how to run these things securely. But yeah.
swyx: I think there's also, like, you were almost the right team at the right time, when Nvidia is starting to invest a lot more in developer experience, or whatever you call it. UX, or I don't know what you call it, software. Obviously Nvidia has always invested in software, but this is a different audience. Yeah. It's a
Nader: wider
Kyle: developer base.
swyx: Yeah. Right.
Nader: Yeah. You know, it's funny, it's not, uh,
swyx: So what is it called internally? What is this that people should be aware is going on there?
Nader: Uh, what, like developer experience?
swyx: Yeah. Is it called just developer experience, or is there a broader strategy here
Nader: in Nvidia? Um, Nvidia always wants to make a good developer experience. The thing is, a lot of the technology is just really complicated. I think AI is having a huge moment, not [00:10:00] because, let's say, data scientists in 2018 were quiet then and are much louder now. The pie is growing, right? There's a whole bunch of new audiences. My mom's wondering what she's doing. My sister taught herself how to code. I actually think, just generally, AI's a big equalizer, and you're seeing a more technologically literate society, I guess. Everyone's learning how to code. There isn't really an excuse for that. And so building a good UX means that you really understand who your end user is. And when your end user becomes such a wide variety of people, then you have to almost reinvent the practice, right? Yeah.
You have
Kyle: to, and actually build more developer UX, right? Because there are tiers of developer base that were added. The hackers that are building on top of OpenClaw, for example, have never used a GPU. They don't know what CUDA is. They just want to run something.
Nader: Yeah.
Kyle: You need new UX that is not just, "Hey, how do you program something in CUDA and run it?" And then, when deep learning was getting big, we built Torch. But recently the amount of [00:11:00] layers that are added to that developer stack has just exploded, because AI has become ubiquitous. Everyone's using it in different ways.
Nader: Yeah. It's moving fast in every direction. Vertical, horizontal.
Vibhu: Yeah. You guys, you even take it down to hardware, like the DGX Spark. It's basically the same system as just throwing it up on a big GPU cluster.
Nader: Yeah. It's amazing. Blackwell.
swyx: Yeah. We saw the preview at last year's GTC, and that was one of the better-performing videos so far, and video coverage so far. Awesome. This will beat it. Um,
Nader: That was
swyx: actually, we have fingers
Nader: crossed. Yeah.
DGX Spark and Remote Access
Nader: Even when Grace Blackwell, or when DGX Spark was first coming out, getting to be involved in that from the beginning of the developer experience. And it just comes back to what you
swyx: were involved.
Nader: Yeah. St. St.
swyx: Mars.
Nader: Yeah. I mean, it was just like, I got an email, we just got thrown into the loop, and suddenly, yeah. It was actually really funny, 'cause I'm still pretty fresh from the acquisition and I'm getting an email from a bunch of the engineering VPs about the new hardware, GPU chip, or not chip, but just GPU system that we're putting out. And I'm like, okay, cool. Nader's now involved with this for the UX. I'm like.
What am I gonna do [00:12:00] here? So I remember the first meeting, I was just kind of quiet as I was hearing engineering VPs talk about what this box could be, what it could do, how we should use it. And I remember one of the first ideas that people had, I think a quote was like, "The first thing someone's gonna wanna do with this is get two of them and run a Kubernetes cluster on top of them." And I was like, oh, I think I know why I'm here. I was like, the first thing we're doing is easy SSH into the machine. And then, you know, just kind of scoping it down: the person who wants to run a Kubernetes cluster on two Sparks has a higher propensity for pain than someone who buys it and wants to run OpenClaw right now, right? If you can make sure that that's as effortless as possible, then the rest becomes easy. So there's a tool called Nvidia Sync. It just makes the SSH connection really simple. If you think about it, if you have a Mac or a PC or whatever, if you have a laptop and you buy this GPU and you want to use it, you should be able to use it like it's a GPU in the cloud, right? But there's all this friction of, how do you actually get into that? That's part of [00:13:00] Brev's value proposition: there's a CLI that wraps SSH and makes it simple. And so our goal is just to get you into that machine really easily. And one thing we just launched at CES, it's still in early access, we're ironing out some kinks, but it should be ready by GTC: you can register your Spark on Brev. And so now if you
swyx: like remote managed, yeah, local hardware. Single pane of glass. Yeah. Because Brev can already manage other clouds anyway, right?
Vibhu: Yeah. And you use the Spark on Brev as well, right?
Nader: Yeah. But yeah, exactly.
So you set it up at home, you run the command on it, and then essentially it'll appear in your Brev account. And then you can take your laptop to a Starbucks or to a cafe, and you can continue to use your Spark just like any other cloud node on Brev. Yeah. And it's just like a pre-provisioned center
swyx: in your
Nader: home. Yeah, exactly.
swyx: Yeah.
Vibhu: Tiny little data center.
Nader: Tiny little, the size of
Vibhu: your phone.
SOL Culture and Dynamo Setup
swyx: One more thing before we move on to Kyle. Just have so many Jensen stories, and I just love mining Jensen stories. My favorite so far is SOL. What is SOL?
Nader: SOL is actually, I think [00:14:00] of all the lessons I've learned, that one's definitely my favorite.
Kyle: It'll always stick with you.
Nader: Yeah. You know, in a startup, everything's existential, right? We've run out of money. We were at risk of losing payroll; we've had to contract our team because we ran outta money. And so because of that, you're always forcing yourself to understand the root cause of everything. If you get a date, if you get a timeline, you know exactly why that date or timeline is there. You're pushing every boundary, and you're not just accepting a no just because. And so as you start to introduce more layers, as you start to become a much larger organization, SOL is essentially, what is the physics, right? The speed of light is a certain speed. So if light's moving slower, then you know something's in the way. So before trying to layer reality back in, of why can't this be delivered at some date, let's just understand the physics. What is the theoretical limit to how fast this can go? And then start to tell me why.
'Cause otherwise people will start telling you why something can't be done. But actually, I think any great leader's goal is just to create urgency. Yeah. [00:15:00] There's an infinite
Kyle: create compelling events, right?
Nader: Yeah.
Kyle: Yeah. SOL is a term Nvidia uses to instigate a compelling event. You say, this is done. How do we get there? What is the minimum, as much as necessary, as little as possible, thing that it takes for us to get exactly here? And it helps you just break through a bunch of noise.
swyx: Yeah.
Kyle: Instantly.
swyx: One thing I'm unclear about is, can only Jensen use the SOL card? Like, oh, no, no, no. Not everyone can "get the b******t out," because obviously it's Jensen, but can someone else be like, no, like
Kyle: Frontline engineers use it.
Nader: Yeah. I think it's not so much about "get the b******t out." It's like, give me the root understanding, right? If you tell me something takes three weeks, it's like, well, what's the first principles? Why is it three weeks? What's the actual limit of why this is gonna take three weeks? Let's say you wanted to buy a new computer and someone told you it's gonna be here in five days. What's the SOL? Well, the SOL is, I could walk into a Best Buy and pick it up for you, right? So then anything that's beyond that is, and is that practical? Is that how we're gonna, let's say, give everyone in the [00:16:00] company a laptop? Obviously not. So then that's the SOL, and then it's like, okay, well, if we have to get more than 10, suddenly there might be some, right? And so now we can kind of piece the reality back.
swyx: So this is the Paul Graham "do things that don't scale." Yeah. And this is also what people would now call high agency.
Yeah.
Kyle: It's actually really interesting because there's a second, hardware angle to SOL that doesn't come up for all the orgs. SOL is used culturally at Nvidia for everything.
swyx: I'm also mining for, like, I think that can be annoying sometimes. Someone keeps going "SOL" on you, and you're like, guys, we have to be stable. We have to f*****g plan. Yeah.
Kyle: It's an interesting balance.
Nader: Yeah. I encounter that, actually, with Alec, right? 'Cause we have a new conference, so we need to launch. We have goals of what we wanna launch by the conference, and, yeah, at the end of the day, where is
swyx: This GTC?
Nader: Um, well, I mean, we did it for CES, we did it for GTC DC before that, we're doing it for GTC San Jose. So every, you know, we have a new moment, and we want to launch something. And we want to do so at SOL, and that does mean that there's some level of prioritization that needs [00:17:00] to happen. And so it is difficult, right? You have to be careful with what you're pushing. Stability is important, and that should be factored into SOL. SOL isn't just, build everything and let it break. That's part of the conversation. So as you're layering in all the details, one of them might be, "Hey, we could build this, but then it's not gonna be stable for X, Y, Z reasons." And so that was one of our conversations for CES: hey, we can get this into early access, registering your Spark with Brev, but there are a lot of things that we need to do in order to feel really comfortable from a security perspective, right? There's a lot of networking involved before we deliver that to users. So it's like, okay, let's get this to a point where we can at least let people experiment with it.
We had it in a booth, we had it in Jensen's keynote, and then let's go iron out all the networking kinks. And that's not easy, and so that can come later. And so that was the way that we layered that back in. Yeah. But
Kyle: It's not really about saying you don't have to do the maintenance or operational work. It's more that it kind of [00:18:00] highlights how progress is incremental, right? What is the minimum thing that we can get to? And then there's SOL for every component after that. But there's the SOL to get you to the starting line, and that's usually how it's asked. On the other side, SOL came out of hardware at Nvidia, right? So SOL is literally: if we ran the accelerator or the GPU at basically full speed with no other constraints, how fast would we be able to make a program go?
swyx: Yeah. Right.
Kyle: So
swyx: in training, you know, then you work back to some percentage of, like, MFU, for example.
Kyle: Yeah, that's a great example. So there's an SOL MFU, and then there's, you know, what's practically achievable.
swyx: Cool. Should we move on to Kyle's side? Kyle, you're coming more from the data science world. And whenever I meet someone who's done work in tabular stuff, graph neural networks, time series, these are basically, when I go to NeurIPS, I go to ICML, I walk the back halls. There's always a small group of graph people. Yes. A small group of tabular people. [00:19:00] And there's no one there. You know what I mean? It's important, interesting work if you care about solving the problems that they solve.
Kyle: Yeah.
swyx: But everyone else is just LLMs all the time.
Kyle: Yeah.
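Editor's aside on the MFU exchange above: a minimal sketch of comparing achieved throughput against the hardware's speed-of-light number. It assumes the common rule of thumb that training a dense transformer costs about 6 floating-point operations per parameter per token; the function name and every number below are hypothetical illustrations, not figures from the conversation.

```python
def model_flops_utilization(num_params, tokens_per_second, peak_flops_per_second):
    """Fraction of the hardware's peak (speed-of-light) FLOP rate actually used.

    Assumes roughly 6 FLOPs per parameter per token for dense transformer
    training (forward plus backward pass). This is a rule of thumb, not an
    exact count.
    """
    achieved_flops_per_second = 6.0 * num_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical example: a 7-billion-parameter model processing 10,000 tokens
# per second on hardware with a 1e15 FLOP/s peak.
mfu = model_flops_utilization(
    num_params=7e9,
    tokens_per_second=10_000,
    peak_flops_per_second=1e15,
)
print(f"MFU: {mfu:.0%}")
```

The speed-of-light case is a utilization of 1.0; the gap between that and the measured value is what the "tell me why" conversation is about.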
I mean, it's like the black hole, right? Has the event horizon reached this yet at NeurIPS? Um,
swyx: But, you know, those are transformers too. And those are also interesting things. Anyway, I just wanted to spend a little bit of time on that background before we go into Dynamo proper.
Kyle: Yeah, sure. I took a different path to Nvidia than Nader. I joined six years ago, seven if you count when I was an intern. So I joined Nvidia right outta college. And the first thing I jumped into was not what I'd done during my internship, which was some stuff for autonomous vehicles, like heavyweight object detection. I jumped into, you know, I'm like, recommenders, this is popular. And
swyx: Yeah, he did RecSys
Kyle: as well. Yeah, RecSys. I mean, that was the tabular data at the time, right? You have tables of audience qualities and item qualities, and you're trying to figure out which member of [00:20:00] the audience matches which item, or more practically, which item matches which member of the audience. And at the time, really, we were trying to turn recommenders, which had historically been a little bit of a CPU-based workflow, into something that ran really well on GPUs. And it's since been done. There are a bunch of libraries for RecSys that run on GPUs. The common models, like the Deep Learning Recommendation Model (DLRM), which came outta Meta, and the Wide & Deep model, which was released by Google, were very accelerated by GPUs, using the fast HBM on the chips especially to do vector lookups. But it was very interesting at the time and super relevant, because we were starting to get this explosion of feeds and things that required recommenders to just actively be on all the time.
And I sort of transitioned that a little bit towards graph neural networks when I discovered them, because I was like, okay, you can actually use graph neural networks to represent relationships between people, items, concepts, and that interested me. So I jumped into that at [00:21:00] Nvidia and got really involved for two-ish years.
swyx: Yeah. And something I learned from Bryan Catanzaro is that you can just kind of choose your own path in Nvidia.
Kyle: Oh my God. Yeah.
swyx: Which is not a normal big-corp thing. Like, you have a lane, you stay in your lane.
Nader: I think that's probably the reason why I enjoy being in a big company, coming from a startup guy: the mission is the boss.
swyx: The mission is the boss.
Nader: Yeah. It feels like a big game of pickup basketball. If you wanna play basketball, you just go up to the court and you're like, "Hey look, we're gonna play this game and we need three." And you just find your three. That's honestly what it feels like for every new initiative.
Vibhu: It also shows, right? Nvidia is just releasing state-of-the-art stuff in every domain. Like, okay, you expect foundation models with Nemotron. Voice, just randomly, Parakeet just comes out. Another one, voice.
Kyle: The Nvidia voice team has always been producing.
Vibhu: Yeah. There's always just every other domain, a paper that comes out, a dataset that comes out. I mean, it also stems back to what Nvidia has to do, right? You have to make chips years before they're actually produced, right? So you need to know, you need to really [00:22:00] focus. The
Kyle: design process starts like
Vibhu: exactly
Kyle: three to five years before the chip gets to the market.
Vibhu: Yeah. I'm curious more about what that's like, right? So you have specialist teams.
Is it just, people find an interest, you go in, you go deep on whatever, and that kind of feeds back into, okay, we expect predictions? Like, the internals at Nvidia must be crazy, right? Even without selling to people, you have your own predictions of where things are going. And they're very based, very grounded, right?
Kyle: Yeah. It's really interesting. So there's two things that I think Nvidia does which are quite interesting. One is, we really index into passion. There's a big sort of organizational top-down push to ensure that people are working on the things that they're passionate about. So if someone proposes something that's interesting, many times they can just email someone way up the chain who would find it relevant and say, "Hey, can I go work on this?"
Nader: It's actually, I worked at a big company for a couple years before starting on my startup journey, and it felt very weird if you were to email out of chain, if that makes [00:23:00] sense. The emails at Nvidia are like mosh pits,
swyx: Shoot.
Nader: and it's just like 60 people, just whatever. And there's this,
swyx: They get messy, like, reply-all.
Nader: Oh, it's insane. It's insane. They just
Kyle: help. You know, max
Nader: the context. But that's actually, so this is a weird thing, where I used to be like, why would we send emails? We have Slack. I'm the exact opposite. I feel so bad for anyone who's messaging me on Slack, 'cause I'm so unresponsive.
swyx: You're email
Nader: maxxing, email-maxxing. I'm email-maxxing now. Email is perfect, because, man, we can't work together. Email is great, right? Because important threads get bumped back up, right? And so Slack doesn't do that.
So I just have this casino going off on the right or on the left, and I don't know which thread was from where or what. But the threads get bumped, and then also there's the subject, so you can have working threads. I think what's difficult is, when you're small, if you're not 40,000 people, I think Slack will work fine. I don't know what the inflection point is, but there is gonna be a point where that becomes really messy and you'll actually prefer having email, 'cause you can have working threads. You can CC more than nine people in a thread.
Kyle: You can fork stuff.
Nader: You can [00:24:00] fork stuff, which is super nice. And so that is part of where you can propose a plan. You can also just start. Honestly, momentum's the only authority, right? So if you can just start to make a little bit of progress and show someone something, and then they can try it, that's, I think, been the most effective way to push anything forward. And that's both at Nvidia and, I think, just generally.
Kyle: Yeah, there's the other concept that is explored a lot at Nvidia, which is this idea of a zero-billion-dollar business. Market creation is a big thing at Nvidia.
swyx: Oh, you want to go and start a zero-billion-dollar business?
Kyle: Jensen says, "We are completely happy investing in zero-billion-dollar markets. We don't care if this creates revenue. It's important for us to know about this market. We think it will be important in the future. It can be zero billion dollars for a while." I'm probably mangling his words here, but, you know, I'll give an example. Nvidia's been working on autonomous driving for a long time,
swyx: Like an Nvidia car?
Kyle: No, they've
Vibhu: used the Mercedes, right? They're around the HQ, and I think it finally just got licensed out.
Now they're starting to be used quite a [00:25:00] bit. For 10 years you've been seeing Mercedes with Nvidia logos driving.
Kyle: If you're in, like, the South San, Santa Clara, it's actually from South. Yeah. So, um, zero-billion-dollar markets are a thing, like, you know, Jensen,
swyx: I mean, okay, look, cars are not a zero-billion-dollar market. But yeah, that's a bad example.
Nader: I think he's messaging zero today, but, or even internally, right? An org doesn't have to ruthlessly find revenue very quickly to justify its existence, right? A lot of the important research, a lot of the important technology being developed, that's kind of
Kyle: where research, research is very ideologically free at Nvidia. Yeah. They can pursue things that they were
swyx: Were you research officially?
Kyle: I was never in research officially. I was always in engineering. I'm in an org called Deep Learning Algorithms, which is basically just, how do we make things that are relevant to deep learning go fast?
swyx: That sounds freaking cool.
Vibhu: And I think a lot of that is underappreciated, right? Like time series. This week Google put out the TimesFM paper, a new time series paper. Semantic IDs [00:26:00] started applying transformers, LLMs, to RecSys. And when you think of the scale of companies deploying these, right, Amazon recommendations, Google web search, it's huge scale, and
Kyle: Yeah.
Vibhu: you want fast.
Kyle: Yeah. Actually, there's a fun moment that brought me full circle. Amazon Ads recently gave a talk where they talked about using Dynamo for generative recommendation, which was super, weirdly cathartic for me. I'm like, oh my God, I've supplanted what I was working on. You're using LLMs now to do what I was doing five years ago.
swyx: Yeah. Amazing. And let's go right into Dynamo.
Uh, maybe introduce, yeah, sure, top down.
Kyle: I think at this point a lot of people are familiar with the term inference. Funnily enough, it went from being a really niche topic to being something that's discussed on normal people's Twitter feeds.
Nader: It's on billboards
Kyle: here now. Yeah. Very strange, driving and seeing just an inference ad on 101. Inference at scale is becoming a lot more important. We have these moments like OpenClaw, where you have these [00:27:00] agents that take lots and lots of tokens but produce incredible results. There are many different aspects of test-time scaling, so that you can use more inference to generate a better result than if you were to use a short amount of inference. There's reasoning, there's querying, there's adding agency to the model, allowing it to call tools and use skills. Dynamo sort of came about at Nvidia because myself and a couple others were talking about these concepts: you have inference engines like vLLM, SGLang, TensorRT-LLM, and they sort of think about things as one single copy, one replica, right?
Why Scale Out Wins
Kyle: Like one version of the model. But when you're actually serving things at scale, you can't just scale up that replica, because you end up with performance problems. There's a scaling limit to scaling up replicas. So you actually have to scale out, to use maybe some Kubernetes-type terminology. We kind of realized that there was a lot of potential optimization that we could do in scaling out and building systems for data [00:28:00] center scale inference.
So Dynamo is this data center scale inference engine that sits on top of frameworks like vLLM, SGLang, and TensorRT-LLM and just makes things go faster, because you can leverage the economy of scale: the fact that you have KV cache, which we can define a little bit later, in all these machines that is unique, and you wanna figure out the ways to maximize your cache hits. Or you want to employ new techniques in inference like disaggregation, which Dynamo introduced to the world in March. Not introduced, it was an academic talk beforehand, but we are one of the first frameworks to start supporting it. And we wanna sort of combine all these techniques into a modular framework that allows you to accelerate your inference at scale.
Nader: By the way, Kyle and I became friends on my first day at Nvidia, and I always loved it, 'cause he always teaches me
swyx: new things. Yeah. By the way, this is why I wanted to put the two of you together. I was like, yeah, this is gonna be
Kyle: good. It's very different, you know. We've talked to each other a bunch. [00:29:00] Actually, you asked, why can't we scale up?
Nader: Yeah.
Scale Up Limits Explained
Nader: Model, you said model replicas.
Kyle: Yeah. So scale up means assigning more
swyx: Heavier?
Kyle: Yeah, heavier. Like making things heavier. Adding more GPUs, adding more CPUs. Scale out is just having a barrier, saying, I'm gonna duplicate my representation of the model, or a representation of this microservice or something, and I'm gonna replicate it many times to handle load. And the reason that you can't scale up past some point is, you know, there are sort of hardware bounds and algorithmic bounds on that type of scaling. So I'll give you a good example that's very trivial. Let's say you're on an H100.
The maximum NVLink domain for H100, for most DGX H100s, is eight GPUs, right? So if you scaled up past that, you'd have to figure out ways to handle the fact that now, for the GPUs to communicate, you have to do it over InfiniBand, which is still very fast but is not as fast as NVLink.

swyx: Is it like one order of magnitude, like hundreds, or...

Kyle: It's about an order of magnitude, yeah.

swyx: Okay. So not terrible.

Kyle: [00:30:00] Yeah. I need to remember the data sheet here. I think it's about 500 gigabytes a second unidirectional for NVLink, and about 50 gigabytes a second unidirectional for InfiniBand. It depends on the generation.

swyx: I just wanna set this up for people who are not familiar with these kinds of layers and the transfer speed and all that.

Vibhu: Of course.

From Laptop to Multi Node

Vibhu: Also, maybe even just going a few steps back before that, most people are very familiar with what you can use on your laptop. Whatever the tool, you can just run inference there. You can run it on that laptop. Then you get to, okay, models got pretty big, right? GLM-5, they doubled the size. So what do you do when you have to go from, okay, I can get 128 gigs of memory, I can run it on a Spark, to multi-GPU? Okay, multi-GPU, there's some support there. Now, say I'm a company and I'm not hiring the best researchers for this, but I need to go [00:31:00] multi-node, right? I have a lot of servers. Okay, now there are efficiency problems. You can have multiple eight-H100 nodes, but how do you do that efficiently?

Kyle: Yeah. How do you represent them? How do you choose how to represent the model? Exactly right. That's a hard question.
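For a rough feel of the NVLink versus InfiniBand gap Kyle describes, here is a back-of-the-envelope sketch in Python. The bandwidth figures are the approximate unidirectional numbers he quotes from memory (he notes they depend on the generation); the 10 GB payload is a made-up value for illustration, and the model ignores latency and protocol overhead.

```python
# Approximate unidirectional bandwidths as quoted in the conversation (GB/s).
NVLINK_GB_PER_S = 500.0
INFINIBAND_GB_PER_S = 50.0


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time: payload size divided by link bandwidth."""
    return size_gb / bandwidth_gb_per_s


# Hypothetical payload, e.g. a chunk of activations or KV cache.
payload_gb = 10.0

nvlink_s = transfer_seconds(payload_gb, NVLINK_GB_PER_S)
infiniband_s = transfer_seconds(payload_gb, INFINIBAND_GB_PER_S)
slowdown = infiniband_s / nvlink_s

print(f"NVLink:     {nvlink_s * 1000:.0f} ms")
print(f"InfiniBand: {infiniband_s * 1000:.0f} ms")
print(f"Slowdown:   about {slowdown:.0f}x")
```

With these numbers the same payload takes roughly ten times longer once it has to leave the NVLink domain, which is the "order of magnitude" mentioned above.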
Everyone asks, how do you size oh, I wanna run GLM five, which just came out new model. There have been like four of them in the past week, by the way, like a bunch of new models.swyx: You know why? Right? Deep seek.Kyle: No comment. Oh. Yeah, but Ggl, LM five, right?We, we have this, new model. It's, it's like a large size, and you have to figure out how to both scale up and scale out, right? Because you have to find the right representation that you care about. Everyone does this differently. Let's be very clear. Everyone figures this out in their own path.Nader: I feel like a lot of AI or ML even is like, is like this. I think people think, you know, I, I was, there was some tweet a few months ago that was like, why hasn't fine tuning as a service taken off? You know, that might be me. It might have been you. Yeah. But people want it to be such an easy recipe to follow.But even like if you look at an ML model and specificKyle: to you Yeah,Nader: yeah.Kyle: And the [00:32:00] model,Nader: the situation, and there's just so much tinkering, right? Like when you see a model that has however many experts in the ME model, it's like, why that many experts? I don't, they, you know, they tried a bunch of things and that one seemed to do better.I think when it comes to how you're serving inference, you know, you have a bunch of decisions to make and there you can always argue that you can take something and make it more optimal. But I think it's this internal calibration and appetite for continued calibration.Vibhu: Yeah. And that doesn't mean like, you know, people aren't taking a shot at this, like tinker from thinking machines, you know?Yeah. RL as a service. Yeah, totally. It's, it also gets even harder when you try to do big model training, right? We're not the best at training Moes, uh, when they're pre-trained. Like we saw this with LAMA three, right? 
They're trained in such a sparse way that meta knows there's gonna be a bunch of inference done on these, right?They'll open source it, but it's very trained for what meta infrastructure wants, right? They wanna, they wanna inference it a lot. Now the question to basically think about is, okay, say you wanna serve a chat application, a coding copilot, right? You're doing a layer of rl, you're serving a model for X amount of people.Is it a chat model, a coding model? Dynamo, you know, back to that,Kyle: it's [00:33:00] like, yeah, sorry. So you we, we sort of like jumped off of, you know, jumped, uh, on that topic. Everyone has like, their own, own journey.Cost Quality Latency TradeoffsKyle: And I, I like to think of it as defined by like, what is the model you need? What is the accuracy you need?Actually I talked to NA about this earlier. There's three axes you care about. What is the quality that you're able to produce? So like, are you accurate enough or can you complete the task with enough, performance, high enough performance. Yeah, yeah. Uh, there's cost. Can you serve the model or serve your workflow?Because it's not just the model anymore, it's the workflow. It's the multi turn with an agent cheaply enough. And then can you serve it fast enough? And we're seeing all three of these, like, play out, like we saw, we saw new models from OpenAI that you know, are faster. You have like these new fast versions of models.You can change the amount of thinking to change the amount of quality, right? Produce more tokens, but at a higher cost in a, in a higher latency. And really like when you start this journey of like trying to figure out how you wanna host a model, you, you, you think about three things. What is the model I need to serve?How many times do I need to call it? What is the input sequence link was [00:34:00] the, what does the workflow look like on top of it? What is the SLA, what is the latency SLA that I need to achieve? 
Because there's usually some... this is usually a constant. You know the SLA that you need to hit, and then you try to find the lowest-cost version that hits all of these constraints. Usually you start with those things and you do a bit of experimentation across some common configurations. You change the tensor parallel size, which is a form of parallelism...

Vibhu: I take it, it goes even deeper. First you've gotta think about what model.

Kyle: Yes, of course. It's a multi-step design process because, as you said, you can choose a smaller model and then do more test-time scaling, and it'll equal the quality of a larger model because you're doing the test-time scaling or you're adding a harness or something. So yes, it goes way deeper than that. But from the performance perspective, once you get to the model you need to host, you look at that and you say, hey, I have this model, I need to serve it at this speed. What is the right configuration for that?

Nader: Did you guys see the recent... there was a paper I just saw a few days ago that, if you run [00:35:00] the same prompt twice, you're getting like double... Just try it again. Yeah, exactly.

Vibhu: And you get a lot. But the key thing there is you give it the context of the failed try, right? So it takes a shot. And this has been basic guidance for quite a while: just try again. Did you try again?

Nader: All advice in life.

Vibhu: It's a paper from Google, if I'm not mistaken, right? Yeah. I think it's a little short paper. The title's very cute. And it's just like, yeah, just try again. Give it the context.

Kyle: Multi-shot. You just say, hey, take a little bit more information, try and fail.
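The sizing loop Kyle outlines earlier in this exchange, fixing the latency SLA and then looking for the cheapest configuration that still meets it, can be sketched as a filter-and-minimize step. Everything below is hypothetical: the `Config` fields, the tensor-parallel sizes, and the latency and cost numbers are invented placeholders standing in for measurements you would collect yourself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """One benchmarked serving configuration (all fields are illustrative)."""
    tensor_parallel: int    # number of GPUs the model is sharded across
    latency_ms: float       # measured latency for the target workload
    cost_per_hour: float    # cost of running this configuration


def cheapest_within_sla(configs: Sequence[Config], sla_ms: float) -> Optional[Config]:
    """Return the lowest-cost config whose latency meets the SLA, if any."""
    feasible = [c for c in configs if c.latency_ms <= sla_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.cost_per_hour)


# Made-up measurements from a sweep over tensor-parallel sizes.
candidates = [
    Config(tensor_parallel=1, latency_ms=900.0, cost_per_hour=4.0),
    Config(tensor_parallel=2, latency_ms=520.0, cost_per_hour=8.0),
    Config(tensor_parallel=4, latency_ms=310.0, cost_per_hour=16.0),
    Config(tensor_parallel=8, latency_ms=240.0, cost_per_hour=32.0),
]

best = cheapest_within_sla(candidates, sla_ms=600.0)
print(best)
```

In this toy sweep the single-GPU option is cheapest but misses the 600 ms SLA, so the two-GPU configuration wins. A real search would also vary the model, batch size, and other forms of parallelism.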
Fail.Vibhu: And that basic concept has gone pretty deep.There's like, um, self distillation, rl where you, you do self distillation, you do rl and you have past failure and you know, that gives some signal so people take, try it again. Not strong enough.swyx: Uh, for, for listeners, uh, who listen to here, uh, vivo actually, and I, and we run a second YouTube channel for our paper club where, oh, that's awesome.Vivo just covered this. Yeah. Awesome. Self desolation and all that's, that's why he, to speed [00:36:00] on it.Nader: I'll to check it out.swyx: Yeah. It, it's just a good practice, like everyone needs, like a paper club where like you just read papers together and the social pressure just kind of forces you to just,Nader: we, we,there'sNader: like a big inference.Kyle: ReadingNader: group at a video. I feel so bad every time. I I, he put it on like, on our, he shared it.swyx: One, one ofNader: your guys,swyx: uh, is, is big in that, I forget es han Yeah, yeah,Kyle: es Han's on my team. Actually. Funny. There's a, there's a, there's a employee transfer between us. Han worked for Nater at Brev, and now he, he's on my team.He wasNader: our head of ai. And then, yeah, once we got in, andswyx: because I'm always looking for like, okay, can, can I start at another podcast that only does that thing? Yeah. And, uh, Esan was like, I was trying to like nudge Esan into like, is there something here? I mean, I don't think there's, there's new infant techniques every day.So it's like, it's likeKyle: you would, you would actually be surprised, um, the amount of blog posts you see. And ifswyx: there's a period where it was like, Medusa hydra, what Eagle, like, youKyle: know, now we have new forms of decode, uh, we have new forms of specula, of decoding or new,swyx: what,Kyle: what are youVibhu: excited? 
And it's exciting when you guys put out something like Tron.‘cause I remember the paper on this Tron three, [00:37:00] uh, the amount of like post train, the on tokens that the GPU rich can just train on. And it, it was a hybrid state space model, right? Yeah.Kyle: It's co-designed for the hardware.Vibhu: Yeah, go design for the hardware. And one of the things was always, you know, the state space models don't scale as well when you do a conversion or whatever the performance.And you guys are like, no, just keep draining. And Nitron shows a lot of that. Yeah.Nader: Also, something cool about Nitron it was released in layers, if you will, very similar to Dynamo. It's, it's, it's essentially it was released as you can, the pre-training, post-training data sets are released. Yeah. The recipes on how to do it are released.The model itself is released. It's full model. You just benefit from us turning on the GPUs. But there are companies like, uh, ServiceNow took the dataset and they trained their own model and we were super excited and like, you know, celebrated that work.ZoomVibhu: different. Zoom is, zoom is CGI, I think, uh, you know, also just to add like a lot of models don't put out based models and if there's that, why is fine tuning not taken off?You know, you can do your own training. Yeah,Kyle: sure.Vibhu: You guys put out based model, I think you put out everything.Nader: I believe I know [00:38:00]swyx: about base. BasicallyVibhu: without baseswyx: basic can be cancelable.Vibhu: Yeah. Base can be cancelable.swyx: Yeah.Vibhu: Safety training.swyx: Did we get a full picture of dymo? I, I don't know if we, what,Nader: what I'd love is you, you mentioned the three axes like break it down of like, you know, what's prefilled decode and like what are the optimizations that we can get with Dynamo?Kyle: Yeah. That, that's, that's, that's a great point. 
So to summarize on that three-axis problem: there are three things that determine whether or not something can be done with inference: cost, quality, and latency. Dynamo is supposed to be there to provide you the runtime that allows you to pull levers to mix it up and move around the Pareto frontier, or the Pareto surface, that determines whether this is actually possible with inference and AI today.

Nader: It gives you the knobs.

Kyle: Yeah, exactly. It gives you the knobs.

Disaggregation Prefill vs Decode

Kyle: One thing that we use a lot in contemporary inference, and that is starting to pick up in general knowledge, is this concept of disaggregation. Historically, models would be hosted with a single inference engine, and that inference engine [00:39:00] would ping-pong between two phases. There's prefill, where you're reading the sequence and generating the KV cache, which is basically just a set of vectors that represent the sequence. And then there's using that KV cache to generate new tokens, which is called decode. And some brilliant researchers across multiple different papers essentially made the realization that if you separate these two phases, you actually gain some benefits.

Those benefits are basically: A, you don't have to worry about step-synchronous scheduling. The way an inference engine works is you do one step, you finish it, and then you start scheduling the next step. It's not fully asynchronous. And the problem with that is that prefill and decode are actually very different in terms of both their resource requirements and sometimes their runtime. So you would have prefill that would block decode steps, because you'd still be prefilling and you couldn't schedule, because the step has to end.
So you remove that scheduling issue, and then you also allow yourself to [00:40:00] split the work into two different types of pools. Prefill typically, and this changes as model architecture changes, is compute bound right now. Most of the time, when the sequence is sufficiently long, it's compute bound. On the decode side, because you're doing a full pass over all the weights and the entire sequence every time you do a decode step, and you don't have the quadratic computation of the KV cache, it's usually memory bound: you're retrieving a linear amount of memory and doing a linear amount of compute, as opposed to prefill, where you retrieve a linear amount of memory and then use a quadratic amount of compute.

Nader: You know, it's funny, someone at Exo Labs did a really cool demo where, since the DGX Spark has a lot more compute, you can do the compute-hungry prefill on a DGX Spark and then do the decode on a Mac.

Vibhu: And so that's faster.

Nader: Yeah.

Kyle: So you can do that. You can do machine stratification.

Nader: Yeah.

Kyle: And with our future generations of hardware, we actually announced, with Rubin, this [00:41:00] new accelerator that is prefill specific. It's called Rubin CPX.

Kubernetes Scaling with Grove

Nader: I have a question. When you do the scale out, is scaling out easier with Dynamo? Because when you need a new node, you can dedicate it to either the prefill or the decode.

Kyle: Yeah. So Dynamo actually has a Kubernetes component in it called Grove that allows you to do this crazy scaling specialization. It has this representation... I don't wanna go too deep into Kubernetes here, but there was a previous way that you would launch multi-node work. It's called LeaderWorkerSet. It's in the Kubernetes standard, and LeaderWorkerSet is great.
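A toy calculation can make the compute-bound versus memory-bound contrast from the prefill and decode discussion concrete. This is only a sketch under simplifying assumptions that are not from the conversation: it counts attention FLOPs for a single head, ignores the weight reads Kyle mentions as well as MLP layers, batching, and causal masking, and assumes 2-byte (fp16) KV-cache entries. The function names are made up for illustration.

```python
BYTES_PER_ELEM = 2  # assumption: fp16 KV-cache entries


def kv_cache_bytes(seq_len: int, head_dim: int) -> int:
    """Bytes of K and V for one head: linear in sequence length."""
    return 2 * seq_len * head_dim * BYTES_PER_ELEM


def prefill_attention_flops(seq_len: int, head_dim: int) -> int:
    """Q @ K^T plus scores @ V over the whole prompt: quadratic in seq_len."""
    return 4 * seq_len * seq_len * head_dim


def decode_step_attention_flops(seq_len: int, head_dim: int) -> int:
    """One new query token attending to seq_len cached tokens: linear in seq_len."""
    return 4 * seq_len * head_dim


def flops_per_byte(flops: int, num_bytes: int) -> float:
    """Arithmetic intensity: compute done per byte of KV cache touched."""
    return flops / num_bytes


n, d = 4096, 128  # hypothetical sequence length and head dimension
prefill_intensity = flops_per_byte(prefill_attention_flops(n, d), kv_cache_bytes(n, d))
decode_intensity = flops_per_byte(decode_step_attention_flops(n, d), kv_cache_bytes(n, d))

print(f"prefill: {prefill_intensity:.0f} FLOPs per byte")
print(f"decode:  {decode_intensity:.0f} FLOPs per byte")
```

Under these assumptions the compute done per byte moved grows with sequence length for prefill and stays flat for a decode step, which is the intuition behind giving the two phases separate pools of hardware.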
It served a lot of people super well for a long period of time. But one of the things that it struggles with is representing a set of cases where you have a multi-node replica that has a pair, right? You know, prefill and decode. Or it's not paired, but it has a second stage with a ratio that changes over time. Prefill and decode are two different things, and as your workload changes, the amount of prefill you'll need to do may change, [00:42:00] and the amount of decode you'll need to do might change, right? Let's say you start getting insanely long queries. That probably means that your prefill scales harder, because you're hitting this quadratic scaling growth.

swyx: Yeah. And then for listeners, prefill would be long input, decode would be long output, for example, right?

Kyle: Yeah. I mean, decode is funny because the number of tokens that you produce scales with the output length, but the amount of work that you do per step scales with the number of tokens in the context.

swyx: Yes.

Kyle: So it scales with both the input and the output.

swyx: That's true.

Kyle: But on the prefill versus decode side, if suddenly the amount of work you're doing on the decode side stays about the same, or scales a little bit, and then the prefill side jumps up a lot, you actually don't want that ratio to be the same. You want it to change over time. So Dynamo has a set of components that, A, tell you how to scale: it tells you how many prefill workers and decode workers it thinks you should have. And it also provides a scheduling API for Kubernetes that allows you to actually represent and effect this scheduling on your actual [00:43:00] hardware, on your compute infrastructure.

Nader: Not gonna lie, I feel a little embarrassed for being proud of my SVG function earlier.

swyx: No, it was really cute. I like it.

Nader: It's all...

swyx: It's all engineering.
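As a rough illustration of why the prefill-to-decode worker ratio should move with the workload, here is a minimal sizing sketch. It is not Dynamo's or Grove's actual planning logic; the function names, the per-worker throughput figures, and the traffic numbers are all hypothetical, and the linear throughput model deliberately ignores the quadratic attention cost mentioned above.

```python
import math

# Hypothetical sustained throughput of a single worker (tokens per second).
PREFILL_WORKER_TPS = 20_000.0
DECODE_WORKER_TPS = 2_000.0


def workers_needed(tokens_per_second: float, per_worker_tps: float) -> int:
    """Smallest worker count that covers the load, with at least one worker."""
    return max(1, math.ceil(tokens_per_second / per_worker_tps))


def plan(input_tps: float, output_tps: float) -> tuple:
    """Return (prefill_workers, decode_workers) for the given token rates."""
    return (
        workers_needed(input_tps, PREFILL_WORKER_TPS),
        workers_needed(output_tps, DECODE_WORKER_TPS),
    )


baseline = plan(input_tps=40_000.0, output_tps=8_000.0)
long_prompts = plan(input_tps=200_000.0, output_tps=8_000.0)

print(baseline)      # (2, 4)
print(long_prompts)  # (10, 4)
```

When prompts get much longer while output volume stays flat, only the prefill pool grows, so the ratio shifts from 1:2 to 5:2. A fixed, paired replica layout cannot express that change.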
It's all engineering. Um, that's where I'm...

Kyle: Technical.

swyx: One thing I'm kind of just curious about, since you see everything going on here at a systems level, and we're scaling it up in distributed systems...

Context Length and Co Design

swyx: I think one thing that's kind of of the moment right now is people are asking, is there any SOL, sort of upper bound, in terms of, let's just call it context length for want of a better word, but you can break it down however you like.

Nader: Yeah.

swyx: I just think, well, clearly you can engage in hybrid architectures and throw in some state space models in there all you want, but it still looks very attention heavy.

Kyle: Yes. Long context is attention heavy. I mean, we have these hybrid models...

swyx: And most models cap out at a million context, and that's it. For the last two years, that's been it.

Kyle: Yeah. The model-hardware-context co-design thing that we're seeing these days is actually super [00:44:00] interesting. It's my passion, my secret side passion. We see models like Kimi or gpt-oss. I use these because I know specific things about these models. So Kimi K2 comes out, right? And it's an interesting model. It's a DeepSeek-style architecture, it's MLA. It's basically DeepSeek, scaled a little bit differently, and obviously trained differently as well. But they talked about why they made the design choices for context. Kimi has more experts but fewer attention heads, and I believe a slightly smaller attention dimension. But I need to check that. It doesn't matter. They discussed this at length in a blog post on Zhihu, which is like Reddit...

swyx: Yeah.

Kyle: ...in China. Chinese Reddit.

swyx: Yeah.

Kyle: It's, yeah.
So it's actually an incredible blog post. All the MLSys people that I've seen on Zhihu are very brilliant, but the creators of Kimi K2 [00:45:00] actually talked about it there in the blog post. And they say, we actually did an experiment, right? Attention scales with the number of heads, obviously. If you have 32 heads versus 64 heads, you do half the work of attention. You still scale quadratically, but you do half the work. And they made a very specific sort of barter in their architecture. They basically said, hey, what if we gave it more experts, so we're gonna use more memory capacity, but we keep the number of activated experts the same. We increase the expert sparsity, so the ratio of experts activated to total number of experts is smaller, and we decrease the number of attention heads.

Vibhu: And for context, what we had been seeing was that you make models sparser instead. No one was really touching heads. You're just having...

Kyle: Well, they implicitly made it sparser.

Vibhu: Yeah, for Kimi they did.

Kyle: Yes.

Vibhu: They also made it sparser. But basically what we were seeing was people were at the level of, okay, there's a sparsity ratio: you want more total parameters, fewer active, and that's sparsity. [00:46:00] But what you see from papers from labs like Moonshot and DeepSeek is they go to the level of, okay, outside of just the number of experts, you can also change how many attention heads, and fewer attention layers or more attention layers. So that's all basically coming back to, just to tie it together, hardware-model co-design, which is...

Kyle: Hardware-model-context co-design.

Vibhu: Yeah.

Kyle: Right. Like if you were training a model that was like...
Really, really short context, uh, or like really is good at super short context tasks. You may like design it in a way such that like you don't care about attention scaling because it hasn't hit that, like the turning point where like the quadratic curve takes over.Nader: How do you consider attention or context as a separate part of the co-design? Like I would imagine hardware or just how I would've thought of it is like hardware model. Co-design would be hardware model context co-designKyle: because the harness and the context that is produced by the harness is a part of the model.Once it's trained in,Vibhu: like even though towards the end you'll do long context, you're not changing architecture through I see. Training. Yeah.Kyle: I mean you can try.swyx: You're saying [00:47:00] everyone's training the harness into the model.Kyle: I would say to some degree, orswyx: there's co-design for harness. I know there's a small amount, but I feel like not everyone has like gone full send on this.Kyle: I think, I think I think it's important to internalize the harness that you think the model will be running. Running into the model.swyx: Yeah. Interesting. Okay. Bash is like the universal harness,Kyle: right? Like I'll, I'll give. An example here, right? I mean, or just like a, like a, it's easy proof, right? If you can train against a harness and you're using that harness for everything, wouldn't you just train with the harness to ensure that you get the best possible quality out of,swyx: Well, the, uh, I, I can provide a counter argument.Yeah, sure. Which is what you wanna provide a generally useful model for other people to plug into their harnesses, right? So if youKyle: Yeah. Harnesses can be open, open source, right?swyx: Yeah. 
So I mean, that's, that's effectively what's happening with Codex.Kyle: Yeah.swyx: And, but like you may want like a different search tool and then you may have to name it differently or,Nader: I don't know how much people have pushed on this, but can you.Train a model, would it be, have you have people compared training a model for the for the harness versus [00:48:00] like post training forswyx: I think it's the same thing. It's the same thing. It's okay. Just extra post training. INader: see.swyx: And so, I mean, cognition does this course, it does this where you, you just have to like, if your tool is slightly different, um, either force your tool to be like the tool that they train for.Hmm. Or undo their training for their tool and then Oh, that's re retrain. Yeah. It's, it's really annoying and like,Kyle: I would hope that eventually we hit like a certain level of generality with respect to training newswyx: tools. This is not a GI like, it's, this is a really stupid like. Learn my tool b***h.Like, I don't know if, I don't know if I can say that, but like, you know, um, I think what my point kind of is, is that there's, like, I look at slopes of the scaling laws and like, this slope is not working, man. We, we are at a million token con

Startup Project
Inside the Battle for AI Cloud Dominance — Why Cloud Builders like TensorWave are Rethinking NVIDIA's Monopoly | Jeff Tatarchuk, Co-Founder of TensorWave

Startup Project

Play Episode Listen Later Mar 8, 2026 42:18


Rethinking AI Compute Infrastructure: The TensorWave Approach

In this episode, Jeff Tatarchuk, co-founder of TensorWave, shares how his deep industry experience and innovative mindset are transforming AI compute infrastructure. We explore how building specialized data centers, focusing on AMD GPUs, and creating flexible ecosystems are shaping the future of scalable AI.

In this episode:
- The evolution of cloud companies and the rise of Neo clouds focused on AI compute
- TensorWave's unique strategy of deploying AMD GPUs in custom data centers
- Lessons learned from FPGA cloud business and transitioning into GPU infrastructure
- The technical challenges and solutions in scaling data centers quickly amidst power and supply chain constraints
- The importance of software ecosystems, interoperability, and supporting AMD's software stack
- How TensorWave differentiates itself from purely financial arbitrage models and pure Nvidia-centric clouds
- AMD's advantages in memory capacity, chiplet architecture, and software support
- The technical intricacies of CUDA versus ROCm, and efforts to build an open ecosystem
- Future vision: democratized, reliable, and flexible AI compute options for enterprise and labs

Timestamps:
00:00 – Introduction to TensorWave and the AI compute landscape
02:30 – The rise of Neo clouds and innovation waves in cloud infrastructure
06:00 – How TensorWave's FPGA cloud background shaped its GPU strategy
10:00 – Challenges in deploying large data centers: power, supply chain, and permitting
14:00 – Building and scaling AMD GPU data centers quickly and efficiently
19:00 – Software ecosystems: the CUDA moat and TensorWave's 'Beyond CUDA' summit
23:00 – Market differentiation: technical and operational challenges in the Neo cloud space
27:00 – Supporting enterprise fine tuning and large-scale training demands
32:00 – AMD's technical advantages: VRAM, chiplet architecture, and software support
36:00 – Building an open, heterogeneous AI ecosystem beyond CUDA
40:00 – What success looks like: a resilient, accessible AI compute future

Resources & Links:
- TensorWave
- Beyond CUDA Summit
- ScalarLM by Greg Diamos
- AMD MI300X Data Center Chip
- Nvidia H100
- ROCm Software Stack
- LinkedIn
- Twitter

This conversation offers a strategic look at how focused infrastructure development, software ecosystem support, and hardware differentiation are critical in shaping the future of accessible, scalable AI compute. Whether you're building data centers, developing AI hardware, or just interested in industry shifts, this episode provides valuable insights into how companies like TensorWave are reshaping the landscape.

Technovation with Peter High (CIO, CTO, CDO, CXO Interviews)
The Thinking Machine: How Jensen Huang Won the GPU War for NVIDIA

Technovation with Peter High (CIO, CTO, CDO, CXO Interviews)

Play Episode Listen Later Mar 5, 2026 55:24


In this episode of Technovation, Peter High speaks with Stephen Witt, award-winning journalist and author of The Thinking Machine, which has been named Business Book of the Year by the Financial Times. Witt writes about Jensen Huang's improbable journey from near-bankruptcy in the 1990s GPU wars to leading NVIDIA at the center of the AI revolution. Witt unpacks how NVIDIA defeated nearly 70 competitors, why Huang began targeting "zero-billion-dollar markets," and how CUDA became the backbone of modern AI.

Key highlights from the episode:
- How investing in zero-billion-dollar markets created durable platform advantage
- The emerging bull and bear cases for NVIDIA in robotics, edge computing, and global competition
- The strategic lessons NVIDIA extracted from surviving a 70-competitor GPU market
- Why operating with a constant "near-death" mindset shaped long-term execution discipline

Apple Coding Daily
Revolution with the M5 Pro and M5 Max: Apple Reinvents Its Chip Architecture

Apple Coding Daily

Play Episode Listen Later Mar 4, 2026 33:00


Apple has done it again. But this time it wasn't "more of the same with a better score." On March 3, 2026, it introduced the M5 Pro and M5 Max chips in the new MacBook Pro, and what's inside is the most important architecture change since the M1 arrived. We're not talking about more cores or a finer manufacturing process. We're talking about rethinking from scratch how a chip is built. In this episode we take the Fusion Architecture apart piece by piece: what a die is, why splitting it in two changes the rules of the game, and what that means for heat dissipation, for manufacturing, and for the future of Apple Silicon. We talk about the Neural Accelerators integrated into each GPU core, the increased Neural Engine bandwidth, the M5 Max's 614 GB/s of memory bandwidth, and why that matters more than GHz when we talk about running artificial intelligence locally. And we make the comparison with NVIDIA that everyone makes but almost nobody does well: CUDA vs MLX, H100 vs M5 Max, datacenter vs backpack. No allegiances. With real numbers.

MLOps.community
Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs

MLOps.community

Play Episode Listen Later Feb 24, 2026 85:49


March 3rd, Computer History Museum: CODING AGENTS CONFERENCE. Come join us while there are still tickets left. https://luma.com/codingagents

Chris Fregly is currently focused on building and scaling high-performance AI systems, writing and teaching about AI infrastructure, helping organizations adopt generative AI and performance engineering principles on AWS, and fostering large developer communities around these topics.

Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs // MLOps Podcast #363 with Chris Fregly, Founder, AI Performance Engineer, and Investor

Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
MLOps GPU Guide: https://go.mlops.community/gpuguide

// Abstract
In today's era of massive generative models, it's important to understand the full scope of AI systems' performance engineering. This talk discusses the new O'Reilly book, AI Systems Performance Engineering, and the accompanying GitHub repo (https://github.com/cfregly/ai-performance-engineering). This talk provides engineers, researchers, and developers with a set of actionable optimization strategies. You'll learn techniques to co-design and co-optimize hardware, software, and algorithms to build resilient, scalable, and cost-effective AI systems for both training and inference.

// Bio
Chris Fregly is an AI performance engineer and startup founder with experience at AWS, Databricks, and Netflix. He's the author of three (3) O'Reilly books, including Data Science on AWS (2021), Generative AI on AWS (2023), and AI Systems Performance Engineering (2025). He also runs the global AI Performance Engineering meetup and speaks at many AI-related conferences, including Nvidia GTC, ODSC, Big Data London, and more.

// Related Links
AI Systems Performance Engineering: Optimizing Model Training and Inference Workloads with GPUs, CUDA, and PyTorch 1st Edition by Chris Fregly: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
Coding Agents Conference: https://luma.com/codingagents

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community [https://go.mlops.community/slack]
Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)
Sign up for the next meetup: [https://go.mlops.community/register]
MLOps Swag/Merch: [https://shop.mlops.community/]
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Chris on LinkedIn: /cfregly

Timestamps:
[00:00] SageMaker HyperPod Resilience
[00:27] Book Creation and Software Engineering
[04:57] Software Engineers and Maintenance
[11:49] AI Systems Performance Engineering
[22:03] Cognitive Biases and Optimization / "Mechanical Sympathy"
[29:36] GPU Rack-Scale Architecture
[33:58] Data Center Reliability Issues
[43:52] AI Compute Platforms
[49:05] Hardware vs Ecosystem Choice
[1:00:05] Claude vs Codex vs Gemini
[1:14:53] Kernel Budget Allocation
[1:18:49] Steerable Reasoning Challenges
[1:24:18] Data Chain Value Awareness

Sharks Hockey Digest
Brodie Brazil's Teal Talk - John McCarthy

Sharks Hockey Digest

Feb 20, 2026 8:00


Barracuda Head Coach John McCarthy talks with Brodie Brazil about the Cuda season, impactful players, and looking ahead to the rest of the AHL season.

Sharks Hockey Digest
Cuda Confidential Alumni Check Up: Alex True

Sharks Hockey Digest

Feb 18, 2026 30:00


In this special alumni edition of Cuda Confidential, the Barracuda voice catches up with Olympian and former Barracuda and Sharks forward Alex True.

Cuda Confidential
Cuda Confidential Alumni Check Up: Alex True

Cuda Confidential

Feb 18, 2026 30:00


In this special alumni edition of Cuda Confidential, the Barracuda voice catches up with Olympian and former Barracuda and Sharks forward Alex True.

Common Denominator
What Made NVIDIA a $4.5T Company? Jensen Huang's Leadership | NVIDIA, AI & Long-Term Thinking

Common Denominator

Feb 16, 2026 5:06


In this episode of Common Denominator, I break down one of the most extraordinary leadership stories of our time: Jensen Huang and NVIDIA.

Over the last 36 months, NVIDIA has added roughly $100 billion in market cap per month, growing from a $300 billion company to nearly $4.5 trillion. But numbers like that don't happen by accident. They're the result of leadership.

In this episode, I explore what kind of leadership it actually takes to build a company like NVIDIA, and what we can all learn from Jensen Huang's 32-year tenure as CEO.

Here's what I dive into:
- Why leadership compounds over time
- The power of thinking in decades, not quarters
- Why betting early on AI, GPUs, and CUDA looked irrational but wasn't
- How staying technically fluent at scale protects standards and speed
- Why calm is one of the most underrated leadership traits
- The difference between managing outcomes and managing direction
- How great companies become infrastructure the world can't function without

On Common Denominator, I always ask: what's the real force behind extraordinary outcomes? More often than not, it's leadership. Not the title, the substance.

Whether you're building a startup, leading a team, investing, or simply trying to lead yourself better, the lessons are the same:
Think longer.
Stay close to the work.
Build for where the world is going.
Don't let success dilute conviction.

Jensen Huang didn't just build NVIDIA. He demonstrated what leadership looks like in an era of exponential change.

And to me, that's the real common denominator.

Like this episode? Leave a review here: https://ratethispodcast.com/commondenominator

The MAD Podcast with Matt Turck
Dylan Patel: NVIDIA's New Moat & Why China is "Semiconductor Pilled"

The MAD Podcast with Matt Turck

Feb 5, 2026 76:44


Dylan Patel (SemiAnalysis) joins Matt Turck for a deep dive into the AI chip wars: why NVIDIA is shifting from a “one chip can do it all” worldview to a portfolio strategy, how inference is getting specialized, and what that means for CUDA, AMD, and the next wave of specialized silicon startups.

Then we take the fun tangents: why China is effectively “semiconductor pilled,” how provinces push domestic chips, what Huawei means as a long-term threat vector, and why so much “AI is killing the grid / AI is drinking all the water” discourse misses the point.

We also tackle the big macro question: capex bubble or inevitable buildout? Dylan's view is that the entire answer hinges on one variable, continued model progress, and we unpack the second-order effects across data centers, power, and the circular-looking financings (CoreWeave/Oracle/backstops).

Dylan Patel
LinkedIn - https://www.linkedin.com/in/dylanpatelsa/
X/Twitter - https://x.com/dylan522p

SemiAnalysis
Website - https://semianalysis.com
X/Twitter - https://x.com/SemiAnalysis_

Matt Turck (Managing Director)
Blog - https://mattturck.com
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

FirstMark
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

(00:00) - Intro
(01:16) - Nvidia acquires Groq: A pivot to specialization
(07:09) - Why AI models might need "wide" compute, not just fast
(10:06) - Is the CUDA moat dead? (Open source vs. Nvidia)
(17:49) - The startup landscape: Etched, Cerebras, and 1% odds
(22:51) - Geopolitics: China's "semiconductor-pilled" culture
(35:46) - Huawei's vertical integration is terrifying
(39:28) - The $100B AI revenue reality check
(41:12) - US Onshoring: Why total self-sufficiency is a fantasy
(44:55) - Can the US actually build fabs? (The delay problem)
(48:33) - The CapEx Bubble: Is $500B spending irrational?
(54:53) - Energy Crisis: Why gas turbines will power AI, not nuclear
(57:06) - The "AI uses all the water" myth (Hamburger comparison)
(1:03:40) - Circular Debt? Debunking the Nvidia-CoreWeave risk
(1:07:24) - Claude Code & the software singularity
(1:10:23) - The death of the Junior Analyst role
(1:11:14) - Model predictions: Opus 4.5 and the RL gap
(1:14:37) - San Francisco Lore: Roommates (Dwarkesh Patel & Sholto Douglas)

Sharks Hockey Digest
Cuda Confidential: Fil the Thrill

Sharks Hockey Digest

Feb 4, 2026 25:00


In the latest episode of Cuda Confidential, Barracuda voice Nick Nollenberger catches up with second-year center Filip Bystedt to discuss his AHL All-Star nod, breakout sophomore season, and more.

Get Out N Drive Podcast
Is AI Ruining The Automotive Industry and Buying Cars Online?

Get Out N Drive Podcast

Feb 3, 2026 26:02


Is AI ruining the automotive industry? The guys discuss the current climate of AI and how it affects what we see and can trust online.

Buy the guys some guzzoline! https://buymeacoffee.com/getoutndrive

The Get Out N Drive Podcast is Fueled By AMD ~ AMD: More Than Metal https://www.autometaldirect.com/
Visit the AMD Garage ~ Your one-stop source for high-quality body panels for your restoration https://www.autometaldirect.com/amdgarage
For all things Get Out N Drive, cruise on over to the Get Out N Drive website. https://getoutndrive.com/

Be sure to follow GOND on social media!
GOND Website: https://getoutndrive.com/
IG: https://www.instagram.com/getoutndrivepodcast/
X: https://x.com/getoutndrivepod
FB: https://www.facebook.com/Get.Out.N.Drive.podcast
YouTube: https://www.youtube.com/@getoutndrive

Recording Engineer: Paul Meyer
Subscribe to the Str8sixfan YouTube Channel: @Str8sixfan

#classiccars #automotive #amd #autometaldirect #c10 #restoration #autorestoration #autoparts #restorationparts #truckrestoration #Jasonchandler #podcast #sheetmetal #mecum #bobbyadams #mecumscandal #carauction #classiccarauction #usedcar #buyaclassiccar #sellaclassiccar #tradeschool #carengines #WhatDrivesYOUth #GetOutNDriveFAST

Join our fb group to share pics of how you Get Out N Drive: https://www.facebook.com/groups/getoutndrivepodcast/
Follow Jason on IG: https://www.instagram.com/oldecarrguy/
Follow Jason on fb: https://www.facebook.com/oldecarrguy
Subscribe To the OldeCarrGuy YouTube Channel: @OldeCarrGuy
Follow John on IG: https://www.instagram.com/customcarnerd/
Sign Up and Learn more about National Get Out N Drive Day: https://nationalgetoutndriveday.com/

Music Credit:
Licensor's Author Username: LoopsLab
Licensee: Get Out N Drive Podcast
Item Title: The Rockabilly
Item URL: https://audiojungle.net
Item ID: 25802696
Purchase Date: 2022-09-07 22:37:20 UTC

Support the show: https://buymeacoffee.com/getoutndrive #ClassicCar

Sharks Hockey Digest
Brodie Brazil's Teal Talk - Cam Lund

Sharks Hockey Digest

Feb 3, 2026 8:57


Brodie Brazil sits down with San Jose Barracuda forward Cam Lund to talk about his season with the Cuda, his teammates, and more.

Dev Interrupted
A constitution for AI, breaking dark flow, and open source as a moat?

Dev Interrupted

Jan 30, 2026 23:03


In this Friday Deploy, Andrew and Ben dive into the viral Moltbot (now OpenClaw) phenomenon and Steve Yegge's Software Survival 3.0 essay, debating how SaaS companies can build moats in an era of token-constrained engineering. They also explore the concept of "Dark Flow" - a deceptive state where vibe coding feels productive but hides accumulated tech debt - and break down Anthropic's newly released constitution for Claude. Finally, the team discusses a Reddit user's claim to have ported CUDA to AMD in 30 minutes and shares a fascinating breakdown of podcast listening data.

LinearB: The AI productivity platform for engineering leaders

Follow the show:
Subscribe to our Substack
Follow us on LinkedIn
Subscribe to our YouTube Channel
Leave us a Review

Follow the hosts:
Follow Andrew
Follow Ben
Follow Dan

Follow today's stories:
OpenClaw
Software Survival 3.0
Breaking the Spell of Vibe Coding
Claude's new constitution
Claude Code Has Managed to Port NVIDIA's CUDA Backend to ROCm
My Top 25 Podcast Episodes & Interviews from 2025 by IPM (Insights Per Minute)

OFFERS
Start Free Trial: Get started with LinearB's AI productivity platform for free.
Book a Demo: Learn how you can ship faster, improve DevEx, and lead with confidence in the AI era.

LEARN ABOUT LINEARB
AI Code Reviews: Automate reviews to catch bugs, security risks, and performance issues before they hit production.
AI & Productivity Insights: Go beyond DORA with AI-powered recommendations and dashboards to measure and improve performance.
AI-Powered Workflow Automations: Use AI-generated PR descriptions, smart routing, and other automations to reduce developer toil.
MCP Server: Interact with your engineering data using natural language to build custom reports and get answers on the fly.

The Lost Debate
Pretti Killing, ICE Impunity, NVIDIA Dominance

The Lost Debate

Jan 28, 2026 79:37


Ravi begins by explaining why this conversation matters to him: although he's often skeptical of tech leaders, he sees Nvidia CEO Jensen Huang as a rare and genuinely consequential figure. He briefly reflects on the Pretti shooting in Minnesota and the broader questions it raises about accountability and transparency before turning to the interview. Ravi then speaks with author Stephen Witt about the forces that shaped Jensen's leadership, from a punishing childhood and Nvidia's early brush with failure to the high-risk CUDA bet that helped make modern AI possible. It's an absorbing portrait of leadership, obsession, and the building of one of the most important tech companies of our time.

Stephen Witt's Jensen Huang, Nvidia, and the World's Most Coveted Microchip
Ravi's Garbage Town

–––––

Leave us a voicemail with your thoughts on the show! 201-305-0084
Follow Ravi at @RaviMGupta
Notes from this episode are also available on Substack: https://thelostdebate.substack.com/
Read more from Ravi on Substack: https://realravigupta.substack.com
Follow The Branch at @thebranchmedia
Listen to more episodes of Lost Debate on Apple: https://podcasts.apple.com/us/podcast/the-lost-debate/id1591300785
Listen to more episodes of Lost Debate on Spotify: https://open.spotify.com/show/7xR9pch9DrQDiZfGB5oF0F
Listen to Where the Schools Went: https://thebranchmedia.org/show/where-the-schools-went/

Teal Town USA
Sherwood, Goalie Fight, Strong Push - The Pucknologists 263

Teal Town USA

Jan 26, 2026 167:01


The Sharks continue to make moves on and off the ice. On the ice, they win two of three games as their push for the playoffs continues, and off the ice, they made a trade to acquire Vancouver Canucks forward Kiefer Sherwood. The Barracuda also split two games this week.

Other topics include:
- The roster pinch continues with Chernyshov being sent to the Barracuda.
- The Sharks have one roster spot as Mukhamadullin and Kurashev near their return.
- A week away for trade flexibility for Skinner and Klingberg.
- State of the Sharks recap.
- Barracuda injuries continue to pile up.
- More creepy fans.

Have your say in the YouTube Superchat on the Sharks, the Cuda, and everything hockey!

Teal Town USA - A San Jose Sharks post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!

Teal Town USA
Teal Tinted Rollercoaster - The Pucknologists 262

Teal Town USA

Jan 19, 2026 85:24


The Sharks' eastern road swing continues to have its ups and downs: a win in Washington, followed by being handled by the Detroit Red Wings. The Barracuda were on a rollercoaster of their own as they split a weekend set in Tucson.

Other topics include:
- Nick Leddy is finally on waivers.
- Sharks having an A+ season.
- Sharks players/prospects are plentiful in Pronman's midseason U23 list.
- Michael Misa can stay after a minor league trade.
- And more!

Have your say in the YouTube Superchat on the Sharks, the Cuda, and everything hockey!

Teal Town USA - A San Jose Sharks post-game podcast, for the fans, by the fans! Subscribe to catch us after every Sharks game and our weekly wrap-up show, The Pucknologists! Check us out on YouTube and remember to Like, Subscribe, and hit that Notification bell to be alerted every time we go live!

Get Out N Drive Podcast
The history and evolution of 70s Boogie Vans

Get Out N Drive Podcast

Jan 19, 2026 35:03


The Get Out N Drive Podcast Is Fueled By AMD

In this episode, the guys talk about the birth, evolution, and resurgence of 70s-era Boogie Vans.

Buy the guys some gas!
The Get Out N Drive Podcast is Fueled By AMD ~ AMD: More Than Metal
Visit the AMD Garage ~ Your one-stop source for high-quality body panels
For all things Get Out N Drive, cruise on over to the Get Out N Drive website.

Be sure to follow GOND on social media!
GOND Website, IG, X, FB, YouTube

Recording Engineer, Paul Meyer
Subscribe to the Str8sixfan YouTube Channel

#classiccars #automotive #amd #autometaldirect #c10 #restoration #autorestoration #autoparts #restorationparts #truckrestoration #Jasonchandler #podcast #sheetmetal #carbuild #truckbuild #carproject #2025 #2026 #yearendreview #WhatDrivesYOUth #GetOutNDriveFAST

Join our fb group to share pics of how you Get Out N Drive
Follow Jason on IG
Follow Jason on fb
Subscribe To the OldeCarrGuy YouTube Channel
Follow John on IG
Sign Up and Learn more about National Get Out N Drive Day.

Music Credit:
Licensor's Author Username: LoopsLab
Licensee: Get Out N Drive Podcast
Item Title: The Rockabilly
Item URL: https://audiojungle.ne...
Item ID: 25802696
Purchase Date: 2022-09-07 22:37:20 UTC

Support the show