Podcasts about yann lecun

  • 262 PODCASTS
  • 491 EPISODES
  • 49m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Jun 12, 2026 LATEST

POPULARITY (chart by year, 2019 to 2026)


Best podcasts about yann lecun

Latest podcast episodes about yann lecun

Impact Theory with Tom Bilyeu
SpaceX IPO Day, We Won The Iran War Again, & US Tops Oil Export List | The Tom Bilyeu Show Live

Jun 12, 2026, 90:05


In this Friday edition of The Tom Bilyeu Show Live, Tom and Drew dig into a packed news day spanning geopolitics, markets, tech, and a long philosophical tangent on immortality.
They open on Iran, breaking down the leaked 14-point "deal" circulating via Iranian state media (the $24 billion in frozen assets, the naval blockade, the Strait of Hormuz, and reconstruction demands) and why Tom is deeply skeptical that anything beyond a memorandum of understanding gets signed, plus what a bad deal could cost Trump heading into the midterms. From there, they pivot to a heated exchange over the SpaceX IPO and the Globe and Mail's "how to properly hate Elon Musk" headline, using it as a springboard into the psychology of resentment, the mechanics of transformational-tech bubbles, and a warning to retail investors about becoming "exit liquidity." The conversation moves through California's voting rules, ballot harvesting, and Trump's Save America Act and reconciliation push (with an extended back-and-forth on states' rights, the Constitution, and the Supreme Court), the UK's proposed device-level content-scanning law and the surveillance-state implications, a DOJ child-smuggling indictment tied to border policy, and the Epstein/Zorro Ranch mystery. They close on AI, unpacking Yann LeCun's argument against LLMs and AGI in favor of specialized world models, before spinning off into a wide-ranging debate about whether you'd actually want to live forever, the disposable-male hypothesis, and a contentious Alex Karp clip about GDP and gender.

Tronche de Tech
#73 - Bertrand Charpentier - Faire maigrir les IAs (Slimming Down AIs)

Jun 4, 2026, 72:22


This disciple of Yann LeCun is on track to save the AI industry several billion dollars, by solving a problem that weighs down every model. It all started at Twitter. At the time, Bertrand was doing research on the American social network's AI models. Everything was going well, until the arrival of a certain… Elon Musk. Because when he shows up, Musk looks at what you "produce." He counts your lines of code. (yes, yes

Objectif TECH
Le Lab - Sofiane Schaack: Quand un physicien décrypte les modèles d'IA (When a Physicist Decodes AI Models)

Jun 2, 2026, 25:08


What is a physicist doing in an AI lab? Asking the right questions. That is the common thread in the career of Sofiane Schaack, Director of Data & AI at Capgemini Invent. From quantum mechanics to flight modeling to large language models, he has never stopped trying to understand what lies behind systems. In this episode of Le Lab, he recounts this unusual path: his beginnings as a self-taught coder, his time in research and then consulting, and the way he built an AI practice at the intersection of physics and data. Above all, he shares the questions that drive him today: what really happens inside a neural network? What do these models represent, and what do they truly understand? An episode that invites us to step back, on AI and on our own way of understanding the world.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

We're announcing AIEWF speakers this week! Take the AI Engineering Survey!

Today's guest Ethan first joined us for the LS Paper Club as the lead on NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months.

He comes back on Latent Space with some nuclear hot takes: that video models primarily get their intelligence from LLMs, not from training on video data, and that the next frontier for truly interactive, realtime, long-horizon world models is to work on LLMs (perhaps Interaction Models as well…).

Put it this way: in the near term, the next Sora won't be a better video model, but a video agent.

Generative media may more closely follow the evolution of AI coding, which went from focusing on one-shot output performance and cost to multiturn reasoning and planning models for agents and systems that can plan, edit, test, debug, and submit PRs.

At a certain point, coding models got so good that the only significant next step to improve performance was handling the orchestration of these models.

Now, as the performance of video models increases significantly across realism, consistency, and prompt adherence while becoming more cost efficient, the next evolution of video generation may also be systems that can plan, generate, edit, critique, and iterate across an entire creative task.

In this episode, Ethan joins swyx and Vibhu to unpack what it actually takes to build frontier image and video systems: data, VAEs, diffusion transformers, audio-video alignment, inference speedups, and the hidden cost of storing and moving massive video datasets.
From building NVIDIA's Cosmos world model to joining xAI as Grok Imagine was being built from zero to one, Ethan He has been at the center of some of the most important work in video generation, multimodal models, and real-time world models.

We go deep on Grok Imagine, how a small xAI team shipped its first multimodal video model in three months, why iteration speed matters more than almost anything in model development, and why many of the biggest gains come from fixing tiny bugs in data and training pipelines.

Flipbook: The future of Videomaxxing

Video agents are almost a sure bet to be the trend in the coming year. We end with a glance at what's beyond video agents.

Flipbook caused a minor sensation this year when it was released, but most treat it as a fun demo. Ethan takes it very seriously: with the speed and cost of inference coming down every year, the future of custom video JIT UI is closer than you think.

We talked about why videogen models may become the front end of AI, how generative UI could replace traditional HTML/CSS, why world models need to be real-time, interactive, and long-horizon, and why the future of video generation may depend more on language models and agents than on diffusion alone.

We discuss:

* Why fast iteration mattered more than meetings
* Why small training bugs can drive huge model quality gains
* Why coding models may make compute the bottleneck again
* How image and video models are trained with synthetic captions
* The role of VAEs and latent space in frontier video models
* Why image models are the foundation for video models
* The tradeoff between temporal compression and real-time interactivity
* Flipbook, Neural OS, and the future of generative UI
* Why future interfaces may go from user intent to pixels
* The hidden cost of training video models: storage, egress, and GPU hours
* How step distillation and consistency models (like OpenAI sCM) make video inference orders of magnitude faster
* Grok Imagine 0.9 and large-scale audio-video generation
* Why audio-video alignment is harder than text-video alignment
* Ethan's definition of world models
* Reference-to-video, video extension, and long-context video generation
* Why xAI's research communication undersells Grok Imagine
* How xAI culture shaped the speed of development
* AI watermarking, SynthID, and detecting generated media
* Why prompt rewriting matters for video models
* Grok Imagine Agent and the rise of video agents
* Why language models may unlock better video generation
* Robotics, physical AI, and embodied world models
* Why Ethan left xAI and shifted focus toward LLMs
* Self-managed context, memory, and the next frontier for language models

Ethan He

* LinkedIn: https://www.linkedin.com/in/ethanhe42
* X: https://x.com/EthanHe_42

Timestamps

00:00:00 Introduction
00:01:25 From NVIDIA Cosmos to xAI
00:03:24 Building Grok Imagine from Zero to One
00:10:07 How Image and Video Models Are Trained
00:18:53 Video Compression, VAEs, and Real-Time Tradeoffs
00:22:10 Generative UI, Flipbook, and Neural OS
00:32:10 The Cost of Training Large Video Models
00:37:04 Distillation, GANs, and Fast Video Inference
00:41:21 Audio-Video Generation and Grok Imagine 0.9
00:48:34 What Makes a World Model?
00:55:51 Reference Videos, Long Context, and Video Memory
01:00:11 xAI Culture, Research, and First-Principles Building
01:09:45 AI Safety, Watermarking, and Prompt Rewriting
01:13:10 Video Agents and AI-Assisted Creation
01:27:32 Why Language Models Unlock Better Video
01:31:15 Robotics, Physical AI, and Embodied World Models
01:32:38 Why Ethan Left xAI
01:34:16 Self-Managed Context and the Future of LLMs
01:38:43 Ethan's Career Path and Closing Thoughts

Transcript

Introduction: Ethan He, Latent Space, and the Path to xAI

Swyx [00:00:00]: We're here in the studio with Ethan He, most recently of xAI. Welcome.
Ethan [00:00:10]: Thank you. Glad to be here.
Swyx [00:00:11]: We're also here with Vibhu.
You were first coming to us, or joining the Latent Space world, because you were working on Cosmos at NVIDIA, and you did a paper. We loved it. You presented it as well, so thank you for doing that.
Ethan [00:00:23]: Actually, I also presented the MoEs twice at Latent Space.
Swyx [00:00:29]: How did you actually hear about us? Did we reach out to you? Is that how it worked?
Ethan [00:00:33]: No, actually, I-- the community. I realized, oh, there is this online community where people talk about AI and also learn from each other through papers every week, through the Paper Club. It's very nice.
Ethan [00:00:49]: I learned a lot.
Swyx [00:00:49]: I think three years nonstop. We haven't stopped even on Christmas and New Year's. Many weeks I want to stop, but it keeps going.
Vibhu [00:00:58]: No, that was good. I think you had posted that you worked on a paper, and I was like, “Oh, very cool. We have Paper Club. Present then.”
Vibhu [00:01:04]: But I might have reached out to you after.
Swyx [00:01:05]: You-- because it's an amateur club, right?
Swyx [00:01:08]: So it's very unusual, but we sometimes have paper authors come by and actually explain the paper. Today we just did the Poolside paper, which was apparently very good.
Vibhu [00:01:18]: Came out yesterday.
Vibhu [00:01:19]: Pretty interesting, right? Fully open. They talk about everything, systems. So it's a good one. We'll recommend people read it.
Swyx [00:01:25]: Bring us up to speed on your transition to xAI, 'cause I actually don't even know when you joined. Just tell the story about the transition.

From NVIDIA Cosmos to xAI: Scaling Video and World Models

Ethan [00:01:34]: Before xAI, I was working on the Cosmos world model at NVIDIA. Cosmos is a giant video foundation model that aims to simulate the world, and it serves as a foundation for all of the roboticists to build on top of.
There, once I built the Cosmos one, I realized this thing also has a scaling law similar to language models, so we need to scale up the video models further. That's why I realized I needed to move somewhere with much more compute resources. That's how I--
Swyx [00:02:13]: Than NVIDIA?
Vibhu [00:02:14]: The GPU rich came themselves.
Vibhu [00:02:19]: And timeline-wise, when was Cosmos? It was pretty early, right? It was open world model, open paper, everything.
Ethan [00:02:25]: It was the end of 2024.
Vibhu [00:02:28]: End of 2024.
Ethan [00:02:30]: Then in mid-2025, I moved to xAI. I joined about the time when xAI was about to build video models and multimodal models. There was no infra, no data, and no model, and as a few engineers, we built it in three months and released the first model, Grok Imagine 0.9.
Ethan [00:02:55]: Since then, I kept working on video models and moved more from training to post-training of the video models. For example, reference-to-video, kind of like the cameo feature, and video extensions. And before I left, I worked on a world model, leading a small team focused on real-time, long-horizon video generation.

Building Grok Imagine From Scratch in Three Months

Swyx [00:03:24]: Can you give a rough roadmap? Okay, you're on a brand-new team. Grok previously was only text, or they partnered with BFL for their image gen stuff. What are the building blocks, right? You have compute, data you can procure somewhere. What is the sequence of things that people should think about when you're setting up a new team?
Vibhu [00:03:43]: Actually, even deeper, not just data you can procure. You guys had to go through getting the data too, right?
So you shipped it pretty fast, but yeah--
Swyx [00:03:51]: Three months is like--
Vibhu [00:03:52]: From everything--
Swyx [00:03:52]: Actually very surprisingly fast.
Ethan [00:03:55]: One thing, I'd say, is thanks to my experience at NVIDIA, 'cause the first time, when we were building Cosmos together, we built it for about a year. So this is the second time I've done it. I roughly have an idea of what to do. I'd say the most important thing is the talent. Everyone was very strong and clever, working very closely with each other towards a common goal. That speeds things up a lot. You reduce the communication bandwidth among people, and everyone can work towards the same goal. Every day there are not that many meetings on the calendar, maybe a sync a day, and after that it's just all building. It was pretty fun at that time.
Ethan [00:04:47]: Another thing is that xAI has very strong foundations of data infra and model infra, and the support there can help model development a lot. When I look at training models, the most important thing is actually how many iterations you can do per day. The more iterations you can do, the faster you can train the model. So if you have very strong infra and you have a lot of compute, you can train these models in a very short period of time. That gives you a much larger buffer for errors, and it also gives you the opportunity to spot more bugs.

Iteration Speed, Compute, and Debugging Model Pipelines

Swyx [00:05:46]: What is an iteration? Is it like a few hundred steps, or what are you--
Ethan [00:05:50]: Let's say just training the model: from acquiring new data, maybe designing new algorithms, and training a new model, maybe at smaller scale, or--
Swyx [00:06:01]: So cycle time for any hyperparam that you're searching.
Ethan [00:06:04]: Cycle time, and the time to eval this model.
Is this model better than my previous iteration?
Ethan [00:06:11]: So--
Swyx [00:06:11]: So it's like, before you, someone had already set this up so that you can iterate very quickly.
Ethan [00:06:15]: I think the foundation there is extremely good for developing and researching models.
Ethan [00:06:23]: And often I find, and this is kind of boring, that a lot of the improvements do not come from new algorithms. They come from finding small bugs here and there in the data pipeline and in the model training pipeline. Those give the biggest boost to model quality.
Vibhu [00:06:46]: It's interesting, right? You say it's a small team, less communication bandwidth, but also a lot of quality is finding little bugs. It seems counterintuitive, right? If you have a lot of people, you can iron out more of those, but it's interesting to see the other side, right?
Swyx [00:07:00]: I also wonder, do you try using LLMs to look for bugs? I don't know.
Ethan [00:07:05]: I remember at that time it was mid-2025, so the coding model wasn't quite there yet. I remember by December 2025 it was extremely good. Yeah, I'd been using it at that time. It's helpful. Sometimes it produces code that is kind of difficult to maintain, even though the first time it built something extremely fast. It gave me spaghetti code, thousands of lines that I couldn't maintain, and the LLM itself couldn't figure out what was wrong and how to improve on top of it. But now I find it much better. I want to bring up another point here: now coding models are much more efficient and can help us implement stuff much faster. Compute might become a bottleneck again, because previously, if you wanted to train a new model, say you wanted to generate new synthetic data or write a new algorithm, it might take a few weeks.
And during that period of time, you might not have experiments to run. But now you can build that thing within a few hours, and then you can immediately train a model.
Ethan [00:08:24]: Now you have to have enough compute to try all of the ideas. So compute might be the bottleneck of iteration speed again.
Swyx [00:08:36]: Yeah, honestly, I think it's kind of a stressful job, because you're like, “Well, I should be trying everything, and if I'm not, then I'm not doing my job well.”
Vibhu [00:08:48]: There's also the stress that you're eating thousands of GPUs per hour, which is very expensive, and that compute could go to other researchers.
Swyx [00:08:56]: You got the daddy Elon to--
Vibhu [00:08:57]: You got daddy Elon.
Ethan [00:08:59]: It was--
Vibhu [00:09:00]: But there's still a finite amount of compute. You want to use it, you want to use it well, you want more of it.
Ethan [00:09:06]: That was quite stressful indeed. I think one thing is, with coding models now, a lot of these jobs can be automated, which is much better. Second, it's a marathon, so you've got to maintain good health and a regular schedule.
Vibhu [00:09:28]: It's hard to hear that when you shift from zero to nothing in two months.
Swyx [00:09:32]: And I think obviously the culture at xAI is very famously that people work very hard. One thing I did want to dive into: in the notes that you sent ahead of time, you had specific comments about the cost of video gen training. Presumably this is on Colossus-1, right? The two hundred megawatt cluster. Whatever you want to share on that.
Vibhu [00:09:54]: I think there are three things we're talking about, right? There's video gen, and there's also the image gen model that you put out. Do you want to complete the-- okay, so zero to one, you have a few months.
Just what are the stages of creating an image gen model?
Swyx [00:10:06]: Oh, yeah, maybe I got distracted.

How Image and Video Models Are Trained: Synthetic Captions, Tokenizers, and VAEs

Vibhu [00:10:07]: Sorry. And then from there, there's video gen, there's audio gen. Would love to get into those next. But what are those first few months like? Small team, a lot of bugs, iterations, but what does it look like? Do we take something off the shelf? Do we just get data and compute? What are the few months like? How do you get to a state-of-the-art image gen model? How do you just start?
Ethan [00:10:28]: I cannot comment specifically on how xAI did it, but it's quite a standard process. I can draw some examples from Cosmos. Mainly, to build a video model, you actually need to build an image model first. And to build these two models, the data you need is a hundred percent synthetic pairs of language and image, or language and video. Because on the internet, actually, the videos don't naturally associate with text. You can say, oh, on YouTube you have the title, and you have the description and the comments--
Swyx [00:11:11]: Title--
Ethan [00:11:11]: of a video, but usually they're not relevant to the video itself. Say the video is a natural scene of mountains or something, and the title is “I'm so happy today.”
Ethan [00:11:26]: So they have no correlation at all. The first step is that you have to generate synthetic pairs of language with the videos. You gather videos from the internet, and you use a VLM to caption the videos. For that part, here's a question: how do you get a VLM to begin with? So if there's no--
Swyx [00:11:55]: You, so you fuse the model, right? Like--
Ethan [00:11:57]: Say no VLM exists. How do you generate the text at the beginning, right?
It's impossible.
Swyx [00:12:04]: I see.
Ethan [00:12:05]: In the beginning, you ask humans to describe the video in as much detail as possible. For example, you ask them to describe everything: all objects, all characters, and all interactions and dialogue in the videos. That's in the protocol of Cosmos labeling. The objective we gave to the labelers was that you have to describe the video in as much detail as possible, such that a blind person who hears the blob of text can reconstruct what the video is like in their head.
Swyx [00:12:43]: Video or image? You're talking about images.
Ethan [00:12:44]: Video or image, either one of them.
Vibhu [00:12:47]: This was pretty common when we went from CLIP and DALL-E, right?
Vibhu [00:12:51]: It's all training on really detailed captioning of images. So the same is applied to video, but instead--
Ethan [00:12:57]: Same applied--
Vibhu [00:12:57]: of using a multimodal model to pass in video images and write rich descriptions, you can also--
Swyx [00:13:04]: I think there's this traditional perspective of supervised, or very highly human-curated, data. I feel like there's an unlock with unsupervised, right? Where you have enough to bootstrap that you can just throw a common corpus at it, or whatever. Unsupervised vision and language pairing, right? Where you just have interspersed image and text, and it just learns. To me, that is the VLM breakthrough that is different from CLIP, different from the LM era.
Ethan [00:13:36]: It's interesting to see that you kind of need both kinds of data.
Ethan [00:13:41]: For example, for the--
Swyx [00:13:41]: You need it to bootstrap it up. Yeah--
Ethan [00:13:43]: for the generative model training, there's also usually a small percentage of unlabeled data. So the model is instructed to generate a video without any text instruction. That can also help the model generalize.
So after this stage of generating synthetic pairs, one important common step is to train a compressor, or a tokenizer, for the images or videos. Technically, theoretically, you can train image or video models on pure pixels, but the problem is that it's a lot of tokens. One image at a thousand by a thousand is one million tokens, one million pixels. It's impossible to train a transformer on that. So you need to train a tokenizer, which can go from image to latent space and from latent space back to image.
Swyx [00:14:45]: That's why we named the podcast.
Swyx [00:14:48]: But basically, you're talking about vocabulary size.
Ethan [00:14:50]: So, vocab.
Swyx [00:14:51]: And so, what is imp-- like a million is impossible?
Ethan [00:14:54]: In generative models, the vocab is continuous. It's a continuous space. We can think of it as mapping an image to a vector. It's a fixed-length vector, of length sixteen or forty-eight, something like that. And then you map that vector back to the image space. The mapping is patch-based. So say you have--
Ethan [00:15:22]: a sixteen by sixteen patch, and you map that patch of pixels into this latent space.
Swyx [00:15:29]: We've covered this--
Vibhu [00:15:30]: This is like the vision transformers--
Swyx [00:15:32]: VAEs.
Ethan [00:15:33]: VAEs.
Vibhu [00:15:34]: You basically compress your input, you do your generation, you're reasoning about all that generation in a smaller dimension, and then you project back out.
Swyx [00:15:43]: A VAE is a form of compression, but for me, the patching thing is from ViT, right?
Ethan [00:15:48]: You can make those.
Swyx [00:15:49]: Literally, yeah, the paper is titled like sixteen by sixteen is all you need. Something like that.
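The pixel-versus-latent arithmetic discussed above can be sketched in a few lines of Python. This is a minimal illustration, not Cosmos or xAI code: the function names are hypothetical, the 16 by 16 patch size is the one mentioned in the conversation, and rounding partial patches up is an assumption.

```python
import math


def pixel_count(height: int, width: int) -> int:
    """Number of pixel positions if a model were trained directly on pixels."""
    return height * width


def latent_token_count(height: int, width: int, patch: int = 16) -> int:
    """Number of latent tokens when each patch x patch block becomes one token.

    Assumption: partial patches at the image border are padded (rounded up).
    """
    return math.ceil(height / patch) * math.ceil(width / patch)


pixels = pixel_count(1000, 1000)          # the "thousand by a thousand" image
tokens = latent_token_count(1000, 1000)   # 16x16 patches, as in the discussion
print(f"pixels: {pixels}, latent tokens: {tokens}, ratio: {pixels / tokens:.0f}x")
```

Under these assumptions a 1000 by 1000 image drops from one million pixel positions to a few thousand latent tokens, which is the reduction that makes transformer training tractable.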
And then I think also, people make a lot of comparisons between this kind of patching and convolutions.
Swyx [00:16:02]: Which is, you're kind of reconstructing the old paradigm with the new.
Ethan [00:16:05]: Actually, in VAEs, there are both convolutional networks and transformers. You can actually do both.
Ethan [00:16:14]: After this VAE, what you've got is latent space tokens and language tokens. So now, the training of the diffusion transformer. Usually generative models use diffusion transformers. It is actually quite standard. It's very similar to how you train a language transformer model. There's not that much difference. It's just visual tokens in, visual tokens out. The only difference is that there's a denoising process. You train the model to unmask some of the noise. You add random noise to the visual tokens, and then you train the model to remove that noise to generate the clean tokens. At inference, the model can iteratively remove noise, starting from a hundred percent noise.
Swyx [00:17:12]: And then there's also, to speed things along on the tech tree of diffusion, there's CFG, and there's also latent diffusion somewhere in there. I think, somewhere along the line, obviously, Stability and all these other guys pioneered a lot of this architecture. I don't know if you want to get into that or just do the video side, up to you.

Bootstrapping Video from Image Models and Temporal Compression

Ethan [00:17:37]: After you train such an image model, the reason it's a foundation for video models is that image models are cheaper to train, and they have a much denser connection between language and text. So, sorry, language and images. For example, you train on a billion images, and there's a mapping from the text to the image.
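The denoising recipe Ethan describes earlier in this section (add random noise to clean visual tokens for training, then remove noise step by step at inference) can be sketched as a toy example. This is a minimal sketch under stated assumptions, not the method of any specific model: the linear mix between clean tokens and noise is an assumed schedule, the conversation does not say what the network predicts, and `predict_clean` is a hypothetical stand-in for a trained diffusion transformer.

```python
import random


def add_noise(clean, noise, t):
    """Mix clean tokens with noise; t=0 is fully clean, t=1 is 100% noise.

    Assumption: a simple linear interpolation schedule.
    """
    return [(1 - t) * c + t * n for c, n in zip(clean, noise)]


def training_example(clean, rng):
    """Build one training triple: noisy input, noise level, and the noise added."""
    t = rng.random()
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    return add_noise(clean, noise, t), t, noise


def denoise(start, predict_clean, steps=10):
    """Iteratively remove noise, starting from pure noise at t=1."""
    x = list(start)
    for i in range(steps, 0, -1):
        t, t_next = i / steps, (i - 1) / steps
        clean_hat = predict_clean(x, t)  # hypothetical model call
        # Noise implied by the current sample and the predicted clean tokens.
        noise_hat = [(xi - (1 - t) * ci) / t for xi, ci in zip(x, clean_hat)]
        # Re-mix at the next, lower noise level.
        x = add_noise(clean_hat, noise_hat, t_next)
    return x


rng = random.Random(0)
clean_tokens = [0.5, -1.0, 2.0]
noisy, level, target = training_example(clean_tokens, rng)

# With a perfect "oracle" predictor, sampling recovers the clean tokens.
pure_noise = [rng.gauss(0.0, 1.0) for _ in clean_tokens]
print(denoise(pure_noise, lambda x, t: clean_tokens, steps=5))
```

In practice the oracle is replaced by the trained network, so each step only partially corrects the sample, which is why several denoising steps are used.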
And the cost to train the same, a billion texts to a billion videos, is much more expensive, because videos naturally have more tokens than images. And the diffusion models' understanding of language comes purely from this mapping. So if you don't have enough mapping, if you only train on ten million videos or something, you might not see enough language tokens in your training, so your model does not understand human intention well enough. That's why you first train these image diffusion models, and then you bootstrap the video model from there.
Swyx [00:18:53]: One thing I did want to ask, because actually, I think you're the first video model person I've ever talked to. We've talked to Luma and all those folks. There are all these tricks in video compression where basically, frame by frame, there's not that much difference, so you don't actually have to regenerate or save the whole frame, right? I think MP4 compression or something else like that.
Swyx [00:19:16]: Is it tempting to use that? Or, as far as I can tell, everyone just treats it as, “No, we would just generate every frame.” Is that roughly the state of the art?
Ethan [00:19:27]: There are a few different approaches. Let's say, first, you want to directly use MP4 compression and use that as the tokens for the transformers to train on, right? People have actually tried that, but the main challenge is that the latent space for the MP4 tokens was not very comprehensible for the models. It's extremely hard to train on that. And there's a--
Ethan [00:20:01]: So that's why they created VAEs, which create a more continuous latent space, so the models can understand that latent space and learn from it much more easily. Even within VAEs, there are different difficulties of the latent space.
So you can imagine the simplest, most naive VAE: you have an image, and you just shuffle all of the pixels into a vector. You don't need to train any VAE, right? But that latent space is extremely hard for models to train on top of. That's why there is some debate on how you compress the tokens. You mentioned that you can compress frame by frame. You can also compress the temporal dimension.
Ethan [00:20:52]: The difference is that if you compress the temporal dimension, you get a much higher compression rate, because there's temporal redundancy between frames. This frame and the last frame are likely mostly similar, so there's only some small difference. For example, I think in the Wan 2.1 VAE, they have an eight by eight by four compression rate. So four temporal tokens are compressed into one token. That can save a lot of the context length. If you do it frame by frame, you have to do maybe eight by eight by one. Your context length will be four times larger. That being said, the benefit of per-frame compression, and we might come back to this later, is real-timeness and interactivity. 'Cause if you stream the output of the model frame by frame, the model can respond to any user request immediately. If you have a four times temporal compression, then--
Swyx [00:22:06]: It might be laggy--
Ethan [00:22:07]: there's a lag there by nature.
Swyx [00:22:10]: So you're very pilled on this. Let's just go ahead and bring it up, 'cause we have the visual prepared anyway. There are some frontier applications of real-time video gen. Flipbook is one of the examples that went viral recently, right? What is Flipbook?

Real-Time Generative UI: Flipbook, Neural OS, and Diffusion Front Ends

Ethan [00:22:23]: Flipbook is kind of like a web browser. You can see it has the web browser UI on top.
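The compression trade-off Ethan describes earlier in this section (eight by eight by four versus eight by eight by one) can be made concrete with a small token count. This is a minimal sketch under stated assumptions: the clip size (48 frames at 512 by 512) and the 24 fps frame rate are hypothetical, and the count uses plain ceiling division, ignoring any special handling of the first frame that a real video VAE may apply.

```python
import math


def latent_tokens(frames, height, width, spatial=8, temporal=4):
    """Latent positions after spatial x spatial x temporal compression."""
    return (
        math.ceil(frames / temporal)
        * math.ceil(height / spatial)
        * math.ceil(width / spatial)
    )


def min_lag_seconds(temporal, fps=24.0):
    """Frames bundled into one latent step must all exist before decoding,
    so a response to user input waits at least this long (assumed fps)."""
    return temporal / fps


with_temporal = latent_tokens(48, 512, 512, spatial=8, temporal=4)
per_frame = latent_tokens(48, 512, 512, spatial=8, temporal=1)

print(with_temporal, per_frame, per_frame / with_temporal)
print(min_lag_seconds(4), min_lag_seconds(1))
```

The per-frame setting needs four times the context length for the same clip, while the temporally compressed setting bundles four frames per step and so adds a built-in delay before the model can react.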
The difference is that all of the UIs are generated by a generative image model in real time, and everything here is fake. But you can explore inside this imaginary world. Say, here we have engineering the Great Pyramid. The model generates this for us to understand how it works, and if we want to navigate around and understand further, we can click on some of the descriptions here, and the model will generate a new page, a new subpage describing the details we want to know about.
Swyx [00:23:14]: So it's basically like we're playing a video, but it's pausing for our next interaction, and then it just plays the next thing based on our interaction.
Swyx [00:23:23]: Which is kind of cool.
Vibhu [00:23:25]: And you kind of decide your story. So this was, how do you make a pyramid? The levering technique seemed interesting, right? It shows how you take-- okay, I want to know what this is--
Swyx [00:23:35]: The demo tweet had more animation between frames.
Vibhu [00:23:38]: I think it's just skipping.
Swyx [00:23:39]: Oh, it's just skipping a lot of frames.
Ethan [00:23:40]: They also have a video mode--
Vibhu [00:23:42]: It takes a lot. There's a lot of people--
Ethan [00:23:42]: but a lot of people are using it.
Ethan [00:23:45]: So it's not available.
Vibhu [00:23:46]: There's a live video stream. We can try.
Swyx [00:23:50]: So this is an example of the kind of future that you see at the extreme. We're obviously not in it today.
Swyx [00:23:56]: But in a world where inference is completely free, this is better than generating code and text?
Ethan [00:24:02]: So this is a final state of where Viva will be at for world model, I think. Imagine the internet doesn't exist, and then you type in google.com. What should a model show you? The model can imagine something, and this is what the model imagines. And these web pages, they completely do not exist.
So I think as inference costs come down, we are going to have generative UI for everything. If you think about how coding models work, they write code for a web page, the code might be converted into binary, and the binary renders the pixels on the screen. In machine learning, every time we have some breakthrough, it's more end-to-end. So why don't we go from user instruction to the pixels directly? Generative UI will be user intention to pixels directly. Say, even for email, everyone has the same interface, but I want it slightly different. I want my email shown to me like a TikTok, so I can swipe left and right through the emails. Or maybe you want something else. We can have completely different things. Or I'm looking at Instagram stories, and I don't like the Like button. I always misclick it. Generative UI resolves that. So it's going to be a revolutionary replacement of the interface. So in the future, we might have much more powerful
Ethan [00:25:50]: LLMs and coding models running behind the scenes. And in the front end, the diffusion model will actually be the front end that shows stuff to you. That's how I imagine it.
Swyx [00:26:02]: Diffusion front end, deterministic back end.
Swyx [00:26:04]: Something like that. I find that very expensive, but,
Vibhu [00:26:08]: I find it interesting you called LLMs writing code on the back end deterministic, but okay.
Swyx [00:26:14]: You write it once
Vibhu [00:26:15]: Compare it to
Swyx [00:26:16]: And then you execute.
Ethan [00:26:17]: If you think about the cost, let's say an H100 costs $1 per hour, and you use this eight hours a day, thirty days a month, so every month you're paying two forty. You'll actually not wanna pay for that. That's even more expensive than Claude Code Max.
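The monthly figure quoted here is simple arithmetic, and it can be sketched as below. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and `yearly_reduction_factor` is an assumed parameter for how fast compute gets cheaper, not a measured number.

```python
def monthly_gpu_cost(dollars_per_hour: float, hours_per_day: float, days_per_month: int) -> float:
    """Cost of keeping one GPU busy for a given usage pattern."""
    return dollars_per_hour * hours_per_day * days_per_month


def cost_after_years(cost_today: float, years: int, yearly_reduction_factor: float = 2.0) -> float:
    """Hypothetical projection: cost shrinks by a fixed factor each year (an assumption)."""
    return cost_today / (yearly_reduction_factor ** years)


# Figures from the conversation: $1 per hour, eight hours a day, thirty days.
today = monthly_gpu_cost(1.0, 8, 30)
print(f"today: ${today:.0f} per month")

# Vary the assumed yearly factor to see how sensitive the timeline is.
for factor in (2.0, 10.0, 100.0):
    projected = cost_after_years(today, years=3, yearly_reduction_factor=factor)
    print(f"factor {factor:>5}: ${projected:.4f} per month after 3 years")
```

The projection is very sensitive to the assumed factor, which is why the choice between a 2x and a 100x yearly improvement matters so much for when this becomes affordable.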
But if you think about compute costs coming down something like two times every year, I think the future will likely arrive within a few years.
Vibhu [00:26:49]: It's everything, right? Compute cost comes down, compute gets faster, model gets smarter
Ethan [00:26:54]: More efficient
Vibhu [00:26:54]: model gets smaller.
Swyx [00:26:55]: I don't know why you say two times, 'cause I think it's like 100 times. In language models, it is roughly one hundred to a thousand times every twelve to eighteen months, for the same given level of LMSYS Elo.
Vibhu [00:27:08]: That's a net of everything, right? That's model performance alongside compute. So it's different from just compute costs coming down. But a very interesting future.
Swyx [00:27:19]: So the web designers will have to shout out that accessibility is an issue, right? How do you deal with screen readers or whatever? But yes, this is higher-bandwidth storytelling than anything you can possibly generate with code, right? So I think that's the rough idea.
Ethan [00:27:34]: And I'd like to add a little bit: humans naturally have the maximum input bandwidth when we are looking at things, like videos, and we have the maximum output bandwidth when we are talking. So in the future, it might be something like we talk to AI models, and the AI model responds with a generative UI. That would be the maximum input and output bandwidth for interacting with AI models before Neuralink happens.
Vibhu [00:28:06]: And it's also very custom, right? Some people are very visual, some people are not as visual, right? They prefer text. But the best thing about generative UI is that it can also be text.
Swyx [00:28:17]: There's another project that we wanted to highlight, which is the Neural OS. Kinda similar idea, but here you're literally simulating an operating system with a video model.
Swyx [00:28:27]: And you can play Doom, you can do Firefox.
I find this mildly less impressive, obviously, because it's an OS that I can run.
Swyx [00:28:37]: But here everything is imagined.
Vibhu [00:28:40]: I was used to Command+W to close the Firefox tab. It didn't crash. That's why I said
Swyx [00:28:45]: It's too immersive.
Vibhu [00:28:46]: It's too immersive for me.
Swyx [00:28:47]: Too immersive.
Vibhu [00:28:48]: I wanted to close the tab.
Vibhu [00:28:49]: But yes, I can play generated diffusion.
Swyx [00:28:51]: This is shockingly fast.
Swyx [00:28:54]: Because I remember there was a demo maybe one to two years ago. Someone tried to do a first-person shooter with an image model. There was no consistency. It was very slow. But here it looks realistic. This is Doom.
Vibhu [00:29:07]: I think there are two sides to that, right? There's, okay, what is running a game? The heavy part of it is actually the game engine, all the lighting, all that stuff, the graphics. This is just kind of video, right? Like we've solved consistency. This still looks like image generation from a few years ago. There's some temporal consistency, but it's kind of just images stitched together as video frames. But it's a good visual representation to picture the future you wanna see, right? That's more what I see in these.
Ethan [00:29:38]: This reminds me of how video models get better and better. If you just look at it, Neural OS feels like a crappy version of the Windows we could have, right? But the difference is that this model is overfitted on existing operating systems. It can generate nothing different from that. But it's actually also similar to video models. When we are training these video models and image models, we train them on the internet. There's no imaginary supernatural stuff on the internet. But once we train the model, you can prompt it to generate something supernatural that has never existed in the data set.
So if you train your Neural OS or neural computer on standard screen recordings from the entire internet, the model can imagine completely new interfaces for interacting with the computer.
Swyx [00:30:43]: This is one of those things that is magical to me. Usually generalizing out of distribution is bad, but somehow we have learned some kind of internal world model where you say, this, plus it looks like rainbows and butterflies, and it'll do it, and it will kind of make sense.
Swyx [00:31:03]: So yeah, that's kind of cool. I don't know if there's any more comment on that. I did want to touch a little bit more on the model architecture stuff, which I think you were getting to. It's really fascinating. We don't get a chance to talk about this enough. So one of the papers that we covered, we've covered every annual Segment Anything release, and I don't know if you follow... You're a computer vision guy, so you
Ethan [00:31:26]: I know
Swyx [00:31:27]: So they did memory attention, which is kind of interesting. And I always think anything where you can keep some consistency across the temporal dimension is very fascinating. Basically, the CV side bleeding into the video gen side is, I think, underexplored, right? We talk about it for labeling, but actually you can borrow the architecture itself.
Ethan [00:31:50]: There are also completely different approaches, right? You brought up the term world model, so we went from video model to world model. There is diffusion, but there are also other approaches that people are doing. So maybe we get into those after as well?
Swyx [00:32:03]: He has a whole definition of world models and stuff. I feel like we threw a lot at you.
Whatever you want to comment on.
Why Video Models Are Expensive: Storage, I/O, and Training Scale
Ethan [00:32:10]: I think one thing that we should actually come back to is, okay, we were talking about the steps to go from training image gen to a video model. One thing we don't see as much of is, okay, you brought up the delta in training data, right? So
Ethan [00:32:24]: you won't have as much, and a video model might not generalize, but what is the cost of training a large video model? We know roughly for LLMs, okay, even like the poolside thing that came out today, right? It's a Gemma-level model trained on roughly forty trillion tokens on this many H200s over this much time, right? You can see what the exact cost of that is: how many GPU hours times how much an H200 costs. So how do we do the back-of-the-envelope math for the same thing with video models and image models? How do you break that down? I can share some back-of-the-envelope calculations. Surprisingly, the cost of video models is comparable to language models. Obviously the largest scale is language models, so maybe comparable to a medium-scale language model. Just storing the videos alone costs a lot. You can maybe look it up on AWS or something.
Ethan [00:33:20]: Say you have a billion videos, and let's just say each video is five megabytes; then you need five petabytes just to store those videos. And remember we talked about using a VAE to compress the videos: typically you also need to store those continuous features in your storage. That's a comparable size to the videos themselves. So just storing these videos and the features is tens of petabytes alone. And,
Swyx [00:33:58]: I just looked up the calculation.
Five petabytes on S3 Standard is one hundred K per month.
Ethan [00:34:05]: And
Swyx [00:34:05]: It's comparable
Ethan [00:34:05]: and you need
Swyx [00:34:06]: And
Ethan [00:34:06]: And then for tens of petabytes, two hundred K. And even more expensive is the ingress and egress.
Swyx [00:34:13]: Oh, yeah.
Ethan [00:34:14]: Through the internet. Just downloading those videos, I believe, is more expensive on AWS than storing them.
Swyx [00:34:25]: Storing, yeah.
Ethan [00:34:25]: And for each training run, you probably need to pull them once. If you train multiple times, it's even more than that. So just the storage and the network, those costs would be a few million per month for storing everything, not to mention the GPU cost.
Ethan [00:34:45]: And
Swyx [00:34:45]: My side tangent: compute rental, like GPU rental, is very efficient. On one side, okay, you can be xAI and build your own data center. Should we not just build our own storage and compute as well? Like
Ethan [00:34:57]: Of course
Swyx [00:34:57]: cloud cost compared to just,
Ethan [00:34:59]: You save so much
Swyx [00:35:00]: store. Yeah, exactly.
Swyx [00:35:01]: Especially with egress and stuff. So.
Ethan [00:35:04]: That's a good idea, but it also comes with some of its own challenges.
Swyx [00:35:09]: Of course, of course.
Ethan [00:35:10]: People who build GPU data centers might not expect this much storage. And people who build storage typically just build it somewhere with only CPUs.
Swyx [00:35:23]: I just looked it up. AWS only charges for egress, not ingress. Tier five for five petabytes is two hundred and thirty K.
Ethan [00:35:32]: Even more expensive than the storage.
Swyx [00:35:34]: But storing is per month, right? You check in, then you cannot check out. So it's so cool. It's okay.
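The storage numbers in this exchange can be reproduced with a few lines of arithmetic. The sketch below only uses the figures quoted in the conversation (a billion videos, five megabytes each, one hundred K per month, two hundred and thirty K for egress) and derives the per-gigabyte rates they imply. It assumes decimal units (1 PB = 10^15 bytes), the function names are illustrative, and the derived rates are not a statement of actual AWS pricing.

```python
BYTES_PER_MB = 10**6
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15  # decimal petabyte, an assumption


def dataset_size_pb(num_videos: int, mb_per_video: float) -> float:
    """Raw dataset size in petabytes."""
    return num_videos * mb_per_video * BYTES_PER_MB / BYTES_PER_PB


def implied_rate_per_gb(total_dollars: float, size_pb: float) -> float:
    """Dollars per gigabyte implied by a quoted total bill."""
    size_gb = size_pb * BYTES_PER_PB / BYTES_PER_GB
    return total_dollars / size_gb


size = dataset_size_pb(1_000_000_000, 5)
print(f"raw videos: {size:.1f} PB")

# Totals quoted in the conversation, not looked up independently.
storage_rate = implied_rate_per_gb(100_000, size)
egress_rate = implied_rate_per_gb(230_000, size)
print(f"implied storage rate: ${storage_rate:.3f} per GB per month")
print(f"implied egress rate:  ${egress_rate:.3f} per GB transferred")
```

Storage is billed every month while egress is billed per transfer, so a single full download of the dataset costs more than two months of storing it at these quoted totals.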
So there's that side.
Ethan [00:35:41]: So the TLDR, my back-of-the-envelope math
Swyx [00:35:42]: Data is larger than you think. Yes.
Ethan [00:35:44]: my back-of-the-envelope math of GPU hours times GPU cost is very much missing some storage.
Swyx [00:35:49]: You're basically also more IO bound than normal training.
Swyx [00:35:55]: Yes. 'Cause with data loading, caching everything becomes super important.
Ethan [00:36:00]: In Cosmos, we did a lot of optimizations to make it not IO bound. Speaking of actually training the model, the GPU cost: if you look up how big the open source video models are, I think LTX has nineteen B parameters. That's a dense model. And people are also exploring MoEs, so it might be twenty B active and hundreds of B total. So that's a similar size to medium-sized LLMs. And if you look at the number of tokens, we disclosed that in Cosmos. It's also tens of trillions of visual tokens. So putting this together, the cost of training these video models is actually comparable with LLMs. Not to mention, the infra is slightly different from LLMs, so it might be less efficient to train these models.
Inference Speedups: Step Distillation, Consistency Models, and GANs
Swyx [00:37:04]: Do you get the benefits of traditional diffusion speed-ups? For images, there's LCM, LoRAs for fine-tuning. There's a lot of stuff that's been
Ethan [00:37:15]: Flow matching.
Swyx [00:37:16]: There's flow matching. There's a lot of stuff that's been done. Is there some overlap that applies to diffusion on the inference side and stuff?
Ethan [00:37:23]: So the inference side is a completely different story.
Ethan [00:37:28]: I think for the training side, it might be a little bit hard to reduce that cost. And for the inference side, the biggest gain is from the distillation of these models.
It's called step distillation, slightly different from knowledge distillation in LLMs. Typically, for flow matching models, you need something like 100 steps. A diffusion model needs even more, like 1,000 steps, to generate a good image or video. Step distillation tries to learn to generate in fewer steps from the model itself. You use the full model to generate in 100 steps, and then you take a model that only generates in 10 steps and let that model learn from the full one.
Ethan [00:38:25]: Why this works
Swyx [00:38:27]: Strong to weak, seemingly.
Ethan [00:38:28]: It is. It's kind of
Swyx [00:38:29]: Distillation
Ethan [00:38:29]: kind of like strong to weak. From the modeling perspective, the strong model, the teacher model, is trying to model the images and videos of the internet, and that distribution is extremely complex. But the step-distilled model is just trying to learn from the teacher. The teacher is a model, and its size is fixed, so the distribution is much simpler than the whole internet. That's the intuition I have for why step distillation can work. So usually when these models are served in production, they only run a few steps. In Cosmos, I believe we have four-step and eight-step versions. If you do some simpler task, like image-to-image translation, it can run in even fewer steps, like one step in Cosmos Transfer.
Swyx [00:39:22]: I think this is the same intuition that guides a lot of the consistency model work. I sent you a link for sCM. I don't know if you covered that. To me, that was actually one of the most impressive papers I've ever seen from OpenAI.
Swyx [00:39:34]: That this is the unifying grand concept of consistency models. I don't know if you have any comments on this.
Ethan [00:39:41]: So there are a few different approaches,
Swyx [00:39:46]: Oh, yeah. Here it is.
Swyx [00:39:47]: Two steps versus twenty or 100 steps, whatever.
It's already done.
Ethan [00:39:52]: So there are a few different approaches, for example, consistency models, and actually, we shouldn't forget GANs. GAN was the OG of
Swyx [00:40:05]: OG
Ethan [00:40:05]: step distillation, 'cause it trained just one step to begin with. For example, there's distribution matching distillation, which uses a GAN as one of the losses for distillation. A GAN just tells you, "Hey, generate an image," and then
Ethan [00:40:31]: it has a discriminator to tell whether this image is real or not. So the model just needs to learn part of the distribution, not the full distribution. In normal training, the model is asked to reconstruct the ground truth image from the internet, which is extremely hard. When you're training a GAN, it's a step process. It's just, "Hey, you generate an image. Does this image look as real as an image from the internet?" Which is a much simpler task. And people typically combine a lot of these approaches together, like consistency models and distribution matching and GANs, and that's how we get these few-step models.
Audio-Video Generation and Time Alignment
Swyx [00:41:21]: Then there's one step I wanted to add, which is audio and video.
Ethan [00:41:26]: So, Grok Imagine 0.9, I believe, is the first audio-video joint model deployed at a large scale. So
Swyx [00:41:39]: And that was your first model?
Ethan [00:41:40]: That was Grok Imagine's first model. It's audio-video joint generation. I think the hard part is the modality alignment, 'cause before this joint model, we had text-to-video alignment. We have this correspondence between text and video. Typically, most VLMs understand images and videos, though video is rarer, and they mostly don't understand audio.
And if you look at audio generation on the LLM side, you can talk to them perfectly fine, but if you ask them to sing a song or something, it typically is not very good. Also, they don't have music either. The hard part is that audio actually has two components: a discrete component and a continuous component. The discrete component is like the language.
Ethan [00:42:44]: So when we speak, it's just some
Swyx [00:42:47]: It's an ASR issue, yeah.
Ethan [00:42:49]: It's text tokens with some characteristics, I would say.
Ethan [00:42:54]: But music
Swyx [00:42:56]: I think the speech guys would disagree with this.
Swyx [00:42:57]: Like disfluencies and then,
Vibhu [00:43:00]: There are tones. You can get angry.
Ethan [00:43:01]: Well, I say largely.
Ethan [00:43:03]: But the music is completely different. It's very continuous, and you cannot model it like discrete tokens in language models. This is the hard part for models, not to mention we have to align text, video, and audio together.
Ethan [00:43:26]: So
Vibhu [00:43:26]: How?
Ethan [00:43:28]: So some significant challenges: first, like we said about the VLMs, most of them cannot understand audio.
Ethan [00:43:39]: So you have to have some way to do synthetic data generation for audio. You have to caption the audio, and that involves a lot of synthetic data and human data effort. And, not surprisingly, most LLMs are very bad at recognizing the beat, tone, and details of music. They can give some general prediction of which song this is, but it's very hard for them to describe the details of the music. Like we mentioned for image generation, you have to describe the image in as much detail as possible so that someone blind can reconstruct it.
So here it's like someone
Vibhu [00:44:32]: Deaf
Ethan [00:44:32]: someone deaf can reconstruct how the music sounds without actually listening to it. Maybe you can think of it as needing to have what they call the script.
Vibhu [00:44:49]: Subtitles, yeah.
Ethan [00:44:49]: You gotta have all the details of the music and the dialogue.
Vibhu [00:44:55]: So is the challenge there typically stuff like music and audio? Like, is there a baseline? Okay, there's enough data where we can understand narration and conversation, but the nuances in audio are where you hit all the data issues? Or is it just that from stage zero, you do it all?
Ethan [00:45:15]: So one important thing is the alignment. For video and audio, the model has to have a time-based alignment: at which time step the video and audio tokens correspond to each other. But we actually don't have this kind of alignment for most of the other modalities. If you think about text and image, or text and video, they are loosely aligned. You can have a description of what's going on in the video, but you typically don't have an exact description of, oh, at time step one second, what happened?
Vibhu [00:46:02]: It's very
Ethan [00:46:03]: At time step two seconds, what happened
Vibhu [00:46:03]: coarse. Yeah.
Swyx [00:46:05]: So what was the ideal time step? You have to ablate it, and then it's like four seconds or something.
Ethan [00:46:09]: So that comes down to how you design the model to be aware of time as a modality. So the model is time aware. And that's something pretty unique if you think about LLMs. If you ask an LLM to complete a task, it will say, "Oh, this task will probably take twelve hours to complete," and then it comes back in one hour.
Or it says, "I've already spent two days on this and I've exhausted everything."
Ethan [00:46:47]: So the LLMs themselves don't have a sense of time there.
Vibhu [00:46:53]: I actually don't think that's just them not having a sense of time. I think it's somewhat based, right?
Vibhu [00:46:58]: Like you tell someone, "Okay, go work on this feature. Go implement this." There's a general understanding you would have of how long that would take without LLMs working at LLM speed, right? So think back two years ago: if I tell you to build me a new front end for Latent Space, have a search bar, have all this, you'll estimate that it'll take a few days, right?
Vibhu [00:47:19]: So you tell an LLM, "Go build this," and it says, "It'll take me a few days." But I think it's somewhat grounded. I'm not saying that they have a great understanding, but I think with that example you can see where it comes from, right? You're trained on all of the text.
Swyx [00:47:35]: They're trying to estimate what a human would say.
Vibhu [00:47:37]: Because that's what the data kind of represents. It's not them
Ethan [00:47:41]: It came from the corpus on the internet. People have an estimate of how much time.
Vibhu [00:47:45]: And not even just in direct training samples, right? Just your world understanding, from tokens, of how long stuff takes, right? Go read a book. It'll take you a while, right?
Vibhu [00:47:56]: Even if you do nothing but read a book, it takes a few days. So yeah, the LLM says, "I read it, it took me a few hours."
Vibhu [00:48:01]: "It'll take me a few hours to go through this research." But this is a tangent.
Swyx [00:48:05]: Somewhat, yeah.
Swyx [00:48:06]: This is a train of thought I haven't really expressed until now, which is basically that a full world model must also be recursive, meaning that the participant in the world model must also be aware that they have a world model,
which is this whole recursive thing down the line. But yes, and the world model can be wrong, and they need to update it, and blah. Yeah. We've argued this in the newsletter as well, that there need to be sort of recursive or adversarial world models.
World Models: Real-Time, Long-Horizon, Interactive Video
Vibhu [00:48:34]: Just to ask, how do you define world model?
Swyx [00:48:38]: Oh, yeah, let's go there.
Ethan [00:48:40]: So
Vibhu [00:48:40]: So just for context, we talked about video generation, and if you say there's a distinction from world models, what's your definition? How do you see the two?
Ethan [00:48:53]: So, disclaimer, I'm not going to debate what a world model is. There are many definitions, so I'll just talk about my definition. Since I came from the multimodal domain, I'm mainly talking from video. A world model is real-time, interactive, long-horizon video. So there are three parts. Let's talk about them one by one. First, interaction. We just looked at Flipbook and the neural computer. A world model allows you to interact with it through keyboard, mouse, and maybe also voice. These are all modalities. You can interact with the model, and the model should respond reasonably. The second part is real time. Say you move your mouse, and say the world model generates a game: how fast can the game respond? If you're a professional CS:GO player, you might say, oh, you have to respond- He's a beginner. Within sub ten milliseconds or- Yeah, even less. So that's not most of the- No, sixty FPS. Let's go. Oh, three hundred FPS. Oh, five hundred FPS. Wait. Okay, yeah. I didn't do the math, but yeah, okay. Yeah, three hundred FPS, that's about three milliseconds. So you have to respond- Oh, s**t. Okay. Yeah
Ethan [00:50:29]: within a millisecond. Most of the video models cannot do that.
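The frame-rate numbers being thrown around here convert to per-frame time budgets as follows. This is a rough sketch: `min_lag_ms` is a simplified, illustrative model in which a temporally compressed latent can only be emitted once all of its frames are decided, and the 24 FPS playback rate is an assumption, not a figure from the conversation. Real latency would also include model compute time.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at a given frame rate."""
    return 1000.0 / fps


def min_lag_ms(fps: float, temporal_compression: int) -> float:
    """Simplified lower bound on reaction lag when one latent covers several frames."""
    return temporal_compression * frame_budget_ms(fps)


for fps in (60, 300, 500):
    print(f"{fps:>3} FPS -> {frame_budget_ms(fps):.2f} ms per frame")

# Assumed 24 FPS video: per-frame latents versus four-times temporal compression.
print(f"per-frame latents:      {min_lag_ms(24, 1):.1f} ms")
print(f"4x temporal compression: {min_lag_ms(24, 4):.1f} ms")
```

Under these assumptions, four-times temporal compression alone already consumes most of a two hundred millisecond interaction budget before any compute is counted.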
Yeah. But if you have a video model that is, say, a digital human, the response time might be more generous. Typically, for real-time voice interaction, it's about two hundred milliseconds. So that's much more generous. But even two hundred milliseconds is pretty tricky, 'cause remember we mentioned
Ethan [00:51:01]: you have this temporal compression coming from the VAE. If you don't compress the temporal dimension, your sequence length is going to explode. So if you want to have this real-timeness in your model, you have to deal with this long-context problem. And the third part is long horizon, 'cause you're not going to play video games for just a few seconds, and most video models only generate a few seconds. We're going to play for minutes, hours. The model has to be able to generate long-form content.
Ethan [00:51:42]: So putting these three together, it's real-time, long-horizon, interactive video. I think the final state will be, for example, a video version of Flipbook, where you can interact with a neural computer. You move your mouse, you click on the generative interface, and it replies to you through pixels generated in real time. But it's a very long way to get there. So one of the first steps at Grok Imagine, where I led a small world model team, was to build video extension. So, video extension- it's the first step of interactivity. Yeah. It's the first step. Yeah. So it's the first step- You have it here, video editing, yeah. Yeah. So it's the first step because it unlocks long-horizon videos. Typically, for most video generation models, you give it a prompt or an image as an initial frame. You generate a video, and that's it. One time, done. And some creators would try to use the last frame as the first frame for the second video.
Sometimes it works, but if you do it a few times, the quality decreases. And- It doesn't have that context- Yeah, over the full video, so the temporal- Yeah, exactly. Yeah, 'cause you only gave it the last frame, of course, right? Yeah. Exactly. And- it's actually a pretty fun hack. If you've seen like- Oh, no, he's saying something better. Yeah. And for example, Veo, I remember Veo 3 has something like a one-second context of the last video. It is slightly better than using the last frame, but it has a similar problem, that the quality decreases. If you extend a few times to one minute, the video quality looks much worse than the first video. Another problem is that the model doesn't have long-range knowledge of what happened before. Say it generates some dialogue, two people speaking: their voices might change over time, especially if the one-second conditioning does not cover the previous context. So these are the core challenges. The Grok Imagine video extension has the historical context of all of the previously generated videos. It has the context of who is speaking, what objects have appeared, and everything, and it uses that to generate the next video. If we do this naively, you can imagine just putting all of the previous history video tokens into the context. The context length will easily explode. Especially for video models, that can be a few million tokens of context, I would imagine. Context length. Yes. Yeah.
Swyx [00:54:58]: Let's run with that.
Ethan [00:54:59]: For example, in Cosmos, I think just five seconds of video is something like fifty K or sixty K tokens. So if you do fifty seconds, that's five hundred K tokens. If you do longer than that, it easily explodes. This long-horizon problem was the first step we were trying to solve for world models. It turns out people love video extension.
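The token counts mentioned here scale linearly with duration, which is why naive full-history context explodes. A minimal sketch, assuming the lower of the two quoted figures (fifty K tokens per five seconds) holds at every length; the function names are illustrative.

```python
def video_tokens(seconds: int, tokens_per_five_seconds: int = 50_000) -> int:
    """Tokens needed to hold a clip of the given length in context (linear assumption)."""
    return seconds * tokens_per_five_seconds // 5


def max_seconds_in_context(context_tokens: int, tokens_per_five_seconds: int = 50_000) -> int:
    """Longest clip that fits in a given context window under the same assumption."""
    return context_tokens * 5 // tokens_per_five_seconds


for seconds in (5, 50, 60, 600):
    print(f"{seconds:>4} s -> {video_tokens(seconds):>9,} tokens")

# A one-million-token window holds well under two minutes of video at this rate.
print(max_seconds_in_context(1_000_000), "seconds fit in 1M tokens")
```

At this rate a ten-minute video is already several million tokens, which matches the "few million context" estimate in the conversation.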
A lot of creators love using video extension to create longer-form videos. This is the part I liked: you have an intermediate step toward the final goal instead of just a straight shot to the final version.
Swyx [00:55:48]: But I can see you have a strong vision of where we want to end up.
Long Context, Redundancy, and Efficient Interactive Video
Vibhu [00:55:51]: Does it seem like it's an efficiency issue? Okay, we're at a few million tokens of context. If you draw the parallel to language models, we had very short context, two thousand, eight thousand, then you scale it up to one million, ten million. Sure, there's effective context, but at the end of the day, it's just, what's it worth? Sure, there's a whole training data side. In video, it might be slightly easier 'cause we have a hundred-million-token video, right? Just take a movie with the full context there. Is this an efficiency problem from an inference standpoint, where it's expensive but we know how to solve it? Or why is this not the approach? My broader point was on your second point about world models: you say it needs to be interactive and live, right? You should be able to play a game and see the interaction live. One thing I see with research is that a lot of what you actually serve is different from what you build, right? We talked about distillation. You train a big model, you distill it, you do quantization, speculative decoding. We do all this stuff to serve it efficiently. Should we not just have a solution first, a world model that can interact well, and then do inference optimization, serve it, distill it second, so you make it real time after you solve it? Another parallel is, say, continual learning, right? What we need is someone to solve it and show it works inefficiently. Give it a few years, and people will make it efficient. Same thing with regular attention, right? It worked.
Over a few years, people came up with different forms of attention, and we've scaled it to be efficient at long context. So kind of two things there, right? One is, it seems like it works. You've scaled it. Can we not just scale it a lot more efficiently over time? Do we need a separate approach if this works? And same thing with interaction, right? If we can get it done, if we can find some way that it works, we can solve making it more efficient from an inference standpoint later.
Ethan [00:57:53]: That's actually a very good point. In videos, there's actually a lot of redundancy. We remove a lot of the pixel redundancy with the VAE, but there's more redundancy in long-range, long-horizon videos. Say a character appears in the first clip and then disappears, and only reappears at the end of the video: you probably don't need that context in the middle of the generation. You only need that character where you need it. So that's why I helped build another feature. It's reference video.
Vibhu [00:58:36]: Is it here?
Swyx [00:58:36]: Is it the same model release or a different one?
Ethan [00:58:39]: It's a different one.
Ethan [00:58:41]: You probably need to search on
Swyx [00:58:43]: I'll find it
Ethan [00:58:43]: X for "reference to video."
Ethan [00:58:46]: So reference video allows you to upload up to seven images as conditions and generate the video. They can be characters or objects or even scenes. Say I want to condition on Sean's selfie and holding a blade
Swyx [00:59:07]: We have a dog
Ethan [00:59:08]: or whatever.
Swyx [00:59:08]: We put the dog in the thing.
Ethan [00:59:09]: You can put them there, and the video model will generate the video from them and copy the context over. So that can solve a lot of the problems there, like the long-context problem. It doesn't need to have a very long context, but I feel like it's an intermediate solution.
The model
Swyx [00:59:29]: It's cheating.
Ethan [00:59:30]: The model should be able to selectively know where it should draw the references from. Say I want to generate a movie, and I generate it autoregressively, ten seconds at a time or something. Now when this character appears, I can look back to where it first appeared and bring that back. Yeah, this one, I put the references. Yeah, that's Optimus, Einstein, myself, Annie.
Vibhu [01:00:02]: Oddly enough, I used Grok Search to find it, and it pulled your LinkedIn post. But yeah, we found it.
Ethan [01:00:08]: Interesting.
Vibhu [01:00:10]: But
xAI's Underrated Work, Culture, and Watermarking
Swyx [01:00:11]: This is a problem. This is not your fault, but xAI doesn't communicate all this work that you do very well, because they just have the model release and then that's it. But actually, these details are very good.
Swyx [01:00:22]: As far as I understand, everything you just described is state of the art; no one else has done it.
Vibhu [01:00:30]: A lot of-- yeah, I have a lot more
Swyx [01:00:32]: And then you just put out this blog post with the cookies. I'm like, this is not enough.
Swyx [01:00:37]: But obviously these are the high-level numbers that people want to know. But no, okay, so
Vibhu [01:00:42]: And I wonder, part of that is also that some labs don't share research into what happens. And if
Swyx [01:00:50]: No, but this is literally bragging about how good they are, right?
Swyx [01:00:54]: Like, why would you not say that you are capable of extending with full context? This is not a secret sauce. This is like, we did the work. Yeah, I don't know.
Ethan [01:01:02]: Different labs have slightly different communication styles.
Swyx [01:01:07]: Anyway, if anyone from xAI is listening, we are always happy to help you tell your story. Yeah, okay, so you did references, and I think the point you're making is that it is sort of a kludge, right? 
You can do seven, but what about 100?
Swyx [01:01:23]: Right? Then you need a completely different thing.
Ethan [01:01:26]: So I think this is a mechanism to select the context from the history, and you might not put the entire history into the context. For example, there's a paper called FramePack, which has
Ethan [01:01:41]: a heuristic: for the latest history, the last one second, I put in the entire history, and the history before that I compress, which makes the video smaller. So they follow this overall pattern where the maximum sequence length is fixed. The further you are from the current frame, the smaller the image. So this is just a heuristic. I think it can be more automatic, where the model is aware of which parts of the history can be selected. This part of the research is actually being actively worked on by a lot of people. It's also quite interesting. I feel this part of long context is a little bit ahead of the LLM part.
Ethan [01:02:31]: So for example, in LLMs, contexts keep growing. Let's say you call a tool and the tool call history is extremely long: that's still in context, and it keeps growing. Even if you switch the topic to something else, the whole context is still there. There are some agentic harnesses that help you to, say, prune the tool results. Like when you query a file, only show the top 200 lines or something. Those are very heuristic-driven.
Swyx [01:03:08]: For listeners, we did a write-up on the Claude Code leak, where there are eight different kinds of pruning, including pruning the tool results and all that. 
So you can read up on that kind of thing.
Ethan [01:03:17]: I think one breakthrough in continual learning might be a way for the model to automatically manage its own context.
Swyx [01:03:27]: These are all heuristics, and they will be replaced by machine learning.
Ethan [01:03:30]: Interestingly
Vibhu [01:03:32]: The
Ethan [01:03:32]: the same thing is being researched in both LLMs and video models.
Vibhu [01:03:36]: The interesting thing is also that in the paper you showed, it's actually happening at the model level, right? Compared to language models, where sure, we have base attention, but we'll do our own compression, we'll do our own pruning, which is separate from the model.
Vibhu [01:03:49]: Eventually, it all just boils in, hopefully.
Swyx [01:03:52]: I think this is a form of attention, but also, you know, sort of reasoning attention. I feel like that's different from normal attention.
Swyx [01:04:03]: Does that make sense?
Ethan [01:04:04]: It's different in the sense that attention, not to mention, set sparse attention aside,
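
The two heuristics mentioned in this excerpt (shrinking older video history so the total sequence length stays bounded, and showing only the first lines of a long tool result) can be sketched roughly as follows. This is a minimal illustration, not the FramePack method or any harness's actual code: the function names, the 1536-token frame size, the halving factor, and the 200-line cutoff are all hypothetical choices made for the example.

```python
def frame_token_budgets(num_frames, full_tokens=1536, shrink=2, keep_full=1):
    """Return a token budget per history frame, newest frame first.

    All defaults are illustrative assumptions, not values from FramePack
    or from any released video model.
    """
    budgets = []
    for age in range(num_frames):
        # The newest `keep_full` frames keep their full token budget.
        steps = max(0, age - keep_full + 1)
        # Each step further into the past divides the budget by `shrink`.
        tokens = full_tokens // (shrink ** steps)
        if tokens == 0:
            # Frames this old are dropped, so the total stays bounded.
            break
        budgets.append(tokens)
    return budgets


def truncate_tool_output(text, max_lines=200):
    """Keep only the first `max_lines` lines of a tool result.

    A hypothetical stand-in for the kind of pruning an agent harness
    might apply before adding a tool result to the context.
    """
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines]) + f"\n[... {omitted} more lines omitted]"
```

With these defaults the summed budget stays below twice the full-frame budget no matter how long the history gets, which is the "maximum sequence length is fixed" property described above. Both functions are fixed rules, which is exactly why the speakers call them heuristics.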

Génération Do It Yourself
[EXTRAIT] Yann Le Cun - "Toute la Silicon Valley se trompe sur l'IA”

May 24, 2026 · 10:34


To listen to the full episode, search for "#543 - Yann Le Cun - AMI Labs - Rendre l'IA plus humaine" on your podcast platform. Hosted by Audiomeans. Visit audiomeans.fr/politique-de-confidentialite for more information.

Génération Do It Yourself
#543 - Yann Le Cun - AMI Labs - Rendre l'IA plus humaine

May 24, 2026 · 101:10


He built the most influential AI lab in the world before leaving it all to start again from scratch.
The entire AI industry is betting on the same thing, but Yann Le Cun thinks they are on the wrong track.
A professor at New York University for 23 years, Yann joined Facebook in 2013 and founded FAIR, Meta's artificial intelligence research lab, which he led for four and a half years. He then became Chief AI Scientist in order to return to his research.
For 15 years, he worked in parallel on what he calls AI for the real world.
Not systems that predict the next word in a sentence, but systems able to understand what is going to happen in a video, anticipate the consequences of their actions, and learn a new task the first time they encounter it.
Like a human or an animal.
On December 31, 2025, he officially left Meta and, at 65, co-founded AMI Labs with Alexandre Le Brun, a Facebook alumnus and founder of Nabla.
The funding round exceeded one billion dollars, making it the largest European seed round of all time.
Yann Le Cun explains why the AI everyone uses today is not intelligent.
He goes back over what an LLM really is, why making them bigger will never lead to human-level intelligence, and what needs to be built instead.
And also how AMI Labs plans to develop its models.
A concrete episode for understanding AI as it is, not as it is sold, with one of the few researchers who laid the foundations of modern AI and who is already thinking about what comes next.
You can contact Yann on Linkedin.
Would you like to sponsor Génération Do It Yourself or propose a partnership? Contact my label Orso Media via this form.
TIMELINE:
00:00:00 - Leaving Meta to build the AI that comes next
00:11:49 - Today's AI is not intelligent
00:16:49 - "Intelligence is not an accumulation of knowledge"
00:25:26 - Everyone is wrong about LLMs
00:33:38 - Superhuman AI is inevitable
00:43:58 - No robotics company knows how to make them useful
00:55:38 - AI excels where humans are replaceable, opinion
01:02:36 - The world model: what AI is missing
01:14:58 - YouTube is the largest dataset in the world
01:26:15 - Can AI predict disasters before they happen?
01:32:22 - Everyone will become the boss of a team of AIs
Previous GDIY episodes mentioned:
#534 - Sixte de Vauplane - Animaj - Le studio d'animation qui fait trembler Hollywood
#500 - VO - Reid Hoffman - LinkedIn, Paypal - How to master humanity's most powerful invention
#500 - VF - Reid Hoffman - LinkedIn, Paypal - Comment dompter l'invention la plus puissante de l'humanité
#452 - VO - Reid Hoffman - LinkedIn, Paypal - "We are more Homo technicus than Homo sapiens"
#452 - VF - Reid Hoffman - LinkedIn, Paypal - L'humanité 2.0 : Homo technicus plus qu'Homo sapiens
#397 - Yann Le Cun - Chief AI Scientist chez Meta - L'Intelligence Artificielle Générale ne viendra pas de Chat GPT
We talked about:
What is a large language model (LLM)?
"The explosion of artificial intelligence has been much faster than academic time"
Artificial general intelligence
Waymo self-driving cars
Our documentary on China: How did China become unbeatable?
How Jean-Louis Constanza sees the future of robotics without robots
AI: Do you know about Joint Embedding Predictive Architectures (JEPA) and World Models?
Plaud AI
System 1 / System 2: The two speeds of thought
Musk buys Cursor, attacks OpenAI, and Tim is Cooked!
Reading recommendations:
Are We Smart Enough to Know How Smart Animals Are?, by Frans de Waal

Unsupervised Learning
Ep 86: Yann LeCun on Leaving Meta, Breaking The LLM Paradigm, & Why Hinton is Wrong

May 15, 2026 · 81:56


Yann LeCun, Turing Award winner and former Chief AI Scientist at Meta, joins Jacob Effron. The conversation centers on Yann's contrarian thesis that LLMs are a dead end on the path to human-level intelligence, despite being useful products, because they can't predict the consequences of their actions, can't plan, and fundamentally can't model the messy, high-dimensional real world. He unpacks his alternative architecture, JEPA (Joint Embedding Predictive Architecture), which learns abstract representations rather than generating pixel-level predictions, and explains why this approach is essential for robotics, industrial applications, and any system that needs to operate beyond the substrate of language. Yann also reveals the real story behind his departure from Meta (he had zero technical influence on Llama, contrary to public narrative), the genesis of his Tapestry project for sovereign open-source AI, why he believes LLMs are intrinsically unsafe, where he diverges from his fellow Turing laureates Hinton and Bengio, and why he predicts the industry will recognize the paradigm shift by early 2027. Throughout, he offers candid reflections on the tension between research and product at major labs, and why he intentionally headquartered AMI Labs in Paris with zero Silicon Valley VC money.
(0:00) Introduction
(01:45) Why LLMs Aren't the Path to Intelligence
(07:51) AMI and World Models
(12:07) The JEPA Architecture Explained
(15:55) Problems with Robotics Models Today
(20:37) Silicon Valley Herd Behavior
(28:18) Tapestry: Sovereign AI for the Rest of the World
(35:49) OpenAI Is the Next Sun Microsystems
(40:51) Why Yann's Views Diverged from Hinton & Bengio
(44:32) LLMs Are Intrinsically Unsafe
(58:00) Why Yann Left Meta
(1:00:26) Reflections on FAIR
(1:12:11) Advice for PhD Students
LeWorldModel Paper: https://arxiv.org/abs/2603.19312
With your host: @jacobeffron - Partner at Redpoint

Les Cast Codeurs Podcast
LCC 340 - Episode on l'voit on l'voit pas

May 12, 2026 · 111:31


Java 26 is here, GraalVM is a hit at Trivago (43 down to 12 replicas!), OpenJDK bans LLM-generated code, and Spring and Quarkus keep the releases coming. On the AI side: ADK 1.0, A2A, Lyria 3 sings (badly?), and Yann LeCun launches AMI Labs and its world models. Anthropic's Mythos shakes up security, Claude Code leaked its source, and git worktrees are invading your terminals. Bonus: the predicted death of the IDE, waves of layoffs at Oracle and Block, and all of our voices cloned. Enjoy the May long weekends!
Recorded on May 7, 2026
Download the episode LesCastCodeurs-Episode-340.mp3 or watch the video on YouTube.
News
Languages
Lessons learned from a migration to GraalVM at Trivago https://medium.com/graalvm/inside-trivagos-graalvm-migration-native-image-for-graphql-at-scale-912bca9df841
  • Trivago's GraphQL gateway (the entry point for all traffic to 48 microservices) suffered from timeout spikes at JVM startup.
  • Spectacular results after migrating to GraalVM Native Image: replicas reduced from 43 to 12, CPU from 15 to 5 cores, lighter Docker images.
  • Technical obstacles: Log4j incompatibility (hence a migration to Logback), replacing Mockk with Testcontainers, very resource-hungry CI/CD compilation.
  • Netflix DGS and other libraries lacked GraalVM support, so the team contributed fixes upstream as open source.
  • Recommended approach: start with the least complex services and invest heavily in automated tests.
  • By the 14th migration, the process was so well practiced that it went faster than the very first attempt.
OpenJDK Interim Policy on Generative AI - https://openjdk.org/legal/ai
  • OpenJDK adopts an interim policy prohibiting any contribution that includes content generated by LLMs, diffusion models, or deep-learning systems.
  • The scope is broad: source code, text, and images in Git repositories, GitHub pull requests, emails, wiki pages, and JBS issues.
  • Contributors may use AI tools privately to understand, debug, and review OpenJDK code, but may not contribute the generated content.
  • Three risks justify the policy: overloading reviewers with plausible but incorrect code, safety and security risks for a critical platform, and intellectual property risks (the OCA requires contributors to own the IP rights to their contributions).
  • Even partially editing AI-generated code does not make it acceptable for contribution.
  • Oracle, OpenJDK's corporate sponsor, is working on a complete policy to submit to the Governing Board.
GraalVM Native Image and the closed-world assumption in Java https://pvs-studio.com/en/blog/posts/java/1357/
  • A good refresher article on the closed-world context in Java.
  • GraalVM Native Image compiles Java applications into static native executables, with no JVM at runtime.
  • The JVM operates in an open world: classes are loaded on demand, and calls are symbolic references resolved dynamically.
  • Native Image imposes the "closed-world assumption": all execution paths must be known at compile time.
  • Java's dynamic features (reflection, proxies, class loading) create hidden paths that are invisible to static analysis.
  • That is why Native Image requires explicit configuration files for reflection, proxies, resources, and the FFM API.
  • The article illustrates the problem with the Foreign Function & Memory API calling native printf: it works on the JVM and fails in Native Image without configuration.
  • Including all reachable bytecode would be unusable: a giant binary, very slow compilation, and reflection still needs precise metadata.
  • The configuration is not a design flaw but a logical consequence of moving from dynamic to static.
Java 26: what's new https://foojay.io/today/java-26-whats-new/
  • Java is the language of the JVM, released every 6 months since Java 9; Java 26 is a non-LTS release with 10 JEPs.
  • JEP 500: protection of final fields modified through deep reflection, with configurable warnings.
  • JEP 504: permanent removal of the Applet API, which browsers no longer support.
  • JEP 516: the AOT cache (Project Leyden) now works with any garbage collector.
  • JEP 517: HTTP/3 support in the HTTP client; HTTP/2 remains the default, but HTTP/3 is available on request.
  • JEP 522: improved G1 GC throughput by reducing synchronization between application threads and GC threads.
  • New UUIDv7 support via UUID.ofEpochMillis(), naturally sortable and well suited to database identifiers.
  • Process becomes AutoCloseable and can be used in a try-with-resources.
  • No preview feature graduates to standard; Structured Concurrency is in its 6th preview.
Libraries
Guillaume created a small dependency-free Java library to extract the JSON from the response of a somewhat verbose LLM https://glaforge.dev/posts/2026/03/22/extracting-json-from-llm-chatter-with-jsonspotter/
  • LLMs often generate JSON, but it is sometimes surrounded by chatter and/or contains errors (e.g. comments, trailing commas) that break standard JSON parsers.
  • Guillaume created a small, lightweight, dependency-free library to locate and extract the longest structure that looks like JSON (even malformed).
  • That string can then be passed to a "lenient" (more tolerant) parser such as Jackson to get good old strongly typed Java objects.
  • The library is available on Maven Central.
ADK Java releases version 1.0 (Agent Development Kit by Google) https://developers.googleblog.com/announcing-adk-for-java-100-building-the-future-of-ai-agents-in-java/
  • ADK is Google's open source framework for building AI agents, initially in Python and now multi-language (Python, Java, Go, TypeScript).
  • Major new features:
    • Powerful tools: GoogleMapsTool, UrlContextTool, ContainerCodeExecutor, VertexAiCodeExecutor, and a ComputerUseTool abstraction.
    • Centralized plugin architecture: a new App container to manage plugins application-wide (e.g. LoggingPlugin, GlobalInstructionPlugin).
    • Improved context engineering: event compaction to manage context window size (summarization and retention).
    • Human-in-the-Loop (HITL): supports ToolConfirmation workflows for human approval of agent actions.
    • Session and memory services: clear contracts for state management (InMemory, VertexAI, Firestore) and long-term memory.
    • Agent2Agent (A2A) support: native collaboration between remote agents from different frameworks via the A2A protocol.
In this other article, Guillaume shares how he built the Comic Trip application shown in the YouTube video, which uses ADK 1.0 https://glaforge.dev/posts/2026/03/30/building-my-comic-trip-agent-with-adk-java-1-0/
New version of the Java SDK for the Agent2Agent Protocol, with support for version 1.0 of the specification https://medium.com/google-cloud/a2a-java-sdk-1-0-0-beta1-released-e83c414b34cc
  • Alignment with version 1.0 of the specification.
  • New groupId org.a2aproject.sdk and package org.a2aproject.sdk.
  • Transport protocols: full and equivalent support for JSON-RPC, gRPC, and HTTP+JSON/REST.
  • Error handling: introduction of error codes and structured details for better observability.
  • HTTP optimization: cache headers added for agent metadata (Agent Card).
  • HTTP client flexibility: the JDK HttpClient is supported by default, with a Vert.x option for Quarkus environments.
  • New technical features: a DataPart.fromJson() method for simplified creation of objects from raw JSON.
  • Next steps (v1.0.0.GA): simultaneous support for protocol versions 1.0.0 and 0.3.0 to ensure interoperability.
JPA 4.0 Milestone 2: new features for Jakarta Persistence https://in.relation.to/2026/04/23/JPA-4-M2/
  • Jakarta Persistence (JPA) is the standard Java specification for object-relational mapping (ORM), implemented notably by Hibernate.
  • JPA 4.0 M2 is the second milestone of the next major version of the specification, announced by Gavin King.
  • Criteria queries can be built from JPQL strings, giving more flexibility when composing queries dynamically.
  • New specialized expression types (TextExpression, NumericExpression) to simplify writing Criteria queries.
  • New FetchOption interface to explicitly control the association loading strategy, including a built-in BatchSize.
  • New @EntityListener annotation that decouples entity classes from their listeners, removing compile-time dependencies.
  • Listeners can target several callback types and apply globally to the whole persistence unit.
  • Introduction of FlushModeType.EXPLICIT and QueryFlushMode for finer control over synchronization with the database.
  • The @Discoverable meta-annotation allows annotations such as @NamedQuery to be placed on any class or interface.
  • DDL improvements via an enhanced @Index, and specification clarifications via the javadoc.
Quarkus 3.35: tree-shaking, PGO, and Semeru AOT https://quarkus.io/blog/quarkus-3-35-released/
  • Quarkus is a cloud-native Java framework optimized for GraalVM and HotSpot, designed for microservices and containerized environments.
  • New experimental JAR tree-shaking: build-time dependency analysis to remove unused classes.
  • On the Quarkus CLI, this removes more than 6,000 classes and saves about 18 MB (39.5%).
  • Support for Profile-Guided Optimization (PGO) for native builds via quarkus.native.pgo.enabled=true.
  • PGO is an Oracle GraalVM feature, not available in the Community Edition.
  • IBM Semeru AOT support: startup goes from ~380 ms to ~190 ms in early tests.
  • New quarkus-reactive-transactions extension: @Transactional support for Hibernate Reactive methods returning Uni.
  • Dedicated CORS configuration for the management interface, independent of the main HTTP interface.
  • Tests no longer use System Properties to propagate configuration, which eases future parallelization.
  • The reflection-free Jackson serializer is not the default because of feedback on edge cases; there is still work to do.
This Week in Spring - April 21, 2026 https://spring.io/blog/2026/04/21/this-week-in-spring-april-21-2026
  • Spring Framework 6.2.18 and 7.0.7 fix three security vulnerabilities: DoS via WebFlux multipart files, static resource cache poisoning, and DoS on Windows.
  • Open source support for Spring Framework 5.3.x and 6.1.x has ended; migration is recommended.
  • Spring Data 2026.0.0-RC1 introduces upsert (MERGE/INSERT ON CONFLICT) in the Spring Data Relational Template API.
  • Spring Data adds a RedisMessageSendingTemplate for consistency with Redis listeners, and an optimization that resets caches in a single call.
  • Spring AI introduces a Session API (Agentic Patterns series, part 7): an event-sourced architecture for AI agent memory.
  • The Session API supports turn-safe compaction, isolation of parallel sub-agents, and JDBC persistence (PostgreSQL, MySQL, MariaDB, H2).
  • It targets Spring AI 2.1 (November 2026) and will eventually replace the ChatMemory API.
  • Spring Vault 4.1.0-RC1 and 4.0.2 are available.
  • Netflix presented its use of Java, Spring Boot, and Spring AI in a video.
This Week in Spring - April 28, 2026 https://spring.io/blog/2026/04/28/this-week-in-spring-april-28-2026
  • This weekly series by Josh Long compiles news from the Spring ecosystem: articles, tools, podcasts, and community announcements.
  • Spring Boot 4 introduces a native resilience package, org.springframework.resilience, with a new retry API that replaces fragile approaches based on Spring Retry or Resilience4j.
  • Spring Boot 4's native retry API has attribute names and semantics that differ from the older libraries, making pre-2025 tutorials obsolete and a source of silent bugs.
  • The Spring AI SDK for Amazon Bedrock AgentCore is generally available: it integrates AgentCore capabilities into Spring AI through annotations and auto-configuration.
  • The AgentCore SDK automatically handles the AgentCore runtime contract: /invocations endpoint, /ping health check, SSE with backpressure.
  • It offers short-term memory (sliding window) and long-term memory (semantic, preferences, summary, episodic), as well as tools for browser use and sandboxed code execution.
  • A Maven plugin (Nullability Maven Plugin) simplifies integrating JSpecify and NullAway to enforce null-safety at compile time in Java projects.
  • The plugin automatically generates package-info.java files per package and configures the compiler to treat nullability violations as errors.
  • Josh Long and Dr. Venkat Subramaniam co-presented "Intelligent Kotlin" at Voxxed Days Amsterdam, with an accompanying podcast episode.
Cloud
Amazon S3 Files https://aws.amazon.com/about-aws/whats-new/2026/04/amazon-s3-files/
  • Amazon S3 Files is a new service that gives direct file system access to data stored in S3 buckets.
  • Built on Amazon EFS technology, it removes the barrier between object storage and a file system interface without duplicating data.
  • Read throughput can reach several terabytes per second; thousands of compute resources can access it simultaneously.
  • Data remains accessible through both interfaces, the classic S3 API and a standard file system, with no migration needed.
  • Use cases: AI agents persisting memory across pipelines, ML teams without staging, simpler data lakes.
  • Available in 34 AWS regions.
Data and Artificial Intelligence
How to generate music and audio clips in Java with the Lyria 3 model https://glaforge.dev/posts/2026/03/25/generating-music-with-lyria-3-and-the-gemini-interactions-java-sdk/
  • Music generation with Lyria 3 (DeepMind) and the Gemini Interactions Java SDK.
  • Lyria 3: a generative AI model for creating music with lyrics or instrumental tracks.
  • Used through the Java SDK for the Gemini API; requires a Gemini API key.
  • Two Lyria 3 model versions:
    • lyria-3-clip-preview: short clips (30 s), excerpts.
    • lyria-3-pro-preview: full, structured songs (up to 3 min).
  • Customization through prompts:
    • Provide your own lyrics or have them generated.
    • Control the song structure ([Intro], [Verse], [Chorus], [Outro]).
    • Generate instrumental-only pieces.
    • Use images as a source of inspiration (multimodal model).
  • Output: audio (MP3) and text (lyrics/structure) directly, with no complex decoding.
  • Makes it easier to integrate music generation into Java applications.
World models, the next step for AI https://www.lepoint.fr/sciences-nature/comment-le-commando-de-yann-le-cun-se-prepare-a-ringardiser-les-geants-mondiaux-de-lia-depuis-paris-OZVUWTDYBNE25C6WF44265ZQKE/
  • Yann LeCun left Meta FAIR to create AMI Labs (Advanced Machine Intelligence), based in Paris.
  • His thesis: LLMs will not lead to general intelligence; real AI must start from an understanding of the physical world.
  • AMI Labs raised 1.03 billion dollars in seed funding (the largest seed round in European history) at a 3.5 billion valuation.
  • World models learn to predict and understand physical reality rather than predicting the next token in a sequence.
  • AMI's slogan: "Real intelligence does not start in language. It starts in the world."
  • Paris as a strategic base to challenge Silicon Valley in AI's next disruption.
Debezium 2026: community survey results https://debezium.io/blog/2026/04/27/debezium-2026-survey-results/
  • Debezium is an open source Change Data Capture (CDC) tool that captures database changes in real time and streams them to systems such as Kafka.
  • 98.6% of respondents actively use Debezium or plan to within the year, with 91.3% already in production.
  • 63.8% of deployments run on Kubernetes, 60.9% use self-managed Kafka Connect, and 17.4% remain on VMs or bare metal.
  • Helm charts are the dominant approach to configuration management, often combined with GitOps, CI/CD, Ansible, or Terraform.
  • PostgreSQL leads the connectors in use at 69.6%, followed by MySQL (33.3%), SQL Server (29%), and Oracle (27.5%).
  • Captured change volumes range from 1-25 changes per minute up to 1-2 million per minute, depending on the environment.
Infinispan joins the OGX ecosystem as a vector storage provider https://infinispan.org/blog/2026/04/17/infinispan-joins-ogx-ecosystem
  • OGX (formerly Llama Stack) is an open source agentic API server for building complete AI applications.
  • OGX composes inference providers, vector stores, safety backends, tool runtimes, and file storage into a single deployable server.
  • OGX positions itself as an alternative to the OpenAI API, deployable on a variety of infrastructures and models.
  • OGX targets RAG (Retrieval-Augmented Generation) workflows and agentic applications.
  • Infinispan integrates as a vector IO provider, bringing vector, keyword, and hybrid search.
  • I had not heard about this renaming; are you seeing it in your deployments?
Tooling
cmux, a new Ghostty-based terminal specialized for coding agents https://cmux.com/
  • A native macOS application built on the Ghostty rendering engine (libghostty), with GPU acceleration for maximum smoothness.
  • Designed specifically for multitasking and AI-assisted workflows, with vertical tabs showing the Git branch, directory, and active ports.
  • Includes notifications that light up panes when an AI agent (Claude Code, Codex, etc.) needs the user's attention.
  • Offers a built-in, scriptable web browser that can be shown in a split screen next to the terminal via an API.
  • A modern alternative to tmux that requires no complex configuration files or key prefixes to manage panes and sessions.
  • Natively supports all command-line coding agents and allows automation through a socket API and a dedicated CLI.
Git Worktree like a boss https://www.metal3d.org/blog/2026/git-worktree-comme-un-chef/
  • Article by Patrice Ferlet.
  • Git worktree: work on several branches simultaneously through separate directories.
  • Avoids git stash or multiple clones for quick context switching.
  • "Bare" method (recommended):
    • Clone the repository in bare mode (e.g. .bare).
    • Link the root folder to the bare repository through a .git file.
    • Configure remote tracking to see all remote branches.
    • Add a worktree for each branch (git worktree add ).
  • Advantages: saves space, single source of truth (one git fetch updates everything), shared hooks/configs, safety.
  • Tips:
    • Never run git checkout inside a worktree.
    • Run git fetch --all from any worktree to update everything.
    • Use git worktree add --detach to test temporary merges without creating a branch.
    • To delete: git worktree remove, then git worktree prune.
  • A wtree script is provided to automate initialization of the "bare" setup.
  • Considerably improves the workflow.
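
The "bare" worktree setup summarized above can be sketched as a short command sequence. The Python helper below only builds and returns the shell commands, so nothing is executed. It is an illustrative assumption and not the article's wtree script: the function name, the repository URL, and the directory layout (one worktree directory per branch, named after the branch) are hypothetical.

```python
import shlex


def bare_worktree_commands(repo_url, root, branches):
    """Return shell commands for a bare-clone + worktree layout.

    Hypothetical helper: the layout (a `.bare` directory plus one worktree
    directory per branch) follows the summary above, not a specific script.
    """
    q = shlex.quote
    commands = [
        f"mkdir -p {q(root)}",
        f"cd {q(root)}",
        # Bare clone: the object database lives in .bare, with no checkout.
        f"git clone --bare {q(repo_url)} .bare",
        # A .git *file* pointing at the bare repository links the root to it.
        "echo 'gitdir: ./.bare' > .git",
        # A bare clone has no remote-tracking refspec by default; add one so
        # that `git fetch` populates refs/remotes/origin/*.
        "git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'",
        "git fetch origin",
    ]
    # One worktree directory per branch.
    commands += [f"git worktree add {q(b)} {q(b)}" for b in branches]
    return commands


if __name__ == "__main__":
    for line in bare_worktree_commands(
        "https://example.com/repo.git", "proj", ["main", "feature-x"]
    ):
        print(line)
```

Printing the commands instead of running them keeps the sketch safe to try; each line can be reviewed and pasted into a shell once the URL and branch names match a real repository.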
The IDE is dying, and fast https://x.com/jdegoes/status/2036931874057314390?s=46&t=C18cckWlfukmsB_Fx0FfxQ
  • Technical leaders predict the rapid end of the traditional IDE, replaced by agentic conversational interfaces.
  • The paradigm shift: developers no longer write lines of code but express their intent and supervise autonomous agents.
  • Tools such as Claude Code, Copilot, and Cursor are already radically transforming daily development workflows.
  • The editor-centric IDE loses its reason for being when the agent reads, modifies, and structures code autonomously.
  • The transition is comparable to the move from desktop to mobile: practices established over 30 years called into question in a few months.
The Claude Code source leaked, probably via the sourcemaps, and a site describes how it works https://ccunpacked.dev/
  • On March 31, 2026, Anthropic accidentally included the sourcemaps in a Claude Code npm package, exposing ~512,000 lines of TypeScript.
  • The leak was not a hack but a human error: a "*.map" entry forgotten in .npmignore.
  • The site ccunpacked.dev was launched to analyze and visualize the unpacked source code.
  • The code reveals a permanent background agent named "KAIROS", a stealth mode to hide Anthropic employees' contributions to open source, and 44 hidden feature flags.
  • A previously unseen "Buddy" feature (an electronic pet in the terminal) and a "dream" mode for continuous ideation were discovered.
  • Anthropic confirmed: "No sensitive customer data was involved. Human error in the release packaging."
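
The kind of mistake described above (a "*.map" pattern missing from .npmignore) is the sort of thing a pre-publish check could catch. The sketch below is a hypothetical illustration, not Anthropic's tooling: the function name and the sample file list are assumptions, and in practice the list of packaged files would have to come from somewhere such as the output of `npm pack --dry-run`.

```python
from fnmatch import fnmatch


def find_leaky_files(packaged_files, patterns=("*.map",)):
    """Return the packaged paths whose file name matches a blocked pattern.

    Hypothetical check: `patterns` defaults to sourcemaps only, matching the
    "*.map" pattern mentioned above.
    """
    leaks = []
    for path in packaged_files:
        name = path.rsplit("/", 1)[-1]  # compare against the file name only
        if any(fnmatch(name, pattern) for pattern in patterns):
            leaks.append(path)
    return sorted(leaks)


if __name__ == "__main__":
    files = ["package.json", "dist/cli.js", "dist/cli.js.map", "README.md"]
    leaks = find_leaky_files(files)
    if leaks:
        raise SystemExit(f"Refusing to publish, found: {leaks}")
```

Failing the release step when the list is non-empty turns a silent packaging slip into a visible error, which is the design point of the sketch.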
Gemini CLI moves to agents https://x.com/srithreepo/status/2039794081925382307?s=46&t=GLj1NFxZoCFCjw2oYpiJpw Gemini CLI, Google's open source AI agent for the terminal, introduces hooks into its agentic loop. Hooks let you run scripts automatically (security scanners, compliance checks, logging) at each step of the agent. Launch of Gemini CLI GitHub Actions: an autonomous agent for repositories that can carry out routine coding tasks. Support for MCP servers to extend capabilities and for "Agent Skills" for specialized workflows. Agent mode is available in VS Code and IntelliJ with access to file system and terminal tools.
Wispr, local speech to text on macOS http://wispr.stormacq.com/ Wispr is a fully local voice dictation app for macOS, powered by Whisper (OpenAI) on device, with no cloud and no tracking. Sébastien Stormacq built Wispr in a day and a half without writing a single line of code, thanks to Kiro CLI (Amazon's AI agent). Available as open source on GitHub and via Homebrew. Automatic language detection, and text insertion at the cursor in any application via a global shortcut. In one month: 19 releases including a hands-free mode, filler-word removal, auto-send for chats, and a CLI tool. A concrete example of vibe coding producing a production-quality tool with no prior Swift expertise.
How Gordon, the Docker-specialized assistant, was born https://n9o.xyz/posts/202603-building-gordon/ Nuno Coração (n9o.xyz) details how Gordon, the Docker-specialized assistant, was built on docker-agent, Docker's open source AI agent runtime written in Go. Agents are defined in declarative YAML and distributed as OCI artifacts, with no binary update required. The initial architecture, a swarm of 9 specialized agents, was abandoned in favor of a single root agent with a carefully designed prompt. The model used is Claude Haiku 4.5, sufficient after prompt optimization. Key principle "show, then do": every agent action requires explicit user approval. Tool descriptions strongly affect the LLM's accuracy: adding tools can paradoxically degrade existing performance. The prompt is a detailed specification (identity, file access patterns, security rules) rather than a simple instruction.
IBM Bob https://bob.ibm.com/blog/announcing-ibm-bob-launch IBM Bob is IBM's AI assistant for coding on real codebases (launched April 2026). 5 modes: Ask, Plan, Code, Advanced (MCP), Orchestrator. Detects code complexity in real time and suggests refactorings. Performs automatic code reviews on your GitHub branches/issues. Lets you write in natural language directly in the editor. Also works in the terminal/CLI and in CI/CD pipelines. Security: manual approval, .bobignore, checkpoints, no training on your prompts.
How I use Claude - 50 practical tips https://www.youtube.com/watch?v=mZzhfPle9QU A Meta Staff Engineer shares 50 tips after 6 months of intensive Claude Code use. Based on ~12h/day of personal and professional use. Covers everything: basics, advanced workflows, parallelization. Goal: share what he would have wanted to know from the start.
Methodologies
Someone grumbles about the unsustainability of codebases written with agents https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing-the-fuck-down/ Mario Zechner argues that AI agents make the same mistakes over and over without learning, piling up complexity at high speed for lack of human bottlenecks. Without a global view, agents produce cargo cult: industry "best practices" applied locally with no architectural coherence. Codebase growth degrades the agents' ability to find existing code, leading to growing duplication and inconsistencies. He cites AWS outages and Microsoft quality initiatives as worrying signs linked to AI-generated code. Solution: reserve agents for bounded, evaluable tasks, and keep the architecture, APIs and critical systems hand-written. Maintain rigorous code review and treat humans as the final guardians of quality.
I'm being forced to use AI https://n.survol.fr/n/on-moblige-a-utiliser-lia Éric D. defends mandatory AI adoption as a legitimate strategic decision, comparable to choosing full remote or the tech stack. He distinguishes the strategic decision (AI adoption) from the support method (which remains collaborative and benevolent). AI skills become a hiring criterion: look for candidates who are already curious about these tools and exploring them. Cultural alignment on practices and tools is a prerequisite for team cohesion. Refusing to adopt certain strategic tools can justify not hiring an otherwise competent candidate.
Yet another methodology, SPDD https://martinfowler.com/articles/structured-prompt-driven/ Problem: AI speeds up individual development but amplifies ambiguities and inconsistencies at team scale. SPDD: treat prompts as versioned, reviewable and reusable artifacts rather than throwaway exchanges. REASONS canvas: 7 dimensions (Requirements, Entities, Approach, Structure, Operations, Norms, Safeguards) to guide the LLM from intent to execution. 6-step workflow: requirements → analysis → context → structured prompt → code → unit tests, each step building on the previous one. 3 key skills: abstraction first, intent alignment, iterative review. Limits: strong ROI on complex business code, poorly suited to urgent hotfixes, throwaway scripts or creative/visual work.
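One way to picture a prompt treated as a versioned artifact is a small data structure holding the seven REASONS dimensions, which can then be rendered to text and committed alongside the code. The class name, the rendering format and the sample content below are assumptions made for illustration; they are not taken from the SPDD article.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReasonsCanvas:
    """Hypothetical container for the seven REASONS dimensions."""

    requirements: str
    entities: str
    approach: str
    structure: str
    operations: str
    norms: str
    safeguards: str

    def render(self) -> str:
        """Render one 'Dimension: text' line per field, in canvas order."""
        return "\n".join(
            f"{f.name.capitalize()}: {getattr(self, f.name)}" for f in fields(self)
        )


# Illustrative content only.
canvas = ReasonsCanvas(
    requirements="Customers can cancel an order before it ships.",
    entities="Order, Customer, Shipment",
    approach="Add a cancel operation guarded by shipment status.",
    structure="Service layer method plus one REST endpoint.",
    operations="Validate status, mark order cancelled, emit an event.",
    norms="Follow the existing error-handling conventions.",
    safeguards="Never cancel an order that has already shipped.",
)
print(canvas.render())
```

Because the rendered text is deterministic, a change to any dimension shows up as a normal diff in code review.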
Security
The Glasswing project to secure software https://www.anthropic.com/glasswing Anthropic launches Glasswing, a cybersecurity initiative using Claude Mythos Preview to identify zero-day vulnerabilities. 12 founding partners including AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft and NVIDIA. Anthropic is investing 100 million dollars in model credits and 4 million in donations to open source security organizations. The model operates with substantial autonomy, identifying thousands of vulnerabilities in operating systems, browsers and critical infrastructure. More than 40 additional organizations have access to scan and secure their systems. Goal: give defenders the advantage before AI-assisted hacking techniques become widespread among attackers.
LinkedIn is spying on you https://frenchbreaches.com/blog/linkedin-est-accuse-de-fouiller-dans-votre-ordinateur-illegalement "BrowserGate" scandal: LinkedIn injects JavaScript that tries to detect the Chrome extensions installed in your browser. The analyzed script contains a hard-coded list of 6,222 Chrome extensions with identifiers and internal file paths. Alarming growth of the targeted list: 38 extensions in 2017 → 461 in 2024 → ~1,000 in May 2025 → 6,222 in early 2026. The data collected also includes CPU, RAM, screen resolution, timezone and battery status for fingerprinting. Some targeted extensions relate to neurodivergence, religious practices or political opinions → serious GDPR violation. LinkedIn argues that the scan only aims to detect extensions that scrape data.
Post-mortem of the supply chain attack on the npm library axios https://github.com/axios/axios/issues/10636 On March 31, 2026, two malicious versions of axios (1.14.1 and 0.30.4) were published through a compromised maintainer account. Attack vector: a RAT installed through targeted social engineering on the lead maintainer's personal machine. 2FA offers no protection if the user's machine is compromised: the attacker controls everything and can act as the user. The malicious packages injected plain-crypto-js@4.2.1, a cross-platform trojan (macOS, Windows, Linux). Community detection in ~3 hours, removal by npm, corrective measures: full credential rotation. Preventive changes: publishing via OIDC, immutable releases, improved GitHub Actions practices.
Passbolt, an open source password manager https://lesjoiesducode.fr/passbolt-gestionnaire-de-mots-de-passe-gratuit-open-source-que-votre-equipe-merite-vraiment An open source password manager designed for sharing credentials within a team, used by more than 50,000 organizations. Individual encryption per user and per credential version, with no shared vault: a zero-knowledge architecture. "Forward secrecy": when a member leaves the team, their encrypted copies are automatically revoked with no manual reset. Supports TOTP, SSH keys, API tokens and custom fields with a complete audit trail of all access. Completely free community edition with unlimited users, self-hostable or cloud. OpenPGP encryption requiring passphrase + private key, with visual anti-phishing tokens.
Law, society and organization
Anthropic donates 1.5 million dollars to the Apache foundation https://news.apache.org/foundation/entry/the-apache-software-foundation-announces-1-5m-donation-from-anthropic Anthropic gives 1.5 million dollars to the ASF to support infrastructure, security and the open source community. Vitaly Gudanets (Anthropic's CISO): "Supporting the ASF is a direct investment in the resilience and integrity of the systems modern AI depends on". The funds will finance build systems, security processes and services to Apache projects. This donation is the trigger for the ASF's 10-million-dollar responsible AI initiative. Apache infrastructure is invisible but critical: from financial systems to healthcare platforms, it underpins the global software ecosystem.
The ASF launches the responsible AI initiative https://news.apache.org/foundation/entry/the-apache-software-foundation-launches-10m-responsible-ai-initiative-with-initial-1-75m-donation The ASF is launching a responsible AI initiative with a budget of 10 million dollars over at least 3 years. Anthropic is the first donor with 1.5 million dollars; Alpha-Omega contributes 250,000 dollars. The initiative gives Apache projects access to AI models for experimentation and security. It supports the whole AI/ML chain: data pipelines, infrastructure, deep learning frameworks. Conference tracks, hackathons and travel grants are planned to broaden the community. Guiding principles include human oversight, license integrity and open source security.
Oracle lays off 30,000 people https://rollingout.com/2026/03/31/oracle-slashes-30000-jobs-with-a-cold-6/ Oracle is laying off 20,000 to 30,000 employees, 18% of its global workforce. Employees learned of their dismissal through a simple email at 6 a.m., with no notice. Access to all systems (Slack, Zoom, badges) was cut immediately afterwards. Goal: free up 8 to 10 billion dollars to build AI data centers. Oracle has already taken on 50 billion in debt in 2026 to fund its AI projects. Paradox: the company posts a record profit of 6.13 billion, but its cash position is in the red. Oracle stock has lost more than half its value since September 2025.
What if AI were just a pretext for layoffs https://eventuallycoding.com/p/ia-licenciements-et-si-l-intelligence-artificielle-n-etait-qu-une-excuse Hugo Lassiège (eventuallycoding) argues that companies use AI as a convenient narrative to mask past management mistakes (Block tripled its headcount post-COVID without matching revenue growth). Fewer than 1% of tech layoffs are reportedly due to AI productivity gains, according to the analyses cited. Measuring developer productivity remains an unsolved problem, yet companies claim efficiency gains without evidence. Real economic pressures (inflation, trade wars, energy costs) are hidden behind the AI narrative. Necessary restructurings are presented as positive AI-driven transformations to reassure investors. He sees a window of opportunity for Europe while the American giants restructure.
GitHub Copilot will use interactions to train its models unless you opt out https://github.blog/news-insights/company-news/updates-to-github-copilot-interaction-data-usage-policy/ From April 24, 2026, GitHub uses the interactions of Copilot Free, Pro and Pro+ users by default to train its models. The data collected includes accepted or modified code, snippets sent, file names and repository structures, and user feedback. Copilot Business and Enterprise users and enterprise repositories are excluded from this training data collection. Opt-out is available in GitHub settings > "Privacy"; earlier opt-out preferences are kept automatically. Stated goal: improve model accuracy on real-world languages and use cases.
Big breakthrough for Claude Code in commits on GitHub https://aifoc.us/damn-claude-thats-a-lot-of-commits/ Claude Code explosion: in six months, Claude Code went from 0.7% to 4.5% of all public commits on GitHub, surpassing all other AI tools combined. Massive adoption of AI agents: about 5% of public commits on GitHub are now generated by AI agents, a figure that has been growing fast since late 2025. Bot domination on GitHub: beyond commits, AI tools are everywhere in the handling of pull requests and issues (Copilot and CodeRabbit in particular). Methodological limits: the data covers only public repositories (companies make heavy use of private repositories, invisible here). The count depends heavily on how visible signatures are (some tools such as Claude systematically mark their commits, others do not). The GitHub search API has variable reliability at this scale. Paradigm shift: software development is going through a major transition, comparable to the shift from desktop to mobile. The integration of AI agents into the production cycle is no longer an experiment but an operational reality at large scale.
Dysmaths, an application to help learn mathematics and geometry when you have dyspraxia or dysgraphia https://dysmaths.com/ A web application to help middle and high school students with dysgraphia and dyspraxia do maths and geometry. Freehand drawing tools, precise geometry (compass, protractor, ruler) and structured operations (fractions, roots, powers, mathematical symbols). PDF and PNG export that faithfully preserves scale for printing and submitting exercises. Accessibility options: OpenDyslexic font, interface customizations, import of images and PDFs. Meets a real need: standard tools are not suited to difficulties with coordination and spatial organization in mathematics.
AI or reality? By Amistory https://www.youtube.com/watch?v=PPYdAhBBF2I AI generates content (images, voices, videos) that is increasingly undetectable. Voice cloning and deepfake scams are rising sharply. Fake viral content manipulates opinion at large scale. Fakery is no longer an accident; it has become an organized system. Society is entering an era of generalized doubt about what is real. How do you stay informed when reality itself can be simulated?
Conferences
The list of conferences from Developers Conferences Agenda/List by Aurélie Vache and contributors:
6-7 May 2026: Devoxx UK 2026 - London (UK)
12 May 2026: Lead Innovation Day - Leadership Edition - Paris (France)
12-13 May 2026: Lyon Craft - Lyon (France)
19 May 2026: La Product Conf Paris 2026 - Paris (France)
19-20 May 2026: Green Code Challenge - Paris (France)
21-22 May 2026: Flupa UX Days 2026 - Paris (France)
22 May 2026: AFUP Day 2026 Lille - Lille (France)
22 May 2026: AFUP Day 2026 Paris - Paris (France)
22 May 2026: AFUP Day 2026 Bordeaux - Bordeaux (France)
22 May 2026: AFUP Day 2026 Lyon - Lyon (France)
27 May 2026: aMP Day Strasbourg 2026 - Strasbourg (France)
28 May 2026: DevCon 27 : I.A. & Vibe Coding - Paris (France)
28 May 2026: Cloud Toulouse 2026 - Toulouse (France)
29 May 2026: NG Baguette Conf 2026 - Paris (France)
29 May 2026: Agile Tour Strasbourg 2026 - Strasbourg (France)
2-3 June 2026: Agile Tour Rennes 2026 - Rennes (France)
2-3 June 2026: OW2Con - Paris-Châtillon (France)
3 June 2026: IA–NA - La Rochelle (France)
4 June 2026: Workplace Intelligence Days - 1st edition - Lyon (France)
5 June 2026: TechReady - Nantes (France)
5 June 2026: Fork it! - Rouen - Rouen (France)
6 June 2026: Polycloud - Montpellier (France)
9 June 2026: JFTL - Montrouge (France)
9 June 2026: C: - Caen (France)
9 June 2026: France API 2026 - Paris (France)
11-12 June 2026: DevQuest Niort - Niort (France)
11-12 June 2026: DevLille 2026 - Lille (France)
12 June 2026: Tech F'Est 2026 - Nancy (France)
15 June 2026: Jupyter Workshops: Demystifying MyST Markdown in Education - Orsay (France)
16 June 2026: Mobilis In Mobile 2026 - Nantes (France)
17-19 June 2026: Devoxx Poland - Krakow (Poland)
17-20 June 2026: VivaTech - Paris (France)
18 June 2026: Tech'Work - Lyon (France)
22-26 June 2026: Galaxy Community Conference - Clermont-Ferrand (France)
23-24 June 2026: MWCP 2026 - Paris (France)
24-25 June 2026: Agi'Lille 2026 - Lille (France)
24-26 June 2026: BreizhCamp 2026 - Rennes (France)
25-26 June 2026: Agile Tour Toulouse 2026 - Toulouse (France)
27 June 2026: Asynconf - Paris (France)
2 July 2026: Azur Tech Summer 2026 - Valbonne (France)
2-3 July 2026: Sunny Tech - Montpellier (France)
3 July 2026: Agile Lyon 2026 - Lyon (France)
6-8 July 2026: Riviera Dev - Sophia Antipolis (France)
28-30 August 2026: State of the Map - Champs-sur-Marne (France)
4 September 2026: JUG Summer Camp 2026 - La Rochelle (France)
10-11 September 2026: Nantes Craft - Nantes (France)
17 September 2026: dotAI - Paris (France)
17-18 September 2026: API Platform Conference 2026 - Lille (France)
18 September 2026: dotJS - Paris (France)
18 September 2026: WordCamp Bretagne - Rennes (France)
22 September 2026: Salon Data 2026 - Nantes (France)
22-23 September 2026: Agile en Seine & IA 2026 - Paris (France)
24 September 2026: OWASP AppSec Days France 2026 - Paris (France)
24 September 2026: PlatformCon Paris - Paris (France)
24 September 2026: React Native Connection 2026 - Paris (France)
24-26 September 2026: Paris Web 2026 - Paris (France)
28-29 September 2026: 4th Tech Summit on AI & Robotics - Paris (France) & Online
1 October 2026: WAX 2026 - Marseille (France)
1-2 October 2026: Volcamp - Clermont-Ferrand (France)
2 October 2026: DevFest Perros-Guirec 2026 - Perros-Guirec (France)
5-9 October 2026: Devoxx Belgium - Antwerp (Belgium)
12 October 2026: Dev With AI - Paris (France)
27-29 October 2026: Directions EMEA 2026 - Paris (France)
29-30 October 2026: BDX I/O 2026 - Bordeaux (France)
30 October 2026: Cloud Nord 2026 - Lille (France)
4-5 November 2026: Devoxx Morocco - Casablanca (Morocco)
14-15 November 2026: Capitole du Libre - Toulouse (France)
19 November 2026: DevFest Toulouse 2026 - Toulouse (France)
27 November 2026: DevFest Paris 2026 - Paris (France)
1-3 December 2026: Apidays Paris - Paris (France)
4 December 2026: DevFest Lyon 2026 - Lyon (France)
4 December 2026: DevFest Dijon 2026 - Dijon (France)
9-10 December 2026: OpenSource Expérience - Paris (France)
9-10 December 2026: DevOps REX - Paris (France)
10 December 2026: KCD Provence - Aix-en-Provence (France)
7-9 April 2027: Devoxx France 2027 - Paris (France)
Contact us
To react to this episode, join the discussion on the Google group https://groups.google.com/group/lescastcodeurs Contact us via X/twitter https://twitter.com/lescastcodeurs or Bluesky https://bsky.app/profile/lescastcodeurs.com Do a crowdcast or a crowdquestion. Support Les Cast Codeurs on Patreon https://www.patreon.com/LesCastCodeurs All episodes and all the info at https://lescastcodeurs.com/

AI Inside
The Messy Web of AI Guardrails

May 7, 2026, 78:46


This week Jason Howell and Jeff Jarvis unpack a surprising policy reversal from the White House, which is now weighing a plan to vet frontier AI models before release after revoking Biden's safety order on day one. They also dig into the Musk versus OpenAI trial testimony, where $80 billion Mars requests and near-physical confrontations made it feel more like a soap opera than a courtroom.

Also in this episode: OpenAI traced a goblin obsession back to a rogue reward signal, Yann LeCun told students to ignore AI doom narratives, Gary Marcus called out Richard Dawkins for declaring Claude is conscious, a researcher invented a fake disease that every AI treated as real, and in the speed round, Nvidia's China market share hit zero, Anthropic launched a $1.5 billion enterprise joint venture, Google split its TPU into two chips, and China made it illegal to fire workers over AI. New episodes every Wednesday at aiinside.show.

Note: Time codes subject to change depending on dynamic ad insertion by the distributor.

CHAPTERS:
0:00 - Start
0:04:00 - White House Considers Vetting A.I. Models Before They Are Released / Behind the White House's Potential Rethink on A.I.
0:13:51 - Canadian fiddler sues Google after AI Overview wrongly claimed he was a sex offender / What Was Discussed at Google's White House Meeting About A.I.
0:22:32 - Top AI companies agree to work with Pentagon on secret data
0:24:09 - OpenAI's president does ‘all the things,' except answer a question
0:31:32 - Where the goblins came from (OpenAI)
0:36:29 - Video explaining JEPA with LeCun
0:42:50 - AI godfather Yann LeCun's blunt advice for the AI age
0:44:47 - Will A.I. Make College Obsolete?
0:48:51 - Marcus: Richard Dawkins and The Claude Delusion
0:55:29 - Scientists invented a fake disease. AI told people it was real
1:02:37 - Jensen says Nvidia now has 'zero percent' market share in China, says US export policy 'has already largely backfired'
1:04:40 - Anthropic Unveils AI Agents to Field Financial Services Tasks
1:06:12 - New: Higher usage limits for Claude and a compute deal with SpaceX
1:08:33 - Google's eighth generation TPUs: two chips for the agentic era
1:10:25 - China makes it illegal to fire humans if AI takes their jobs

Hosts: Jason Howell and Jeff Jarvis
Download and subscribe to AI Inside in audio and video: https://aiinside.show/
Support the podcast on Patreon for special perks: https://www.patreon.com/aiinsideshow. You'll get ad-free audio and video feeds, a members-only Discord, T-shirts and stickers, and exclusive content.

Business Without Bullsh-t
Why your boss is the real AI threat - Dave Birss

Apr 29, 2026, 98:50


Dave Birss says you won't be replaced by AI - you'll be replaced by a leader who's been told the wrong story about it.

About this episode
Dave Birss is back on Business Without BS - author of the Sensible AI Manifesto, co-founder of the Gen AI Academy, and a man who's taught a million-and-a-half people how to use AI without setting their business on fire. He walks Andy and Andrew through what he calls a "corporate poopocalypse": what happens when you apply AI to a business that hasn't cleaned up its own mess.

The episode covers the Sensible AI Manifesto's six points, the CREATE prompting framework, the three Cs for checking AI output, the adequacy trap, why judgment is the most undervalued skill of the next decade, and the practical playbook for rolling out AI across a team without sending the whole organisation into a panic.

About the guest
Dave Birss co-founded the Gen AI Academy with Helena, where they run AI training across governments, the UN, and Fortune 500 companies. He wrote the Sensible AI Manifesto and GPT Junior, the kids' AI book and video course now in over 100 schools. Before all that he spent his career in advertising and creativity, which is where most of his frameworks come from.

Key moments
[02:46] The Roomba poopocalypse - why AI applied to a dysfunctional business spreads the mess, not the productivity.
[05:46] Corporate barnacles - the institutional plaque costing every business 40% in fuel and speed.
[08:04] Sensible AI Manifesto Point 1: use AI to augment skills, not to outsource tasks.
[09:15] The two-list exercise: tasks that piss you off vs tasks you wish you could do more of. Only the second list is the real opportunity.
[12:11] AI slap - 96% of leaders think AI raises productivity, 77% of staff feel buried by unrealistic expectations.
[13:48] The adequacy trap - why AI users get stuck at "good enough" and never break through.
[22:51] The other five Manifesto points: use data responsibly, support employees, assign AI leaders, keep learning, always add a human layer.
[26:40] The CREATE prompting framework - Character, Request, Examples, Adjustments, Type, Extras.
[37:59] The three Cs for checking AI output: Confirm, Check, Craft. Why most people skip the third one.
[55:14] How business owners keep their thinking sharp: do the work on paper before you open the laptop.
[1:01:03] What humans still beat AI at - conceptualisation, creative voice, and judgment. The judgment one matters most.
[1:14:17] The line that pisses Dave off: "you won't be replaced by AI, you'll be replaced by someone using AI." His correction is sharper.
[1:18:09] The three-stage AI value pyramid - cost cutting → skill amplification → unlocking what wasn't possible before. 80% of companies are stuck on stage one.
[1:24:18] How to roll out AI across a team in an afternoon: align with business strategy, declare an AI amnesty, pave the desire lines.

Mentioned in this episode
Sensible AI Manifesto - Dave's six-point framework for applying AI without breaking your business. Currently being turned into a book.
Gen AI Academy - the training company Dave co-founded with Helena, working with governments, the UN and Fortune 500s.
GPT Junior - Dave's book and video course teaching kids how to use AI properly, currently in over 100 schools.
Perplexity - Dave's preferred AI tool for fact-checking because it gives you the sources.
Cal Newport - referenced for the long-form-reading argument and the case that children reading for pleasure is the strongest predictor of life outcomes.
Range (David Epstein) - the case for generalists over hyper-specialists; Dave says the book describes him.
Yann LeCun - recently left Meta over the limits of next-token prediction; arguing AI needs world models, not just language.
Roomba poopocalypse - the family-and-the-dog metaphor that opens the episode and frames the whole thing.
Marc Andreessen / lump of labour fallacy - the framing for why we systematically underestimate the new jobs that emerge from disruption.
RAF desire lines - the Nissen-hut path-paving story; Dave's metaphor for letting staff show you how AI is already being used.
Combinedly - the AI tool Andrew's firm is testing for client-sentiment analysis and email drafting.

Find the guest
LinkedIn: https://www.linkedin.com/in/davebirss/
Gen AI Academy: https://thegenaiacademy.com/

Follow Business Without BS
Website: https://withoutbs.com
YouTube: https://youtube.com/@bwblondon
Instagram: https://instagram.com/bwblondon
X / Twitter: https://x.com/bwb_london
LinkedIn: https://www.linkedin.com/company/business-without-bs
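
As a rough illustration of the CREATE prompting framework named in the key moments, the sketch below assembles a prompt from its six parts in order. Only the six labels come from the episode notes; the function name, the way each part is worded and the decision to skip empty parts are assumptions, not Dave Birss's own template.

```python
def build_create_prompt(
    character: str,
    request: str,
    examples: tuple[str, ...] = (),
    adjustments: str = "",
    output_type: str = "",
    extras: str = "",
) -> str:
    """Assemble a prompt with one labelled section per CREATE letter."""
    sections = [
        ("Character", character),
        ("Request", request),
        ("Examples", "\n".join(f"- {example}" for example in examples)),
        ("Adjustments", adjustments),
        ("Type", output_type),
        ("Extras", extras),
    ]
    # Empty parts are left out so the prompt stays short.
    return "\n\n".join(f"{name}:\n{body}" for name, body in sections if body)


# Illustrative values only.
prompt = build_create_prompt(
    character="A patient tutor",
    request="Explain recursion to a beginner",
    examples=("Use a staircase analogy",),
    output_type="Three short paragraphs",
)
print(prompt)
```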

Choses à Savoir TECH
Meta vole les employés d'une startup qui refuse un rachat ?

Apr 27, 2026, 2:24


Silicon Valley has entered a new phase: a talent war over artificial intelligence. And in this battle, Meta seems ready to go very far. The latest episode: the Thinking Machine Labs affair. This start-up, founded in early 2025 by Mira Murati, quickly established itself as a promising player. It develops so-called "multimodal" AI systems, able to process text, images and sound simultaneously. Its valuation has reportedly already reached 12 billion dollars, with much higher projections.

Faced with this potential, Meta first tried a classic approach: a buyout offer estimated at one billion dollars. Mira Murati flatly refused. Mark Zuckerberg's company then changed its method. Rather than buying the company, it set about recruiting its key talent directly. The result: several founding members left Thinking Machine Labs to join Meta's teams. Among them is engineer Andrew Tulloch, with a compensation package estimated at 1.5 billion dollars over six years, an unprecedented amount in the history of tech. Others, such as Barret Zoph and Luke Metz, chose instead to return to OpenAI. For the young company, the shock is brutal. A large part of its founding team disappeared within a few months, forcing Mira Murati to rebuild her organization. This strategy illustrates a broader trend. The tech giants are competing with offers to attract the best AI researchers. At Google DeepMind, for example, non-compete clauses make it possible to retain talent for several months while continuing to pay them. At OpenAI, signing bonuses of up to 100 million dollars have been mentioned.

Meta, for its part, is moving fast. The company recently launched Muse Spark, its first multimodal model developed by this new team. But internally, the transition is far from smooth. The departure of Yann LeCun, a historic figure of AI at Meta, is a strong signal. After twelve years as chief scientist, he left the company, openly criticizing some of its directions.

AI Inside
John Ternus Takes Over Apple in the AI Era

AI Inside

Play Episode Listen Later Apr 22, 2026 77:48


Jason Howell and Jeff Jarvis cover the week's biggest AI and tech headlines, including Tim Cook stepping down as Apple CEO and John Ternus stepping in, Anthropic's surprise White House visit and a possible Pentagon deal, and SpaceX's $60 billion option to acquire AI coding tool Cursor. Also in this episode: Google's agentic enterprise push at Cloud Next, unauthorized access to Anthropic's restricted Mythos model, Yann LeCun's new world model paper, Deezer's AI-generated music stats, Claude Design from Anthropic Labs, Google Deep Research Max, and ChatGPT Images 2.0. Find the show at aiinside.show. Note: Time codes subject to change depending on dynamic ad insertion by the distributor.
CHAPTERS:
0:00 - Start
0:07:39 - Google Cloud Pushes Hard on AI Agents and Hardcore Computing
0:15:00 - Google announces 'Workspace Intelligence' and TPU 8t + 8i chips
0:19:04 - Apple turns to hardware veteran Ternus as CEO to succeed Cook in AI age
0:29:01 - White House and Anthropic Hold 'Productive' Meeting, Aiming for a Compromise
0:35:50 - SpaceX is working with Cursor and has an option to buy the startup for $60 billion
0:40:20 - LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
0:41:40 - Explanation of it
0:52:02 - Deezer says 44% of songs uploaded to its platform daily are AI-generated
1:00:59 - Introducing Claude Design by Anthropic Labs
1:08:58 - Deep Research Max: a step change for autonomous research agents
1:09:51 - OpenAI's updated image generator can now pull information from the web
Hosts: Jason Howell and Jeff Jarvis
Download and subscribe to AI Inside in audio and video: https://aiinside.show/
Support the podcast on Patreon for special perks: https://www.patreon.com/aiinsideshow. You'll get ad-free audio and video feeds, a members-only Discord, T-shirts and stickers, and exclusive content.

Smart Driving Cars Podcast
Smart Driving Cars episode 411: Aurora, LeCun, Uber, AI & more

Smart Driving Cars Podcast

Play Episode Listen Later Apr 18, 2026 36:33


It's episode 411 of Smart Driving Cars with Princeton's Alain Kornhauser and co-host Fred Fishkin. In this edition, Aurora's Chris Urmson fields questions at the MIT Mobility Forum, AMI's Yann LeCun on campus in Princeton, Uber commits 10 billion to robotaxis, Axios reports mobility's new big three and more. Join Alain and Fred for the latest and subscribe!

Forbes Talks
This Google Spinout Thinks AI Can Fix America's EV Battery Problem

Forbes Talks

Play Episode Listen Later Apr 13, 2026 6:53


SandboxAQ has an AI platform to help materials researchers speed the development of safer, higher-powered, solid-state batteries for autos, the military and data centers. China's dominance in batteries is powering a global auto industry shakeup. The country didn't just get better at making them. It got better at making a lot of them cheaply and fast enough to let automakers like BYD and Geely sell electric vehicles at prices that can look like a misprint next to U.S. and European models. Now, SandboxAQ, a moonshot company spun out of Google in 2022, is betting the U.S. doesn't need to win by outbuilding China cell-for-cell. It just needs to come up with better battery designs. And it says its AI-enabled tech platform can help battery scientists accelerate their research to create new types of safer, cheaper solid-state batteries for EVs, military equipment and data centers.  The Palo Alto, California-based company, which has raised $950 million from backers including Alphabet, Nvidia and AI scientist Yann LeCun, is today releasing a new version of its research platform, AQVolt26. The pitch: compress the earliest, most uncertain part of battery R&D—screening and evaluating candidate materials—so scientists can dump bad ideas quickly and focus their efforts on the ones that might actually ship. The goal is to slash development time to create new battery chemistries, which now takes 10 to 15 years, said Ang Xiao, who leads SandboxAQ's materials science team. “It's hard to give an exact figure for how many years we can save, but I can tell you that for the discovery phase, we can reduce the time of that by 90% to 95%,” he told Forbes. “Our technology is only focused on the discovery phase, phase one. … But in the end, we will accelerate the entire development pipeline.” The company, chaired by former Google CEO Eric Schmidt, says it's already generating revenue from its tech from customers, including battery developer Novonix and the U.S. 
Army, as well as other battery and auto companies it declined to name. It also won't say how much revenue it expects this year. SandboxAQ's battery strategy is to make money from fees paid by users of its research platform, licensing its tech to other companies or doing research on their behalf, as well as developing its own unique battery materials. With demand rising for batteries across EVs, energy and grid storage and defense applications, it's chasing a market with real money behind it. “We see the battery market as a $500 billion opportunity this decade, expanding toward $1 trillion as electrification and AI-driven energy demand accelerate,” Xiao said. “Our focus is on the high-value segment of materials discovery and performance optimization.” Like Waymo, another Google Moonshot, Sandbox is using AI for physical applications rather than chatbots. In addition to battery tech, which is part of its chemicals and materials unit, it's also focused on using AI for drug discovery and medical diagnostics, among other areas. Unlike OpenAI and Google's Gemini, which lean on large language models (LLMs), Sandbox says its approach is built on large quantitative models (LQMs) trained on physics-based data and scientific principles. Read the full story on Forbes: By Alan Ohnsman https://www.forbes.com/sites/alanohnsman/2026/04/07/this-google-spinout-thinks-ai-can-fix-americas-ev-battery-problem/

SparX by Mukesh Bansal
The Dark Side of AI No One Talks About | Mukesh Bansal | Connor Leahy | SparX

SparX by Mukesh Bansal

Play Episode Listen Later Apr 4, 2026 65:22


What happens when the most powerful technology ever built is also the one nobody fully understands - including its creators? Connor Leahy, US Director of Control AI and one of the clearest thinkers in the AI safety space, joins Mukesh Bansal for a conversation that cuts through the hype and lands somewhere far more unsettling: we may have two to five years before AI systems cross a threshold we cannot reverse. This isn't a doom scroll. It's a strategic briefing. In this episode of SparX, Connor breaks down why the race to superintelligence is a national security issue, not a technology one, and why the people building it aren't villains, they're just operating inside a system with no brakes. He challenges Yann LeCun's repeated prediction that we're nowhere close, explains why Dario Amodei's admission that we understand only ~3% of how AI works should terrify us, and unpacks why Sam Altman and the labs he leads are racing toward a goal they cannot fully control. He also explains why the "we'll run out of data" argument keeps being wrong, how AI systems are now learning by interacting with environments (just like humans do), and why the moment superintelligence arrives, we probably won't recognise it. We also ask: What can India and other middle powers actually do? Why did the climate movement fail? What must the AI safety movement learn from that? Is Senator Blackburn's Trump AI Act a sign that Washington is finally waking up? And with Bernie Sanders and AOC now speaking out, could AI safety become a defining issue in the 2026 elections? Plus - in a first for the podcast - four frontier AI models (Claude, GPT, Gemini, and Grok) listen live to the conversation and jump in with questions. The results are equal parts fascinating and telling. Guest: Connor Leahy: AI Safety Researcher | Co-founder of EleutherAI | US Director at Control AI

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — with Chris Manning and Fan-yun Sun

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Apr 2, 2026 66:47


We've been on a bit of a mini World Models series over the last quarter: from introducing the topic with Yi Tay, to exploring Marble with World Labs' Fei-Fei Li and Justin Johnson, to previewing World Models learned from massive gaming datasets with General Intuition's Pim de Witte (who has now written down their approach to World Models with Not Boring), to discussing the Cosmos World Model with Andrew White of Edison Scientific on our new Science pod, to writing up our own theses on Adversarial World Models. Meanwhile Nvidia, Waymo and Tesla have published their own approaches, Google has released Genie 3, and Yann LeCun has raised $1B for AMI and published LeWorldModel. Today's guests have a radically different approach to World Modeling from every player we just mentioned. While Genie 3 is impressive, its many flaws demonstrate the issues with its approach: terrain clipping, noninteractivity (single player, no physics, no objects other than the player move), and a maximum of 60 seconds of immersion. Moonlake AI (inspired by the Dreamworks logo) is the diametric opposite: immediately multiplayer, incredibly interactive, indefinite lifetime, capable of MANY different kinds of world models by simulating environments, predicting outcomes, and planning over long horizons. This is enabled by bootstrapping from game engines and training custom agents. In Towards Efficient World Models, Chris Manning and Ian Goodfellow join Fan-Yun in explaining why their approach to efficiency with structure and causality instead of just blind scaling is sorely needed: SOTA models still show physical or spatial understanding glitches, such as solid objects floating in mid-air or moving “inside” other solid objects. If the goal is to plan for the next action, how often is a high-resolution pixel view necessary for modeling the world? Our bet is that there is a disproportionately large share of economically valuable tasks where such detail is not required. 
After all, humans with a wide variety of sensory limitations have little difficulty doing almost everything in the world. Furthermore, for a large number of purposes, describing a scene or a situation in a few words of language (“the car's tires squealed as it cornered sharply”) is sufficient for understanding and planning. Experiments also show that humans only partially process visual input in a top-down, task-directed way, often making use of abstracted object-level modeling. In almost all cases, partial representations combined with semantic understanding are sufficient. … If the goal is to facilitate the understanding of causality in multimodal environments, then the world model—whether it is used in the virtual world or the physical world—must prioritize properties such as spatial and physical state consistency maintained over long time periods, and an ability to evolve the world that accurately reflects the consequences of actions. That's what Moonlake is building. Game engines are the right starting abstraction for efficiently extracting causal relationships, and Moonlake is building the interfaces and community (including its new $30,000 Creator Cup) to kickstart the flywheel of actions-to-observations. We were fortunate enough to attend their sessions at GDC 2026 (the Mecca of Game Devs), and were impressed by the huge variety and flexibility of the worlds people were building with Moonlake's tools already! 
Live videos on the pod.
Full Video Pod on YouTube!
Timestamps
00:00 Benchmarking Gets Hard
00:47 Meet Moonlake Founders
01:26 Why Build World Models
03:12 Structure Not Just Scale
05:37 Defining Action Conditioned Worlds
07:32 Abstraction Versus Bitter Lesson
14:39 Language Versus JEPA Debate
20:27 Reasoning Traces And Rendering Layer
37:00 Gameplay Over Graphics
38:02 Fiction Rules And World Tweaks
39:15 Code Engines Beat Learned Priors
41:10 Diffusion Scaling Limits
43:23 Symbolic Versus Diffusion Boundary
46:14 Platform Vision Beyond Games
50:24 Spatial Audio And Multimodal Latents
54:23 NLP Roots Hiring And Moon Lake Name
Transcript
[00:00:00] Cold Open
[00:00:00] Chris Manning: I think this whole space is extremely difficult as things are emerging now. And I mean, it's not only for world models, I think it's for everything including text-based models, right? 'Cause in the early days it seemed very easy to have good benchmarks 'cause we could do things like question answering benchmarks.
[00:00:20] But these days so much of what people are wanting to do is nothing like that, right? You're wanting to get some recommendations about which backpack would be best for you for your trip in Europe next month. It's not so easy to come up with a benchmark, and it's the same problem with these world models.
[00:00:41] Meet the Founders
[00:00:41] swyx: Okay. We're back in the studio with Moon Lake's two leads. I guess there are other founders as well, but, Sun and Chris Manning. Welcome to the studio.
[00:00:54] Fan-yun Sun: Thanks. Thanks, Chris. Thanks for having us.
[00:00:56] swyx: You guys have burst onto the scene with a really refreshing [00:01:00] new take on world models.
[00:01:01] I would just want to, I guess, ask how the two of you came together. Chris, you're a legend in NLP and just AI in general. 
You're his grad student, I guess?
[00:01:10] Fan-yun Sun: Actually my co-founder.
[00:01:11] swyx: Oh, yeah.
[00:01:12] Fan-yun Sun: I should give a lot of credit to my co-founder, Sharon. Yeah. She was actually working with Professor Fe Androgyn and then she ended up working with Ron and Chris Manning here.
[00:01:22] And then, so I got connected through to Chris initially, actually through my co-founder.
[00:01:26] What is Moon Lake?
[00:01:26] swyx: What is Moon Lake? Actually, I'm also very curious about the name, but like, why go into world models?
[00:01:33] Fan-yun Sun: So I was working a lot with Nvidia research during my PhD years on essentially generating interactive worlds to train reinforcement learning agents or embodied AI agents.
[00:01:44] And then there's two observations. One in academia and one in industry. In industry, folks at Nvidia are actually paying a lot of dollars to purchase these types of interactive worlds, whether it's for the sake of evaluation or training the robots, or policies or models. And [00:02:00] then, in academia, the same thing is happening.
[00:02:02] And more specifically, when I was actually working with Nvidia on the synthetic data foundation model training project, we were actually generating a lot of these synthetic data and showing that, hey, these synthetic data are actually as useful as real-world data when it comes to multimodal pre-training.
[00:02:16] But then, like I said, there's a lot of dollars being paid out to external vendors or other folks to manually curate these types of data. 
It was very clear to us that, okay, on our way to, let's call it embodied general intelligence, models need to learn the consequences behind their actions, which means that they need interactive data, and the demand for those types of data is growing exponentially.
[00:02:38] But everybody's sort of thinking about it from a pure, say, video generation perspective or something else. But we feel like the true opportunity is actually building reasoning models that can do these things, like how humans do these things today. So that's a little bit on the genesis of Moon Lake, and I think the reason I got into world models was partly
[00:02:59] a philosophical [00:03:00] take on the world, where I, like, believe the simulation theory and stuff like that. But on the other hand, it's really just like, oh, there's an opportunity there that I feel like nobody's doing the way I think it should be done.
[00:03:10] Structure, Not Scale: The Vision
[00:03:10] Chris Manning: I can say a little bit about that.
[00:03:12] Yeah. So the overall goal is the pursuit of artificial intelligence, and most of my career has been doing that in the language space, and that's been just extremely productive. As we all know, the story of the last few years, I don't have to tell about how much we've achieved with large language models, but, uh,
[00:03:31] although they have been extremely effective for ramping up language and general intelligence, it's clearly not the whole world. There's this multimodal world of vision, sound, taste that you'd like to be dealing with, more than just language. And then the question is how to do it. And despite a huge investment in the computer vision space, right, as a research field computer [00:04:00] vision has been for decades far, far larger than the language space, actually.
[00:04:05] I think it's fair to say that vision understanding sort of stalled out, right? 
You got to object recognition and then progress just wasn't being made, right? If you look at any of these vision language models, it's the language that's doing 90% of the work and the vision barely works. And so there's really an interesting research question as to why that is, and at heart, the ideas behind Moon Lake are an attempt to answer that, believing that there can be a really rich connection between a more symbolic layer of abstracted understanding of visual domains, which isn't in the mainstream vision models, which are still trying to operate on the surface level of pixels.
[00:04:50] swyx: I think in one of your blog posts, you put it as structure, not scale. Is that a general thesis?
[00:04:57] Chris Manning: Yeah. Well, scale is good too.
[00:04:58] swyx: Yeah. Scale is good too.
[00:04:59] Chris Manning: [00:05:00] Lots of data is good as well, and scale, but nevertheless, you want the structure, yeah, to be able to much more efficiently learn.
[00:05:07] swyx: Yeah. The other thing I really liked also is you put out an example of what your kind of reasoning traces look like.
[00:05:12] Right. Which you would distill is the word that comes to mind. I don't even think that's a good description, but it would involve, for example, geometry, physics, affordances, symbolic logic, perceptual mappings, and what have you. But like, that is the kind of example that involves, let's call it spatial reasoning, world model reasoning, as compared to normal LM reasoning.
[00:05:35] Yeah.
[00:05:36] Defining World Models vs Video Generation
[00:05:36] Vibhu: But also like taking it a step back. So how do you guys define world models? A lot of people see, okay, you can do diffusion, you can do video generation. But you guys put out quite a few blog posts. You put out an essay recently, we can even pull it up, about efficient world models. 
You have a pretty, like, structural definition here, but for the general audience that don't super follow the space, right,
[00:05:55] what's the difference in what we see from like a video generation model to [00:06:00] a world gen A simulator? How do you kind of paint that last year?
[00:06:02] Chris Manning: Yeah, so I think this is actually a little bit subtle because people look at these amazing generative AI video models, Sora, Veo 3, one of these things, and they think, Genie, they think, oh, this is amazing.
[00:06:17] This is, we've solved understanding the world because you can produce these generative AI videos. But the reality is that although the visuals do look fantastic, those visuals actually aren't accompanied by an understanding of the 3D world, understanding how objects can move, what the consequences of different actions are, and that's what's really needed for spatial intelligence.
[00:06:49] So I mean, a term we sometimes use is that you need action-conditioned world models. That you only actually have a world model if you can predict, [00:07:00] given some action is taken, what is going to change in the world because of it. And in particular, that becomes hard over longer time scales. So if you're simply trying to
[00:07:12] predict the next video frame, that's not so difficult. But what you actually want to do is understand the consequences, likely consequences, of actions minutes into the future. And to do that, you actually need much more of an abstracted semantic model of the world.
[00:07:32] The Bitter Lesson & Data Abstraction
[00:07:32] swyx: Yeah, the question comes where you want to have more structure than is available in just predicting the next token.
[00:07:41] And typically, well, let's call it the experience of the last five years has been that that is just washed away by scale, right? So what is the right middle ground here, that you don't ignore the bitter lesson, but also you 
can be more efficient than what we're doing today.
[00:07:57] Chris Manning: One possibility [00:08:00] is, look, if we just collect masses and masses and masses and masses of video data, this problem will be solved.
[00:08:11] Under certain assumptions that could be true, but there are sort of multiple avenues in which it could not be true. The first is, what's really essential is understanding the consequences of actions, producing an action-conditioned world model. And if you are simply collecting observational video data, which is the easy stuff to collect, when you're sort of mining online videos, you don't actually
[00:08:41] know the actions that are being taken to see how the video is changing. And so if you are never collecting actions directly and you are having to try and infer them from what happened in the observed video, that's not impossible. But it's very [00:09:00] hard and it's not really established that you can get that to work at any scale yet.
[00:09:05] And so there's a lot of premium on collecting action-conditioned video data, which is part of why there's been a lot of interest in using simulation so that you can be collecting data where you do know the actions, which isn't quite limited supply, but there's also, in the limit of as much data as you could possibly have,
[00:09:28] maybe the problem is eventually solvable, but even though we collect huge amounts of text data, it is always at a great level of abstraction, right? Language is a human-designed, abstracted representation where there's meaning in each token and it's representing an abstraction of the world, right?
[00:09:51] As soon as you are describing someone as a professor, and as soon as you are saying that they're condescending, right? These are very [00:10:00] abstracted descriptions of the world. 
It's not what you're observing at pixel level, and to get to that kind of degree of abstraction, starting from pixels, is orders of magnitude of extra data and processing.
[00:10:14] And so, although we absolutely want to exploit, get as much data as possible, use the bitter lesson, nevertheless, if there are ways in which you can work with five orders of magnitude less data than people working purely from pixels, you're gonna be able to make a lot more progress, a lot more quickly.
[00:10:34] And that's the bet here. And so you could just say that's only wanting to be able to do it more efficiently, do it more quickly, do it more cheaply. But I think it's actually more than that. I think one should be making the analogy to how human beings work. At one level, you know, yes, we have these high [00:11:00] resolution eyes and we can look and see a scene like a video, but all of the evidence from neuroscience and psychology is that most of what comes into people's eyes is never processed.
[00:11:13] Right. That you are doing fairly fine-grained processing of exactly what you're focusing on. But as soon as it's away from that, of, yeah, there's another guy over there, you're sort of only processing top down this very abstracted semantic description of the world around you. And so, that's what human beings are doing.
[00:11:33] They're working with semantic abstractions, and so I think it is just the right representation. 'Cause we also have other goals. We want to be able to do real-time worlds. So that means there's a limit to how much processing you can do, and we want to do long-term planning and consistency. And again, that favors abstraction.
[00:11:55] I mean, I guess there was actually a recent blog post that [00:12:00] came out from our friends at Physical Intelligence, and they were sort of heading in the same direction. They were saying Oh, to the pay
[00:12:06] swyx: pay model.
[00:12:07] Chris Manning: Yeah. Yeah. 
To maintain a long-term memory of what's happening in the world, so we can do longer term, we're actually storing text of what has been happening in the world.
[00:12:19] Right. It is not such a successful strategy, trying to keep it all at a pixel level.
[00:12:24] Vibhu: And yeah, I mean, you can see it in video models, like that temporal consistency. We're at a scale of train on all the video data we have. We have it for maybe 30 seconds, a few minutes. That's not the same as a game state played for half an hour.
[00:12:37] Right. I thought you guys break it down pretty well. You have a blog post about building multimodal worlds with an agent. I dunno if you guys wanna talk about this. This is one of the things I read, I
[00:12:48] swyx: thought, yeah, it's the thing I talked about with the reasoning chain. Yeah.
[00:12:51] Vibhu: So there's like different phases to this.
[00:12:53] It seems like it's more of an agent, a scaffold, a very different approach than just, type in a prompt and you don't have the same consistency. [00:13:00] Also, for people that are listening, I would highly recommend reading it. It breaks down the problem in a different light, right?
[00:13:06] So like, what do you need to consider when you're talking about video, like world game models, right? What do you need to consider? What are the factors? What are the elements? What's the state? So I don't know if you guys have stuff to talk about for this one.
[00:13:19] Fan-yun Sun: Yeah. Actually, I wanted to add on a little bit, yeah,
[00:13:22] on our previous point, which is just like, change topics so quickly. I do feel like sometimes people confuse, like, oh, we're taking a method with abstraction, that means they don't believe in the bitter lesson. Like, that's just false, right? Like, we do believe in the bitter lesson. 
But then I feel like the question that we always discuss is like, what is the right abstraction level today?
[00:13:42] The analogy I like to make is like, let's just say we can encode and decode, represent all of images, videos, audio in bytes. Then the most bitter-lesson approach is to train a next-byte prediction model as opposed to the next-token prediction model, where it's just like, okay, it's natively multimodal, can just, but it's like, yeah, like [00:14:00] to Chris's point, it's like the scale and computing you need to achieve that.
[00:14:03] So that's why we always come back to like, okay, what is the most efficient way to do it? And reasoning models, to the point of this blog post, is a showcase of like, hey, we're actually just reasoning about the world and reasoning about the aspects of the world that matter for me to learn what I want to learn from this world model.
[00:14:21] swyx: Yeah, it's like you're improving the encoder of whatever you're trying to model. And like, a better representation would just represent the important things in less space. Yeah. Which would just be more efficient.
[00:14:33] Fan-yun Sun: Yeah.
[00:14:34] swyx: So yeah, I fully agree that it is not antagonistic to the bitter lesson.
[00:14:38] I do wanna mention one more thing. Are there any philosophical differences with the JEPA stuff that Yann is working on? I gotta go there. You're imagining like some latent abstraction. I'm like, okay, fine. Let's talk about it, right? Like, it's an elephant in the room.
[00:14:52] Chris Manning: Yeah.
[00:14:53] JEPA & Philosophical Differences with LeCun
[00:14:53] Chris Manning: There are philosophical differences. Yann LeCun is a dear friend of mine, but [00:15:00] he has never appreciated the power of language in particular, or symbolic representations in general. Yann is a very visual thinker. 
He always wants to claim that he thinks visually and there are no words, symbols, or math in his head.
[00:15:21] Maybe that's true of Yann. It's certainly not the way I think. Um. But at any rate, the world according to Yann is that the basic stuff of the world and of intelligence is visual, and language is just this low bit rate communication mechanism between humans, and it doesn't have much other utility and it's far inferior to the high bit rate video that comes into your eyes.
[00:15:53] And I think he's fundamentally missing a number of important things [00:16:00] there. Think of this evolutionary argument looking at animals, right? That the closest analogies, the things with chimps, right? So chimpanzees have fairly similar brains to human beings. They have great vision systems, they have great memory systems.
[00:16:18] They've got better memory than we do of short-term memories. They can plan, they can build primitive tools. But humans are massively ahead in what we understand about the world, what we can plan, what we can build. And essentially what took off for us was that humans managed to develop language, and that gave a symbolic knowledge, representation, and reasoning level, which just gave this sort of vaulting of what could be done with the intelligence in brains.
[00:16:59] So the [00:17:00] philosopher Dan Dennett refers to language as a cognitive tool and argues that humans, unique among the creatures in the world, have managed to build their own cognitive tools, and language is the famous first example. But other things like mathematics and programming languages are also cognitive tools.
[00:17:21] They give you an ability to think in abstractions, in extended causal reasoning chains. And that allows you to do much more. And we use that for spatial representation and intelligence and planning and gameplay as well. 
So we believe, and this is underlying the specific technologies that Moon Lake is making, that symbolic representations are powerful.
[00:17:50] And you want to use that in your understanding of the visual world when you want a causal understanding, when you want to maintain long-term [00:18:00] consistency and prediction. And as I understand it, that's just not in Yann LeCun's worldview. So I think that's the fundamental philosophical difference. Then there's the specific model.
[00:18:11] He's been advancing JEPA. That's a reasonable research bet as a direction to head for building out a model of the visual world. To my mind, it's sort of one reasonable research bet. It's not really established that it's the best one that everyone should be following,
[00:18:32] swyx: at least developed at scale, at Meta.
[00:18:34] But it's not just vision, right? Like, I mean, JEPA is just joint-embedding prediction, which can be applied to anything really. And people have done it. The argument is that there is a latent representation that is probably more suited to the task, then why not let machines do it for us instead of predefining it at all?
[00:18:50] And isn't something like a JEPA-shaped thing the right answer? And if not, why not?
[00:18:55] Chris Manning: So I think there's a part of JEPA that's right, which is [00:19:00] you do want to have a joint embedding that gives you a consistent model of the world. And Yann's argument is you can never get that from autoregressive language models 'cause they're sort of left to right, churning out one token at a time.
[00:19:22] I guess this is where we're in the research arguments of the field. I'm not actually convinced that's right. 'Cause although the token production is this autoregressive process that's heading left to right, I guess it doesn't have to be left to right. 
But anyway, in a sequence of tokens; we could have right-to-left Arabic.[00:19:40] But although that's true, all of the weights of the model that are internal to the transformer, they are a joint model of the model's understanding of the world. And so I think you can think of the weights of the model as a form of joint representation, [00:20:00] and therefore it is plausible to think that could be the basis of a world model, which avoids Yann's objections.[00:20:10] swyx: I think I follow, and obviously that would touch on what Moon Lake eventually ends up doing as well. Right. Like, which it's hard to tell because you put out the end results, but we don't know the inputs that go into it. So that's something that we have to figure out over time.[00:20:25] Vibhu: Yeah. I mean, I guess this kind of breaks down some of the outputs. Do you wanna walk us through it?[00:20:31] Reasoning Traces & Interactive Worlds[00:20:31] Fan-yun Sun: Yeah. So this really just walks us through the reasoning traces of like, okay, so let's just say, if we wanna build a world in this context, it's really just a game demo that shows the variety of interactions that this world model can build.[00:20:45] And yeah, it's really just the reasoning traces of like, okay, it's prompted to create a bowling game. Like how did it achieve what you saw? That level of causality, interaction and consistency, right? So yeah, this is almost just like an example of [00:21:00] like the reasoning traces. Very[00:21:01] swyx: detailed.[00:21:01] Fan-yun Sun: Yeah.[00:21:01] Vibhu: Very, very detailed.[00:21:02] You don't even realize it, right? Like when a video is generated, what happens when a ball strikes a pin, right? So first, like, there's audio in that, like audio triggers happen, score increments, the world changes. Like pins have to start dropping. There's a timer that goes on. 
It's just like very similar to how now we're used to reasoning for language models.[00:21:20] There's a whole state of what happens. So geometry, physics, all this stuff. And then yeah, there's kind of that single prompt. So asset generation, all this stuff. It's a nice view to see what's going on.[00:21:32] swyx: I think Sun is also too polite to point out that both Google's Genie demos as well as World Labs' Marble do not have interactive worlds.[00:21:41] Fan-yun Sun: That's the benefit of having a reasoning model, right? Like, because you can say, oh, like maybe in this particular context, I want to learn how to bowl. And then you can say, okay, then what is important when it comes to learning how to bowl? Okay, maybe it's like I need to understand the basics of like, physics and I want to throw it over [00:22:00] them.[00:22:00] I wanna know that when it resets it's a new game. So I know that yeah, basically, you know to pick up the ball, you know that ball's gonna cause the pins to fall down. You know that what's important to this particular bowling game is to score and you know that the score corresponds to the number of pins that fell down.[00:22:19] So it's just like, if it's a model that sort of knows what a bowling game looks like, but doesn't actually allow you to practice over and over again and to understand what it takes to actually get a high score, then it sort of doesn't actually allow you to learn what you set out to learn within the world model.[00:22:38] And I think this is really just one example of showing like the advantages of the approach that we're taking over most of the, let's call it the zeitgeist today, when people talk about clinical world models,[00:22:51] Chris Manning: right? 
So it sort of seems like the question to ask when there's a world model is.[00:22:58] Can I not [00:23:00] only just wander around the world and look at the beautiful graphics, can I interact with the objects in the world and see the right consequences of actions?[00:23:11] Vibhu: And you also understand what the consequences would be if you do something right. So it's not just like, okay, there's one thing if I pick it up, something will happen.[00:23:19] But, there's 50 options and I know I can expect, I can infer what would happen if I do any of them. Right. So very different when you can actually see it play around with it.[00:23:28] swyx: There,[00:23:28] Beyond Unity: Cognitive Tools for World Building[00:23:31] swyx: there's two cheeky elements of that. I mean, the, the, the I guess, less ambitious one is, let's really establish for listeners, why is this fundamentally different than writing Unity code, right?[00:23:40] Like just creating a model to translate a prompt into Unity code[00:23:44] Fan-yun Sun: so there is an underlying physics engine. Yeah. In that sense, there's some overlapping things to Unity, but the way we think about it is like physics engine. Tools or code are cognitive tools like borrowing Chris's term, right? Like tools [00:24:00] that the model can employ as means to an end.[00:24:04] So today maybe you say, okay, in this particular context we care about physics, we care about the long-term causality consequences. Then yes, we deploy it, employ physics engine, and then maybe tomorrow we say, okay, we're we're training that. Just say drones where we only care about really fluid dynamics and the visual aspect of the world.[00:24:25] Then, then yeah, maybe we don't actually, the model actually doesn't have to use a physics engine. Or maybe it employs other types of representation or physics engine to achieve the task. 
So yes, writing code for Unity is sort of similar to a tool that our model can employ, but our goal is for a model to take a representation-conditioned reasoning[00:24:46] approach or process.[00:24:47] swyx: Yeah,[00:24:47] Fan-yun Sun: internally.[00:24:48] swyx: Yeah. Using these things as just like general tool calls. Right. Which I think is very interesting. The other more ambitious one is some kind of recursive element where it becomes multiplayer, right? Like here, there's a single player element, you're not [00:25:00] modeling any other people involved.[00:25:01] And that is a whole other thing.[00:25:04] Fan-yun Sun: But in fact, we can really do multiplayer. Oh yeah, okay. I haven't seen any double situations. So just actually just like prompt our model to say, Hey, like configure to multiplayer. Then it'll do like this. You'll be able to configure multiplayer[00:25:16] swyx: great[00:25:17] Fan-yun Sun: persistency database for you.[00:25:18] Easy. Yeah.[00:25:19] Vibhu: So what are like some of the current limitations in where we're at? So there's one approach of like, okay, scale up video predictors. Obviously there's data issues. With approaches like this, is it data constraints? What are like the next steps? Is it real time? Like, so there's one side of, write an agent to write Unity code, but okay, I want to be streaming a game real time.[00:25:38] I want to have characters being also like agents, but where do we kinda see this scaling up? Right?[00:25:44] Fan-yun Sun: Yeah, there's definitely a data constraint. Like the more data, the better. This reasoning model can almost basically act as humans to like operate a variety of tools and software to build whatever's necessary.[00:25:57] And then there's a sort [00:26:00] of fidelity constraint, which we're actually solving with another model, which we can talk about later. But it's like, it's not as easy to get to photorealism with the approach that we're taking. 
But we think there are better solutions to that, which we can dive into later.[00:26:14] Later.[00:26:15] Vibhu: The one thing you note here is it's a diffusion model, right? So there's a few approaches, diffusion, Gaussian splatting, yeah, so Rie, the diffusion model, you guys wanna[00:26:25] Fan-yun Sun: Yeah.[00:26:25] Vibhu: Introduce,[00:26:26] Fan-yun Sun: yeah, totally.[00:26:26] Rie: Neural Rendering & Skins for Worlds[00:26:26] Fan-yun Sun: So within our world modeling framework, we think there are two models that we train, right?[00:26:31] Like, there's the multimodal reasoning model that we just talked about that essentially handles mainly the causality, the persistency and logic determinism of the world. And then Rie is our bet on saying, okay, like while that model can take care of all these things that we just talked about, its limitation compared to existing, say, video models, is that it doesn't have as high of a pixel [00:27:00] quality right off the gate, right?[00:27:02] And Rie is to say, Hey, we can actually take whatever persistent representation that we generate with our multimodal reasoning model and learn to restyle it into photorealistic styles or arbitrary styles you want. So this model is almost to say, Hey, I'm going to respect the persistency and interactivity of the world that you created, but my only job is to make sure that its pixel distribution is close to what we want.[00:27:29] Vibhu: Yeah.[00:27:30] swyx: Great example right there. You kept the KL divergence.[00:27:33] Fan-yun Sun: Oh. Where,[00:27:34] swyx: no, no. I mean this is a classic like, how you don't stray too far from the source material as you, you kept the KL, which is Oh yeah. Kind of cool. 
Yeah.[00:27:43] Fan-yun Sun: Yeah.[00:27:44] swyx: I mean, and the[00:27:44] Chris Manning: difference is, and I mean Sun was pointing at this, we're sort of saying it's in one way a more difficult path, but a better path, that typically the diffusion models are producing the whole scene and it looks lovely, [00:28:00] but there isn't spatial understanding behind it, which is what allows for the real-time graphics gameplay, the spatial intelligence, understanding the consequences in worlds, whereas this is taking a path where it is assuming an abstracted semantic model of the world's state.[00:28:20] And then the diffusion model is then being used on top of that to produce the high quality graphics.[00:28:27] swyx: Is there an intended practical or business use for this, or is it like a demonstration of capabilities?[00:28:34] Fan-yun Sun: We actually believe that this is gonna be the next paradigm of rendering. So it's gonna replace the rasterizer, it's gonna replace DLSS today because it not only has these pixel priors learned from the world such that you can literally play any game in photorealistic styles, which is a lot of people's desire when they do GTA, right?[00:28:51] Like,[00:28:51] Vibhu: all the mods, all the people adding perfect lighting and all this.[00:28:54] swyx: So[00:28:54] Fan-yun Sun: skins[00:28:55] swyx: for worlds, let's call it[00:28:56] Fan-yun Sun: skins, let's call it skins for worlds. I,[00:28:58] Vibhu: it's also like, you can call it skin, you can call it [00:29:00] customization. You can play it how you want, right?[00:29:01] Fan-yun Sun: Yeah, exactly. And I think another thing that we really pointed out specifically in this blog is the programmability of it, right?[00:29:09] So what this means is that historically the render is always a derivative of the game state, right? You're saying, oh, here's the game state, I'm rendering out a frame. 
But here I'm saying actually this render can be part of the gameplay loop. I can say something along the lines of, upon getting 10[00:29:26] apples, my weapon of choice, my bullet's gonna turn into apples. And that's possible because we can basically dynamically have certain game state trigger the preconditions to the render such that the rendering is now part of the game loop too. One thing is to just say, okay, it's the appearance.[00:29:47] But the second thing is also to say there's these novel interactions that are possible because this render now actually has priors of the world.[00:29:57] swyx: It is up to the artist to figure out what to do with it.[00:29:59] Fan-yun Sun: It [00:30:00] is up to the creators. Yes.[00:30:01] swyx: Yeah.[00:30:01] Fan-yun Sun: And I also think that's actually another big argument that we're making, and the reason that we're taking the bet we're making is that a lot of the time, whether it's for embodied AI or gaming, like you want a layer where humans can inject their intentions.[00:30:15] So, for example, let's just say in the context of gaming, it's obviously like my creative intent, but maybe in the context of embodied AI, it's like, oh, like I take this foundational policy and I want to actually fine tune it to deploy in my house. So you want to almost have a layer where a human can say, oh, here's the distribution of things I want to create to achieve my goal.[00:30:35] And I think 3D graphics as it is today is basically the layer for people to say, Hey, what do I care about in this world? And it allows, basically, human intent to be expressed in these worlds much more explicitly and distributionally as opposed to just saying, Hey, I'm gonna generate like, arbitrary.[00:30:54] And it's like just prompts,[00:30:55] swyx: it's one of those things where like, I think you, you're going to build up a series of models, right? 
[00:31:00] This is just one of, this is probably like the highest utility or heaviest, frequency one, I don't dunno what to call this. Where like you Yeah. You can immediately drop this in on any game and you don't need anything else that.[00:31:10] That you guys do. But, I, I could see, I could see that I think the, the human intent is something that people are not even used to because we're so used to static worlds or, worlds that just don't react, or, I don't know. It's, it, you're kind of blowing my mind right now with like, I'm, I wonder if you've talked to people at GDC Hmm.[00:31:27] And what are they gonna do with it?[00:31:30] Fan-yun Sun: Yeah. Now the stance that we take on this front is like, we're not gonna be more creative than our users to ship[00:31:35] swyx: it out.[00:31:35] Fan-yun Sun: Yeah. But we wanna make sure that we're building things in a way that really allows them to express their intent.[00:31:41] swyx: The thing that you said about, here's the distribution that I want.[00:31:45] I think text may be too low of a bandwidth to. To really demonstrate, because I, I, there, I'm, I'm probably just gonna want to drop in a bunch of, reference assets and then you can figure it out from[00:31:58] Vibhu: there. But you probably wanna do a, a mixture of [00:32:00] both, right? Like you throw in a few images. I wanted this style.[00:32:02] Yeah. I want it to look like this. So it, it's, it's a mixture, right?[00:32:05] Chris Manning: I, I think it's a mixture. I mean, yeah, I mean there's clearly a visual component of this, and it's not that, everything can be text. ‘cause of course you want to give a visual look, but there's also a massive amount of giving the overall picture of the look of the world and the behavior of things that you can express in a few words of text.[00:32:32] And it be very time consuming and difficult to do via visual means. 
So I think, yeah, you want a combination of both.[00:32:40] Evaluating World Models[00:32:40] Vibhu: So one question I kind of have is, how do we go about evaluating world models? So like, there's many axes, right? One is like, okay, I have preferences. How well do we adhere to prompts? One is the simulation.[00:32:50] One is like, is there core logic that's broken? So we know how to evaluate diffusion, there's fidelity, there's [00:33:00] stuff like that. But what are some of the challenges that most people probably aren't thinking about?[00:33:04] Fan-yun Sun: Yeah, I think this is like a great question and probably one of the hardest questions in world models because like, I think it always comes back to what are you building this world model for?[00:33:13] And depending on your end goal and purpose, the evaluation should differ. So in the context of games, then the most direct way of measuring is how much time are people actually spending in this world that you create? And if your goal is to say, for example, in the context that we just talked about, like, hey, deploying an embodied AI agent, then your end[00:33:33] metric is then, okay, after training in these worlds that you generate, how robust it is when you actually deploy to the target environment. But then, it's hard to measure these end metrics. So today people have like these proxy metrics, as I call them, that basically try to measure what we really care about, which is the end metrics, but then frankly it's different for every use case.[00:33:57] Yeah,[00:33:57] Vibhu: which seems like quite a challenge, right? Like in [00:34:00] language models or video models, image models, your benchmarks are proxies, right? People aren't actually asking instruction-following, tool-use questions. They're proxies of how well it will do downstream. 
But for this, so like, should teams, should companies have their own individual benchmarks outside of games?[00:34:16] If you think of stuff like, okay, video production, movies, stuff like that, that also want to use world models. Should, should they sort of internalize like. Their own proxy. Is this something you guys do? Where, where does that connect[00:34:28] Chris Manning: go? Yeah, I think this whole space is extremely difficult as things are emerging now.[00:34:35] And I mean, it's not only for world models, I think it's for everything including text-based models, right? ‘cause in the early days it seemed very easy to have good benchmarks ‘cause we could do things like question answering benchmarks and could you answer the question based on these documents and the various other kinds of, do pieces of logical reasoning or math.[00:34:58] But again, these are sort of. [00:35:00] And there were sort of visual equivalents of things like object recognition, right? For these small component tasks. These days so much of what people are wanting to do also with language models is nothing like that, right? You're wanting to, have an interaction with the language model and get some recommendations about which backpack would be best for you for your trip in Europe next month.[00:35:25] And it's not the same kind of thing, right? And it's not so easy to come up with a benchmark as to does this large language model give you an effective interaction for guiding you in a good way for shopping, right? So, and it's the same problem with these world models. So if we take the game design case, well success is that a game designer can.[00:35:57] Produce what they are [00:36:00] imagining in a reasonable amount of time. And that's really the kind of macro task. That's a very hard thing to turn into a benchmark and I think a lot of this is actually going to turn into people walking, walking with their feet. Right? 
I mean, I guess that's what's happening at the large language model level, right?[00:36:23] When people are choosing to use GPT-5 or Gemini or Claude, individuals are trying out these different models and deciding, oh, I like the kind of answers that GPT-5 gives me, or no, I feel like I get more accurate detail from Claude, right?[00:36:43] Vibhu: It's a lot of[00:36:43] Chris Manning: vibe check, a lot of people just using it.[00:36:45] It's vibe checking. I realize that, but it's actually whether people feel it's giving them utility in what they want. Right.[00:36:52] Vibhu: And the interesting thing there is like a lot of people prefer the visual, right? This looks pretty, which is not the objective of what this is [00:37:00] for, right? If a game designer is working on something, they care about the game engine, right?[00:37:04] The state. It can look whatever. You can fix that up later. Or you can have a really good game state and you can quickly edit it to 20 different versions, like keep state,[00:37:14] Chris Manning: right?[00:37:14] Vibhu: So[00:37:14] Chris Manning: that's a really important distinction, and for speaking to Moon Lake's strength, right? So, yeah, great visuals are lovely to look at for a few seconds, but games are really all about the concept, the gameplay.[00:37:33] And a lot of the time that doesn't actually even require great visuals. I mean, there are just lots of very successful games which have relatively primitive visuals, and there are other games where people have spent millions producing photorealistic visuals, and the game sucks, right? So, keeping those two axes apart is really important in thinking about what's important in a [00:38:00] world model for different uses.[00:38:02] swyx: This conversation is reminding me of some game review and fiction discussions I've had in my sort of non-AI related life. 
Some people might know Brandon Sanderson, who's a very famous fiction author and is a big game reviewer. And he's a big fan of video games where you change one thing about what you might normally assume about the world.[00:38:22] For example, Baba Is You, I don't know if you might have come across that, where like the rules change as you play the game. And also like where you can do things like reverse time selectively or like change gravity selectively. And I think this also reminds me of other kinds of world models that are created by authors.[00:38:38] Where Ted Chiang is my typical example, where he'll take the world that you know today, but change one thing about it, and then create a consistent world based on that. Which is a long-winded way for me to ask: is it easy to create alternative worlds that don't exist, but you change one thing and then let's run a whole bunch of people through it to see if it works?[00:38:58] Chris Manning: My first answer will [00:39:00] be, that seems a lot easier and more conceivable to do using technology like Moon Lake's than with some of the other world models out there. Whether Sun can actually make it happen, I'll let him give a second answer.[00:39:15] swyx: I guess for you, you're constrained by the game engine tool, right?[00:39:18] Like at the end of the day, that's the thought partner that you have. If I ask for something where like, if it never is allowed to reverse time or if gravity only ever works one way, then well that's it. 
But sometimes gravity might change,[00:39:33] Fan-yun Sun: but it's a lot easier to change with code as opposed to a model that is learned primarily on data of[00:39:42] real world and virtual worlds that are, I guess, like for example, Genie, like that's actually trained on a lot of real world data and a lot of virtual gaming data, and it's hard to say, maybe it's easier to say, okay, I wanna change the visuals in like the time period of the world. Like, you can't change gravity, for [00:40:00] example.[00:40:00] Vibhu: I feel like you can to light bounds, right? Everything comes down to like, code is a better way to execute it, but the models aren't that diverse and creative, right? You can say, okay, make gravity slower. It can do that, but it's limited to your representation of how you text it out, right? Like they're only gonna do a few iterations, whereas programmatically, if there's a game engine under the hood, you can kind of go wild, right?[00:40:22] So one of the, I dunno, one of the limitations of most models is that they're very overtrained to one style. Right. And extracting diversity is pretty difficult. At least that's something we've seen.[00:40:35] Fan-yun Sun: I mean, are there examples you have in mind with existing models? Yeah. Like where it would be easier to do that's not using code.[00:40:43] Certain types of creative intent or like state transitions,[00:40:47] swyx: Clipping. Other world models are very good at clipping through things. My legs clipping through a rock because it's just bad. [00:41:00] Like, you would have to struggle very hard with your stuff to actually make that happen.[00:41:04] Which I think is maybe a topic that you actually prepared on, Gaussian splatting versus the other stuff.[00:41:09] Vibhu: Yeah. Yeah. It's just for those not super familiar, right? There's Gaussian splatting, there is diffusion. Like what works, what scales up. 
I feel like in February when Sora 1 came out the blog post was literally titled like,[00:41:21] swyx: you bring it up.[00:41:22] You never know.[00:41:23] Vibhu: Video generation models are world simulators. It's super bitter-lesson-pilled. Yeah, a lot of it is emergence, right? So, not to go through their blog post, basically their whole thing was as you scale up, all this consistency, all this stuff just kind of solves itself. It's a very simple premise, right?[00:41:41] They just scaled up diffusion, and from there, this is Feb 2024, it's already been two years, which is basically five years. How much more in AI time do we need to just scale up, or do we hit a data cap? But I think we already talked about this a lot, right? Like this is back to the beginning discussion of what's [00:42:00] appropriate for the time.[00:42:01] And that seems like your approach, right?[00:42:03] Fan-yun Sun: Yeah. The point I'm trying to make is that there are many, many different types of world simulators, and like having a world simulator that can produce pixel coherency is very, very useful for games and marketing and all these things, but it's not as useful as people think when it comes to causal reasoning,[00:42:25] when it comes to embodied AI. Yeah, like this title is true. We're not saying that it's not a great world simulator, but actually in the blog that we wrote, the bet is more so that there is gonna be a disproportionately large share of value in real world tasks and virtual tasks where high resolution pixel fidelity is not needed.[00:42:47] Yes. Video models have their values.[00:42:50] swyx: Yeah. This is at the absolute limit of my physics understanding, but one example that comes to mind is basically having to solve like the equivalent of a three [00:43:00] body problem in a deterministic Well, where the video models, which is approximated good enough. Yeah.[00:43:08] Right. 
Like there's some point at which your approach kind of runs into like, you now have to simulate the world. Please, thank you very much. And like you're trying to do that, but only to the extent that the game engine lets you, and like game engines cannot do some things.[00:43:23] Fan-yun Sun: Yeah, no, I mean, I think the interesting or more technical question here actually is where do you draw the boundary between[00:43:32] what's handled with, let's say, diffusion priors and what's handled with symbolic priors?[00:43:38] swyx: Yes.[00:43:38] Fan-yun Sun: Okay.[00:43:38] swyx: Okay.[00:43:39] Fan-yun Sun: Right. Let's go there. Because this boundary can actually be fluid. Like I think like maybe what you're trying to get at is like, okay, people are saying pixel prior, everything. But what we're saying is, okay, there's a boundary that we draw where we think it provides the most economical value for the domains and things that we care about today.[00:43:59] [00:44:00] And I actually do think, and it's something that we do internally all the time, which is like, okay, given new equations that we learn or new elements of the world that we learn, or maybe some other knowledge that we acquire in the process of developing the models, should we still be maintaining this line exactly as it is today?[00:44:22] Or should we move it a little bit left or a little bit right? Right. Like sometimes we realize that, oh, like maybe customers or folks want certain things that are better handled with pixel prior as opposed to symbolic prior, then,[00:44:34] swyx: yeah. Your skin thing is an example moving it, right.[00:44:37] Yeah.[00:44:37] Or left. Yeah,[00:44:37] Fan-yun Sun: exactly.[00:44:38] swyx: I dunno what the left right is.[00:44:39] Fan-yun Sun: Yeah, yeah, yeah. No the, the model.[00:44:42] swyx: Yes.[00:44:42] Fan-yun Sun: Actually we have a few iterations of them. 
They're actually at slightly different[00:44:45] swyx: I know boundaries. You should do that. That's a cool dimension to show.[00:44:49] Fan-yun Sun: Yeah.[00:44:50] swyx: Is quantum mechanics the diffusion prior of our world?[00:44:55] Right. It's like that's the boundary of classical mechanics versus quantum. Right? Like, that's it. At one [00:45:00] point God plays dice and the other point doesn't.[00:45:02] Fan-yun Sun: I dunno if Chris, you wanna say it, but I think generally I feel like physics is better with symbolic priors.[00:45:08] Chris Manning: Even quantum physics.[00:45:09] Fan-yun Sun: Even quantum physics.[00:45:11] swyx: Yeah. This starts to get into MLST territory, is what I call it, where he likes to get philosophical. We're quite friendly.[00:45:18] Vibhu: I mean, we need to get, we need to get singularity. I heard some of that.[00:45:23] swyx: No, no, I think that is actually really helpful and man, I just want you to productize this like, as a product guy, I'm just like, oh, also[00:45:32] Vibhu: a gamer, I[00:45:33] swyx: wanna, it's like a researcher, like, it's cool.[00:45:35] Like this is the theoretical, like you have a very good, I don't know, like the way of thinking about these things, but I just wanna see you like, express it. I do think like fundamentally, when you leave open new tools, like, okay, use human intent to incorporate it into how you render,[00:45:52] artists are gonna have to take like two to three years to figure out what to do with this. And you just don't know.[00:45:57] Chris Manning: Right. But I think this [00:46:00] gives a much more approachable and controllable world for the society, which is the beauty of NLP, that will enable it to be adopted and used.[00:46:10] And we are very hopeful about that. Yeah,[00:46:13] Fan-yun Sun: yeah. Yeah. 
I mean, we are very focused actually on commercialization in the sense that like we do really believe in the data flywheel approach. Yeah. Where we put this in the hands of the creators and the users and then they will teach us what capability our model should improve.[00:46:27] And that's why we are actually, like products and beta[00:46:31] swyx: Yeah. Focusing on gaming. What's like the adjacent thing to gaming[00:46:34] Fan-yun Sun: Embodied AI is adjacent, basically. So maybe I'll start with where we see the platform in three years. Yeah. Which is like, okay. The users would tell us what they want to achieve.[00:46:45] The end goal could be, Hey, I wanna make something to teach my kids the value of humility. Or it could be, Hey, I wanna fine tune my drones to be really good at rescue situations. It could be vacuum robots. I want to like train [00:47:00] my manipulation or like vacuum robot to be very robust to my office, right?[00:47:04] But it's like, whatever it is, scenario robust to[00:47:06] swyx: my office[00:47:07] Fan-yun Sun: or like navigate very robustly in my office. But then it's like, whatever end goal that you want, our world model will say, okay, given what you want to achieve, let me generate a distribution of environments such that I can train and evaluate whatever it is you want.[00:47:24] Yeah. Right. Maybe for the purpose of games, it's just the end simulation and that's the end product. For certain policies, it's like I can train it within these environments and then help you see where your policy is failing or not. Yeah. And then, so I think,[00:47:37] swyx: so in that case, much more of a training tool.[00:47:40] Than in other training[00:47:41] Vibhu: evaluation? Both. Right?[00:47:43] swyx: Sure. Same. Same thing.[00:47:43] Fan-yun Sun: Yeah, same thing. 
I think it's just this world model that allows people to train any policy that can act in any multimodal environment.[00:47:51] swyx: Would it be harder to reward hack? Is there an angle here where it is harder to reward hack? Like it's just, I'll just put it generally because I think that's a, that's obviously a key [00:48:00] problem that a lot of people face when training agents in these environments, and I don't know, can you solve it?[00:48:07] Chris Manning: I think not necessarily. To the extent that there's a misspecified reward, it seems like it could be hacked in a more symbolic world or in a more pixel-based world. I dunno if Sun's got any thoughts, but I don't think that's really being solved.[00:48:26] swyx: The other thing that comes to mind is just you could just build a better Sora as a video generator model, right?[00:48:31] Because then you, you would move the diffusion side a bit further to the right. I think if I got the directionality correct. And that's it.[00:48:40] Vibhu: It's better on domains, right? Like on consistency over now, or for sure it exists versus something doesn't, right.[00:48:46] Chris Manning: So[00:48:46] swyx: yeah. Yeah. Is[00:48:49] Vibhu: is a question more like, like[00:48:51] swyx: I'm just riffing on like, how do you, what can you build, you know?[00:48:54] Oh, with the stuff that you have. I do think that the mind of the academic does go immediately to training [00:49:00] and evaluation, but like art tends to take unusual directions. Like you might end up,[00:49:06] Chris Manning: okay. Yeah. But the question is, can you use this piece of software to develop compelling gameplay? I don't think you can take Sora and produce compelling gameplay, right?[00:49:19] If you want to have a world that you can wander around in a bit, you are good. 
But what are your abilities to have gameplay mechanics implemented the way you'd like them to be and to have things stay, with the long-term history of your gameplay that influences future actions? I think there's just nothing there for that.[00:49:39] swyx: Yeah, I do tend to agree. I, I'm just trying to sort of test the boundaries. I would also make the observation that as the AAA games industry has developed, the line between what is a movie and what is a game has blurred. And you, you, you do end up basically producing a two-hour movie as part of your game.[00:49:57] Fan-yun Sun: No, honestly, there, there's so many actually [00:50:00] applications in adjacent markets that our world model can go into. Yeah. But yeah, it, it's sort of fun to riff, riff on. Although on the execution side, we, we need to stay focused with like, okay, what are the capabilities we want to unlock over time?[00:50:11] And there's a roadmap for that. But yeah, if we're just riffing on sort of like the possibilities, I feel like, whether it's endless Yeah, it's like classic[00:50:18] swyx: and the embedding for a possibility and endless in my mind, it's very close. Yeah. I do wanna focus on one, like weird choice. I, I don't know if it's weird.[00:50:28] Maybe I'm, I got something here. Audio, right? You could have just said no audio. And audio in my mind has a lot of recursion, whereas in video you can just do raycasting and that's computationally much simpler. Audio just seems way harder. I don't know if you wanna just comment on just the special 3D audio[00:50:46] problem. Did you really have to do it? I guess you do to be immersive, but like a lot of people do treat it as like, well, you just stick a, a TTS model on top of[00:50:57] Vibhu: Well, there's a lot more to game audio than [00:51:00] just speech. Right. It's not just[00:51:01] swyx: TTS. Yeah. TTS. 
SFX, BGM, spatial. In my mind, echoes[00:51:06] Chris Manning: Yeah.[00:51:06] swyx: And reflections.[00:51:07] And I, I don't even know what's, what else? I don't know what, what other problems in this space.[00:51:13] Fan-yun Sun: Yeah, I think this point like the, it's sort of a more, more pointing to the benefits of using a game engine as a tool that's available to the model, right? Because like part of the spatial audio is from the code that is underlying the simulation.[00:51:32] And while we do give our model access to other types of audio models as tools.[00:51:39] swyx: None of them would be spatial, I think.[00:51:41] Fan-yun Sun: But that's exactly sort of my point, too. We're giving our model an abstraction or a suite of tools such that it's able to achieve that. And you can argue that sort of spatial is like a, like an emergence out of the, the tools and abstraction that we provide to the agents.[00:51:59] And I think that's the beauty of [00:52:00] this, this, this approach is like there's a lot of things kind of like how humans built technology and they're like Lego blocks that build on top of each other. And it's the same thing here. There's gonna be things that sort of just sort of emerge from being able to put these things together in like combinatorially interesting ways,[00:52:14] Chris Manning: right?[00:52:15] So this integrated audio model exploits the understanding and semantics of the Moon Lake world, right? And whereas in general for the Gen AI video models, there's no actual integration across to audio at all, right? That someone might stick some music or stick a soundscape or whatever else on top of their video.[00:52:44] So it's not a silent video, but they're in no way connected into a consistent world model. And there's nothing that's okay. An action is happening in the video. 
Therefore there should be a sound that's [00:53:00] coming from this part of the visual field.[00:53:03] swyx: Yeah.[00:53:03] Vibhu: Is that different than Sora 2? Does it not have audio?[00:53:06] Not to say it's not like[00:53:08] swyx: amazing[00:53:08] Vibhu: isn't a spatial[00:53:09] swyx: audio.[00:53:09] Vibhu: It doesn't,[00:53:10] swyx: no. I've played around with it enough. It just sounds like someone put an ElevenLabs voice on top of it and just tried to do the lip sync.[00:53:18] Vibhu: Oh, yeah. I've seen, okay. Generate a dog at the beach and reactions to big wave and move[00:53:23] swyx: around.[00:53:23] It's definitely like, so have the dog, have the dog move away from camera and see if the, the sound goes down. It doesn't. 'Cause they don't have spatial audio.[00:53:32] Fan-yun Sun: We do want to basically like we, our world model, like the one we're training is basically towards the goal of having a combined latent representation across all these different modalities.[00:53:42] Right? Such that it can like reason across these different modalities. So for example, if I close my eyes and like you play a video, you play a sound of like a car skidding away from me. I almost can like, visually extrapolate that trajectory in my mind. And I think that type of capability, we want our model to be able to reason, right?[00:53:59] And that's the reason that [00:54:00] we're sort of taking this multimodal reasoning approach. It's like we want this combined latent space that can[00:54:05] swyx: Yeah. Oh, you said latent space. We like that. Here we have to play the, the bell every time that someone says latent space. No, you gotta train a Daredevil one. Where you, you, you, it's only audio, but you have to work out[00:54:15] where everything is.[00:54:19] Cool. I I think that that was, that was about it for our Moon Lake coverage. 
I do think that we have like a couple of Chris Manning questions on, on IR and, just any, any other sort of attention topics or NLP topics.[00:54:31] Vibhu: Okay.[00:54:31] swyx: Go ahead.[00:54:32] Chris Manning's Journey: From NLP to World Models[00:54:32] Vibhu: Well, no, I mean, yeah, it's just fun. We talked a bit about how you guys met, but you basically, you, you were like the godfather of NLP per se, right?[00:54:39] You spent the whole career from early embeddings, early early attention. You did 2015 attention for machine translation, everything. You, you had information retrieval, so RAG before RAG, we just wanna shout that out and admire a lot of that. Right? So what prompted the switch over to world models?[00:54:56] How, how'd all that come about?[00:54:58] Chris Manning: To some extent it [00:55:00] is the enthusiasms and creativity of students, but there's a bit of a history there, right? So, yeah. So clearly most of my career has been doing stuff with language and how I got into research was thinking, ah, this is just so amazing how humans can produce speech and understand each other in real time.[00:55:21] And somehow they manage to learn languages as kids. How could this possibly happen? And so, yeah, starting off I was very focused on language, but as it sort of got into the 2010s, I started, going, I'd been working on question answering, and then I started to get interested in visual question answering.[00:55:42] And that was an area where it was very noticeable that the visual understanding was bad. Right. These were the days when like, it sort of seemed like there's almost no visual [00:56:00] understanding. You were just getting answers that came from priors. So, if you asked how many people are sitting at the table, it'd always answer two regardless of how many, how many people you could see in the picture.[00:56:11] And so it seemed like, oh, these models actually aren't able to get semantic information outta

The Sifted Podcast
Tiny VC partner Philipp Moehring on when to take money off the table

Apr 2, 2026 · 45:07


European seed rounds are ballooning. Last month Yann LeCun's AMI Labs picked up $1bn at a $3bn valuation while David Silver's Ineffable Intelligence was reported to be raising $1bn at a $4bn valuation. “Thankfully, the billion dollar seed round is not the standard across Europe, yet,” Philipp Moehring tells host Amy Lewin on this episode of the Sifted podcast. “That would be concerning.”
Philipp started microfund Tiny VC with Andy Chung almost a decade ago to invest in the hottest companies in Europe before anyone else. Its portfolio of 450+ startups includes self-driving car company Wayve, AI-powered video creator Synthesia, legal tech Lawhive and workflow automation platform n8n. But unlike many VCs, Tiny doesn't join boards or lead rounds, and doesn't really ‘do' media. It last raised a third £53m fund in 2023, and was crowned 20VC's ‘top European microfund' earlier this year.
This week on the podcast, Philipp and Amy discuss:
- How VC will change over the next 10 years
- When Tiny takes money off the table
- Why young people make awesome founders
- Anxiety-inducing LinkedIn posts
- And, for better or worse, the return of the tech bro

Hidden Forces
The God Machine: Demis Hassabis and the Quest for Superintelligence | Sebastian Mallaby

Mar 30, 2026 · 56:17


In Episode 472 of Hidden Forces, Demetri Kofinas speaks with Sebastian Mallaby about Demis Hassabis, the co-founder of DeepMind and the man widely regarded as the most consequential figure in the development of artificial general intelligence, and what his story reveals about the science, the competition, and the existential stakes of the AI transition now underway. The first hour traces Hassabis's early life as a chess prodigy in North London, his studies in computer science at Cambridge and neuroscience at University College London, and the founding of DeepMind in 2010 alongside Shane Legg and Mustafa Suleyman. Mallaby and Kofinas explore the philosophical and scientific foundations of Hassabis' approach — including the decisive shift from symbolic, rule-based AI development to the inductive, data-driven logic of deep learning — as well as the competitive dynamics that have shaped the industry: Google's acquisition of DeepMind in 2014, Hassabis's early skepticism of language models and the transformer architecture, and the moment ChatGPT's release shattered what hopes remained of a "singleton" scenario in which a single, safety-minded lab could develop AGI on behalf of all humanity. The second hour picks up with the launch of ChatGPT 3.5 in November 2022 and what it revealed about the state of the AI race — including Mallaby's assessment of Sam Altman and the character of the individuals now driving this technology forward. They examine whether personality and values matter when competitive and commercial pressures are this overwhelming, and revisit a conversation Mallaby had with Geoffrey Hinton in which the so-called "godfather of AI" offered his honest assessment of humanity's odds of surviving the AI transition. 
The episode closes with an exploration of why the safety and existential risk conversation has receded from public discourse — not because the concerns have been resolved, but because geopolitical and commercial imperatives have made it nearly impossible to slow down — and considers the range of perspectives on that risk, from Yann LeCun's dismissiveness of existential threats to the technical alignment work being pursued inside the major labs themselves. Subscribe to our premium content—including our premium feed, episode transcripts, and Intelligence Reports—by visiting HiddenForces.io/subscribe. If you'd like to join the conversation and become a member of the Hidden Forces Genius community—with benefits like Q&A calls with guests, exclusive research and analysis, in-person events, and dinners—you can also sign up on our subscriber page at HiddenForces.io/subscribe. If you enjoyed today's episode of Hidden Forces, please support the show by: Subscribing on Apple Podcasts, YouTube, Spotify, Stitcher, SoundCloud, CastBox, or via our RSS Feed Writing us a review on Apple Podcasts & Spotify Join our mailing list at https://hiddenforces.io/newsletter/ Producer & Host: Demetri Kofinas Editor & Engineer: Stylianos Nicolaou Subscribe and support the podcast at https://hiddenforces.io. Join the conversation on Facebook, Instagram, and Twitter at @hiddenforcespod Follow Demetri on Twitter at @Kofinas Episode Recorded on 03/23/2026

Monde Numérique - Jérôme Colombain
☕️ BIG DEBRIEF (March '26) - AI agents, a French bet, MacBook Neo, Sony stalls

Mar 29, 2026 · 60:42


AI agents reach a new milestone. Anthropic plays the white knight. Yann LeCun goes it alone with world models. The smartphone turns 25. Apple shakes up the market with its MacBook Neo. Sony, short on innovation, drops its electric car. With Bruno Guglielminetti (Mon Carnet) and François Sorel (Tech & Co).
With Free Pro, the best of Free for businesses.
AI agents leave the lab
We look back at the explosion of AI agents able to act directly on a computer. Behind the spectacular effect, we also stress the dangers of these still-young tools, especially when they access personal machines or sensitive data (Monde Numérique has already covered this topic several times, notably on the revolution of AI agents on the loose, the AI agent craze, and the very definition of an AI agent).
Anthropic, white knight of ethical AI?
We question the stance of Anthropic and its boss Dario Amodei on military uses of artificial intelligence. Ethical sincerity or image strategy?
Yann LeCun and the world-model bet
We also return to Yann LeCun's offensive with AMI Labs, his new venture devoted to world models. His ambition is clear: to move beyond the limits of large language models by developing an AI able to understand the physical world, to reason and, eventually, to gain autonomy (also on Monde Numérique: L'HEBDO of March 14 and L'Actu Tech of 14/03).
25 years of the smartphone: looking back at a revolution
On the occasion of MWC Barcelona, we revisit François Sorel's special program devoted to the smartphone's 25 years. It is a chance to dive back into the prehistory of the smart mobile, from the first hybrid devices to the clashes between manufacturers, carriers and, later, Apple. Through memories and anecdotes, we see how a technology still ill-defined in the early 2000s became the center of our digital lives. We also take the opportunity to tell the behind-the-scenes story of an industrial and cultural revolution that continues to shape all of today's tech.
MacBook Neo: Apple attacks the entry level
The MacBook Neo, Apple's new offensive in the affordable laptop market, is off to a flying start. It is a product that is at once attractive, well finished and strategically formidable. Could it draw students and newcomers into the brand's ecosystem and propel Apple into a new market segment once reserved for the PC world? Beyond price, we analyze Apple's logic: offer more accessible hardware in order to establish its services afterwards and build long-term loyalty.
Sony: the end of a consumer electronics giant?
Where is Sony headed? Between its retreat from televisions and the abandonment of the Afeela automotive program run with Honda, one gets the feeling that the Japanese brand is losing a little more of its aura in consumer electronics. Beyond Sony's case, what is at stake is the broader fragility of Japanese industry in the face of the Chinese giants.
Hosted by Audiomeans. Visit audiomeans.fr/politique-de-confidentialite for more information.

Choses à Savoir SCIENCES
Why is Yann LeCun betting on “world models” rather than LLMs?

Mar 24, 2026 · 2:35


For the past few years, artificial intelligence has been dominated by LLMs, “Large Language Models”, such as ChatGPT or Gemini. These models are trained on gigantic quantities of text in order to learn to predict the next word in a sentence. In other words, they are extremely good at manipulating language. But for some researchers, including Yann LeCun, this approach has a fundamental limit: these systems mostly learn a model of language, not a model of the real world. An LLM can therefore produce plausible sentences, answer questions or write an essay. But it does not truly understand the physical reality behind those words. For example, it can explain how to make a coffee, but it does not really know how to handle objects in a kitchen or predict what would happen if a robot carried out those actions. This is precisely where the idea of world models comes in. A world model is an artificial intelligence system that learns to build an internal representation of the world: objects, space, time and the physical relationships between things. These models are trained not only on text but also on images, videos and interactions with the environment. Their goal is to understand how the world works, for example gravity, collisions or the movement of objects. One of the key capabilities of a world model is mental simulation. The system can imagine different possible futures: “if I take this action, what will happen next?”. This predictive ability then enables planning and decision-making, which is essential for robots, self-driving cars or intelligent agents able to act in the real world. Yann LeCun believes human intelligence works in just this way. Our brain has a kind of internal model of the world that lets us anticipate the consequences of our actions. For him, a true artificial intelligence will therefore need several capabilities absent from today's LLMs: persistent memory, reasoning, planning and an understanding of the physical world. It is to explore this path that he recently launched a new startup dedicated to these technologies. The aim is to create systems able to interact with reality, for example in robotics, industry or medicine, rather than simply generating text. In short, LLMs are models of language, while world models aim to be models of the world. And for Yann LeCun, it may be this difference that determines the next great revolution in artificial intelligence. Hosted by Acast. Visit acast.com/privacy for more information.

AI Tool Report Live
Apple Ditches OpenAI + $1.2B Robot Week | AI News in 5

Mar 24, 2026 · 5:11


Apple confirms Google Gemini will power the new Siri, dropping OpenAI as the default on a billion devices. GPT-5.4 Thinking becomes the first AI to beat the human baseline at navigating a computer. Yann LeCun raises $1.03B in the largest seed round in European history to build a new kind of AI. Robotics companies pull in $1.2B in a single week. And the U.S. Department of Energy puts $293M behind AI for national challenges.
Partner Links
- Book Enterprise AI Training: https://www.upscaile.com/
- Subscribe to our free newsletter: https://www.theaireport.ai/subscribe
- Get our free AI resource pack: https://community.theaireport.ai/checkout/the-ai-report-welcome-gift?coupon_code=WRTH
- Connect with Liam: https://www.linkedin.com/in/not-the-f1-driver-liam-lawson/
About The AI Why:
The AI Why breaks down what's actually happening in AI: who's building it, how it's being deployed, and why the people building it do what they do. New episodes every Tuesday (AI News in 5) and Thursday (founder and exec interviews).

French Podcast
News In Slow French #786- Intermediate French Weekly Program

Mar 20, 2026 · 10:01


As always, we will begin our program with a discussion of current events. President Trump would like Europe to send warships to the Strait of Hormuz, but Europe has rejected his requests. German Defense Minister Boris Pistorius said on Monday: “This is not our war; we did not start it.” We will discuss the different positions of European countries and see whether refusing Trump's requests could jeopardize the future of NATO. Our second topic will be a law passed in a US state banning schools from teaching false information about the Capitol riot of January 6, 2021. It requires teachers to present these events as an “unprecedented violent attack” on American democratic institutions that aimed to overturn the result of the 2020 presidential election. In our science section, we will talk about a study finding that merely thinking about alcoholic drinks can influence mood. And we will conclude the first part of the program by commenting on last Sunday's Oscars ceremony. The rest of today's program will be devoted to French language and culture. Our grammar point of the week will be: Prepositional Phrases près de, quant à, quitte à, vis-à-vis de, à travers, and à propos de. We will look at a very eminent French researcher, Yann LeCun, who managed to raise nearly one billion euros for his AI company. We will see what makes his approach revolutionary. We will end with the expression of the week: Mettre de l'eau dans son vin. We will discuss the Michelin Guide's new selection for 2026. 62 new stars were awarded to restaurants in France and Monaco.
- “This is not our war; we did not start it”
- Virginia bans schools from teaching false information about the January 6, 2021 riot
- According to a study, merely imagining different alcoholic drinks is enough to change our mood
- The 2026 Oscars ceremony recaptures its former prestige and enthusiasm
- AI: Yann LeCun raises 900 million euros for his France-based start-up
- The Michelin Guide publishes its 2026 selection

Choses à Savoir ÉCONOMIE
What are “world models”?

Mar 20, 2026 · 2:44


The world of artificial intelligence has just experienced a financial and technological earthquake. Yann LeCun, one of the French “godfathers” of deep learning and a Turing Award winner, has made official the launch of his start-up, Advanced Machine Intelligence (AMI Labs), with a record fundraising of $1.03 billion. This round, one of the largest ever at seed stage in Europe, immediately propels the young Paris-based company to unicorn status.
A break with language models (LLMs)
This announcement marks a major philosophical turning point. Until now, the sector was dominated by large language models (LLMs) such as ChatGPT. However, Yann LeCun no longer hides his disagreements with the current approach, which he considers limited. According to him, LLMs merely predict the next word without any real understanding of reality. They are unable to reason, plan or grasp elementary physical laws.
To overcome these limits, AMI Labs is betting on “World Models”. The idea is to create an AI able to learn autonomously by observing the world, like a child who understands gravity by watching an object fall. These systems rely on JEPA (Joint Embedding Predictive Architecture) to model physical and logical interactions, allowing the AI to anticipate the consequences of an action in a complex, multidimensional environment.
Massive, strategic backing
The founder's prestige attracted an exceptional cast of investors. The round was co-led by funds such as Bezos Expeditions (Jeff Bezos) and Cathay Innovation, with participation from industrial giants such as Nvidia, Samsung and Toyota. France is also on the front line, with support from Bpifrance and from wealthy backers such as Xavier Niel (Iliad), the Mulliez family and the Dassault group.
The future of AI, anchored in the real world
The short-term goal is not to release an immediate consumer product but to build a solid scientific infrastructure. The funds will mainly go toward acquiring colossal computing power (GPUs) and recruiting the world's best researchers in Paris, New York and Montreal.
In the long run, these “World Models” could revolutionize home robotics, the automotive industry and complex automation. By learning to understand the physical world rather than simply manipulating language, AMI Labs aims to give rise to an artificial intelligence that is truly autonomous and endowed with “common sense”. Hosted by Acast. Visit acast.com/privacy for more information.

Silicon Carne, un peu de picante dans la Tech
AMI Labs: can a global AI champion be built from Paris?

Mar 18, 2026 · 97:58


In this episode:
- AMI Labs raises more than a billion in Paris, backed by a syndicate of investors from three continents. But does the deal rest on Paris as an ecosystem, or solely on Yann LeCun's reputation?
- AI promised as a metered utility, employment made optional, graduates left out in the cold: the promises and threats pile up without any political vision to arbitrate between them. Who manufactures the fear of AI, and who really benefits from it?
- And we finish with the municipal elections, asking whether platforms like X have become ordinary political weapons.

Health and Explainable AI Podcast
Martin Raison CTO of Nabla on Architecting the Agentic AI Era in Healthcare

Mar 18, 2026 · 37:38


Martin Raison, Co-founder and CTO of Nabla, speaks with Pitt HexAI host Jordan Gass-Pooré about Nabla's central role in architecting the agentic AI era in healthcare. Martin details Nabla's evolution from a specialized ambient scribing tool into a comprehensive "Adaptive Agentic Platform". They discuss the significant challenges involved in making it possible for AI agents to perform complex clinical tasks and how Nabla has been thrust into tackling a labyrinth of structural and data hurdles. These range from the integration of fragmented, unstructured patient charts and hospital guidelines to the complex technicalities of agent discoverability, interoperability, and the establishment of standardized accountability frameworks.
The interview highlights a significant shift in Nabla's technical strategy: moving from probabilistic Large Language Models (LLMs) toward world models. Raison explains that while LLMs are effective at generating text, they lack a fundamental understanding of cause-and-effect and the ability to simulate evolving environments. To address this, Nabla has entered an exclusive partnership with Advanced Machine Intelligence (AMI), a research lab co-founded by Yann LeCun. This collaboration provides Nabla with early access to world model technologies that can "imagine" different scenarios and simulate the consequences of actions, providing a more deterministic and auditable path for AI in high-stakes clinical settings.
In discussing the technical foundations of computational health, Martin addresses the critical need for inference optimization to manage the millions of model executions required daily at scale. Furthermore, Martin envisions a fundamental shift in the paradigm of AI inference through the adoption of world models. 
He suggests that these architectures will blur the traditional boundary between training and inference by enabling continuous learning, where the model adjusts and evolves in real-time based on new data and clinician feedback, rather than being limited by the static context windows of current LLMs.
Beyond the core technology, Martin and Jordan discuss the critical importance of explainability and interoperability in the "agentic web" of healthcare. They specifically highlight architectural initiatives like MIT's Project NANDA, which focuses on the foundational layers of the agentic web, including critical elements like discoverability and authentication that go beyond the AI layer alone. Martin emphasizes that the sector must move toward standardized "Agent Fact Files" to ensure accountability and ease of governance as organizations begin to manage thousands of agents. He concludes by looking toward a future of "emergent intelligence," where the collaboration between multiple models creates sophisticated patterns that can eventually help clinicians improve their own professional practice over time.

Let's Talk AI
#237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!!!

Mar 16, 2026 · 147:19


Our 237th episode with a summary and discussion of last week's big AI news!
Recorded on 03/13/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
* Perplexity announced “Personal Computer,” a local Mac-based AI agent positioned as a safer alternative to OpenAI's computer-use agents, while Anthropic added GitHub PR code review, with reviews priced at $15–$25, and Cursor launched trigger-based “Automations” for always-on coding agents.
* ChatGPT introduced interactive math/science visuals and Anthropic added in-chat interactive charts/diagrams; Nvidia released open weights for its 120B-parameter Nemotron 3 Super hybrid Transformer–Mamba latent-MoE model trained natively at 4-bit for Blackwell GPUs.
* Nvidia halted H200 production for China amid customs blocks and domestic chip pressure; xAI saw major co-founder departures; Anthropic previewed a Claude Marketplace for enterprise procurement; Yann LeCun's AMI Labs raised $1.03B; humanoid robot maker Sunday reached a $1.15B valuation.
* Anthropic sued the Pentagon over a “supply chain risk” designation as memos ordered removal within 180 days; research covered models resisting activation steering, limits of chain-of-thought control, inference-scaling boosting cyber-task success, low-probability risky actions, weaknesses in SWE-bench, multimodal pretraining, long-context RNN memory caching, context-parallel training efficiency, RL for CUDA kernel optimization, and latent introspection detecting concept injection.

A thank you to our current sponsors:
Box - visit Box.com/AI to learn more
ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

Timestamps:
(00:00:10) Intro / Banter
(00:01:23) Response to listener comments
Tools & Apps
(00:02:06) Perplexity's Personal Computer turns your spare Mac into an AI agent | The Verge
(00:04:22) Anthropic launches code review tool to check flood of AI-generated code | TechCrunch
(00:08:08) Cursor is rolling out a new kind of agentic coding tool | TechCrunch
(00:11:14) ChatGPT can now create interactive visuals to help you understand math and science concepts | TechCrunch
(00:11:56) Anthropic's Claude AI can respond with charts, diagrams, and other visuals now | The Verge
Projects & Open Source
(00:13:54) Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog
Applications & Business
(00:21:22) Nvidia halts H200 production as China backs Huawei AI chips
(00:28:33) Another XAI Cofounder Has Left, and Another Says He's Leaving. - Business Insider
(00:34:04) Anthropic's Claude Marketplace allows customers to buy third-party cloud services | TechRadar
(00:37:57) Yann LeCun's AMI Labs raises $1.03 billion to build world models | TechCrunch
(00:44:52) Humanoid robotics maker Sunday reaches $1.15B valuation to build household robots | TechCrunch
Policy & Safety
(00:46:09) Anthropic Sues Department of Defense Over ‘Supply Chain Risk' Label - The New York Times + Google and OpenAI Just Filed a Legal Brief in Support of Anthropic
(00:53:24) Internal Pentagon memo orders military commanders to remove Anthropic AI technology from key systems - CBS News
(00:58:15) Endogenous Resistance to Activation Steering in Language Models
(01:06:27) Reasoning Models Struggle to Control their Chains of Thought
(01:09:52) ‘It means missile defence on datacentres': drone strikes raise doubts over Gulf as AI superpower
(01:14:57) Evidence for inference scaling in AI cyber tasks: Increased evaluation budgets reveal higher success rates
(01:18:24) Frontier Models Can Take Actions at Low Probabilities
Research & Advancements
(01:24:20) Research note: Many SWE-bench-Passing PRs Would Not Be Merged into Main
(01:28:26) [2603.03276] Beyond Language Modeling: An Exploration of Multimodal Pretraining
(01:40:09) Memory Caching: RNNs with Growing Memory
(01:48:47) Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
(01:58:41) CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
(02:08:57) Latent Introspection: Models Can Detect Prior Concept Injections
(02:16:45) Physics of RL: Toy scaling laws for the emergence of reward-seeking
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Monde Numérique - Jérôme Colombain

Meta reignites the AI agent race by acquiring Moltbook. Amazon suffers from bugs created by AI. People film themselves to train AIs. Microsoft turns Windows 11 PCs into Xboxes.
With Bruno Guglielminetti (Mon Carnet).

Meta gets its hands on Moltbook, the social network for AI agents
The acquisition of Moltbook, a kind of “Reddit for agents” powered by OpenClaw, illustrates the accelerating battle over agentic AI. Jérôme and Bruno see it as both a tactical move against OpenAI and a possible full-scale laboratory for observing agents interacting with one another (also worth a listen: La folie des agents IA : les big tech accélèrent).

Amazon: AI-generated bugs
Amazon acknowledges having suffered outages caused by code generated by artificial intelligence.

A billion for AMI, Yann LeCun's French bet
The other big story of the week: the spectacular fundraise by AMI, the start-up co-founded by Yann LeCun, valued at around $3 to $3.5 billion after a round of about $1.03 billion. The ambition is immense: to develop “world models” able to understand the physical world beyond text, with the idea of moving past the current limits of LLMs.
Also worth a listen: 900 millions pour changer l'IA : le pari fou de Yann Le Cun.

Humans film their daily lives to teach robots
Workers agree to film themselves for hours to feed the datasets used for robot learning. Doing the dishes, tidying up, handling objects: mundane gestures that become precious resources for a robotics field that is still very clumsy in the real world.

Xbox wants to turn Windows 11 PCs into a giant playground
On to video games, with Microsoft's strategy around an “Xbox mode” meant to bring the Xbox world and Windows 11 PCs even closer together. For Bruno, it is a clever way to instantly open the Xbox ecosystem to a huge installed base of compatible machines, without entirely calling the console into question.

In Mon Carnet: music, AI, Mila and disinformation
Bruno announces an interview in Mon Carnet with a Disney composer in Paris who increasingly uses AI in his musical work. He also discusses the role of Mila, the Quebec AI institute with more than 1,200 researchers, as well as an online game designed to test how we relate to disinformation.

In Monde Numérique: French agents, cybersecurity and mixed reality
Jérôme presents several interviews to listen to in L'Hebdo of March 14, including an AI agent able to operate any software on a computer; the new cyber risks linked to AI; and a new innovator portrait, that of Stan Larroque of Lynx Mixed Reality, creator of the only European mixed reality headset.
Hosted by Audiomeans. Visit audiomeans.fr/politique-de-confidentialite for more information.

This Week in Pre-IPO Stocks
E251: Cursor $50B round, lower vs secondary market; Wiz $32B acquisition closes w/ Google; Moonshot $18B valuation; Nscale $14.6B valuation; + more

This Week in Pre-IPO Stocks

Play Episode Listen Later Mar 15, 2026 25:16


Invest in pre-IPO stocks with AG Dillon & Co. Contact aaron.dillon@agdillon.com to learn more. Financial advisors only. www.agdillon.com

00:00 - Intro
00:47 - Cursor's $50B Round Would Cement AI Coding's New Heavyweight
01:43 - Wiz Closes $32B Deal with Google; Google's Biggest Acquisition Ever
02:32 - Moonshot AI's $18B Ask Shows Chinese AI Valuations Are Repricing Fast
03:34 - Nscale's $14.6B Valuation Signals Europe's AI Infra Race Is Scaling Up
05:06 - Legora's $5.55B Jump Marks a Breakout Moment for Legal AI
06:08 - Alan Nears $1B ARR as Europe's Digital Health Insurer Keeps Repricing Higher
07:12 - Replit's $9B Valuation Shows Vibe Coding Investors Still Want More
08:01 - Nebius's $2B Bet Extends Its Grip on AI Infrastructure
08:52 - Gumloop's $50M Raise Says No-Code AI Agents Are Becoming Enterprise Software
09:52 - Wonderful Hits $2B in Just 13 Months as Global AI Support Rollouts Accelerate
10:45 - Rox Reaches Unicorn Status on an $8M ARR Sales-AI Story
11:36 - Sunday $1.15B Round Shows Embodied AI (Humanoids) Is Still Attracting Premium Capital
12:24 - Thinking Machines Lands 1 GW Nvidia Deal
13:24 - Yann LeCun's AMI Labs Lands a Record-Scale $1.03B Seed
14:11 - Forethought - AI Service Automation Platform - Acquired by Zendesk
15:05 - Meta Acquires Viral AI-Agent Network Moltbook
15:57 - Eridu $200M Series A Targets a Hidden Bottleneck in AI Infrastructure
17:00 - Mind Robotics Raises $500M as Factory AI Becomes the Next Robotics Battleground
18:03 - OpenAI Buys Promptfoo to Harden Enterprise AI Agents
19:17 - Kraken Partners with Nasdaq to Push Tokenized Equities Closer to Market
20:19 - Anthropic Turns Code Review Into a New Enterprise AI Revenue Lever
21:25 - Anthropic, Blackstone, Hellman Strike AI Consulting JV
22:13 - Lovable Hits $400M ARR With Just 146 Employees
23:02 - Anduril Acquires ExoAnalytics in Defense Space Push
24:25 - Anduril Lands Potentially $20B Army Deal as Revenue Heads Toward $4B

Mon Carnet, l'actu numérique
{RÉFLEXION} - Debrief Transatlantique avec Jérome Colombain

Mon Carnet, l'actu numérique

Play Episode Listen Later Mar 15, 2026 26:32


In this debrief, Bruno Guglielminetti and Jérôme Colombain look back at the accelerating agentic shift in the AI industry, illustrated by Meta's acquisition of Moltbook. They see it less as a simple tactical move than as privileged access to a living laboratory where AI agents interact with one another, a rare observation ground for the sector's big players. The duo also covers the very concrete limits of this race to automation, with Amazon paying the price for its heavy reliance on generative AI, particularly in code, to the point of having its own employees correct its tools. Another notable topic is the record fundraise by AMI, Yann LeCun's start-up, built around “world models”, an approach that aims to give AI a finer understanding of the real world, even if its results remain to be demonstrated against the blistering pace of current models. Finally, the two hosts highlight a turning point at Microsoft, which wants to turn millions of Windows 11 PCs into machines compatible with the Xbox world, proof that the convergence between personal computing and video games keeps accelerating.

Geek News Central
Is the MacBook Neo a Chromebook Killer? #1860

Geek News Central

Play Episode Listen Later Mar 13, 2026 Transcription Available


In this episode, Chris Cochrane dives into Apple’s $599 MacBook Neo – the cheapest Mac laptop ever made – and whether it spells trouble for Chromebook makers. He also covers Samsung’s CEO blaming AI for rising phone prices, Framework raising RAM prices for the third time in three months, Meta unveiling four custom AI chips, NVIDIA’s GTC 2026 conference preview, a billion-dollar bet against large language models, Microsoft’s game-changing Project Helix Xbox with native Steam support, Windows 11’s new Xbox Mode, and SpaceX gearing up for a critical Starship Flight 12 test.

– Want to start a podcast? It’s easy to get started! Sign-up at Blubrry
– Thinking of buying a Starlink? Use my link to support the show.
Subscribe to the Newsletter. Email Chris if you want to get in touch! Like and Follow Geek News Central’s Facebook Page. Support my Show Sponsor: Best Godaddy Promo Codes Get 1Password

Apple MacBook Neo
The lead story covers Apple’s MacBook Neo. It launched at $599 and marks the cheapest Mac laptop ever made. The device runs on the A18 Pro chip from the iPhone 16 Pro. Cochrane notes a solid market for students, casual users, and anyone who needs a reliable home laptop. However, he advises photographers and videographers to invest in a MacBook Air or Pro instead. The real question remains whether this kills Chromebook sales in education.

Samsung CEO Blames AI for Price Hikes
Cochrane tackles Samsung’s Galaxy S26 price increases. CEO TM Roh blamed AI infrastructure demand for the hikes. Meanwhile, DDR4 DRAM prices surged sevenfold in a single year. Cochrane points out the irony. Samsung manufactures memory chips, shifted production toward AI data centers, and now cites that same shortage to justify higher consumer prices. He calls the situation “a little shady” but appreciates the transparency.

Framework RAM Prices Up Again
The RAM crisis extends beyond phones. Framework raised RAM prices for the third consecutive time in three months. Cochrane reinforces advice from a recent episode. He urges listeners to buy now before prices climb further. Analysts project peak prices by mid-2026. The shortage could last through late 2027.

Sponsor: GoDaddy
Economy hosting $6.99/month, WordPress hosting $12.99/month, domains $11.99. Website builder trial available. Use codes at geeknewscentral.com/godaddy to support the show.

Meta Unveils Four Custom AI Chips
Cochrane reports on Meta’s four new MTIA chip generations. The company aims to reduce its dependence on NVIDIA by building custom silicon. The MTIA 300 is already in production. New generations will ship every six months through 2027. The chips are built on open-source RISC-V architecture and manufactured by TSMC.

NVIDIA GTC 2026 Preview
NVIDIA’s GTC conference starts Monday in San Jose. Jensen Huang promises “chips the world has never seen.” Rumored architectures include Rubin Ultra and Feynman. The keynote streams free at nvidia.com on Monday at 11am Pacific. Cochrane notes that while companies like Meta are building chips to escape NVIDIA, competition will eventually catch up.

Yann LeCun’s AMI Labs Raises $1.03 Billion
Former Meta AI chief Yann LeCun raised $1.03 billion for AMI Labs at a $3.5 billion valuation. It marks the largest European seed round in history for a company just four months old. LeCun is building “world models” that learn from physical reality rather than text. Backers include Jeff Bezos, NVIDIA, and Samsung. Cochrane notes both approaches to AI can coexist.

Microsoft Project Helix
Microsoft revealed Project Helix at GDC 2026. For the first time, an Xbox will natively support Steam and GOG. Cochrane sees it as both desperate and inevitable. The only reason to buy from the Xbox store would be exclusives. He notes this is a breath of fresh air after months of talk that the Xbox era was ending. Dev kits ship in 2027 with a consumer launch likely late 2027 or 2028.

Windows 11 Xbox Mode
Microsoft is rolling out Xbox Mode to all Windows 11 PCs in April. The full-screen controller-optimized interface works with Steam, Epic, and Battle.net. Cochrane sees it as the first half of Microsoft’s two-phase gaming strategy. Xbox Mode trains users now. Project Helix delivers dedicated hardware later. He asks whether Sony and Nintendo will follow in Xbox’s footsteps.

SpaceX Starship Flight 12
SpaceX announced stacking complete for the next Super Heavy booster at Starbase. Flight 12 targets April and debuts V3 hardware with Raptor 3 engines. Orbital refueling remains the critical unknown for NASA’s Artemis III moon landing. SpaceX has a track record of delivering eventually, just never on Elon’s original timeline.

The post Is the MacBook Neo a Chromebook Killer? #1860 appeared first on Geek News Central.

Loop Infinito (by Applesfera)

Yann LeCun left Meta convinced that LLMs are a dead end. Now he has a billion dollars to prove it.
Go deeper: Xataka Xtra
Loop Infinito, a Xataka podcast, Monday to Friday at 7:00 a.m. (Spanish peninsular time). Hosted by Javier Lacort. Edited by Alberto de la Torre.
Contact:

Les matins
Yann Le Cun, l'AMI américain

Les matins

Play Episode Listen Later Mar 11, 2026 3:29


duration: 00:03:29 - Un monde connecté - by: François Saltiel - AMI, the new start-up of Frenchman Yann Le Cun, has just raised 890 million euros, reaching a valuation of 3 billion euros. A look back at the career of Meta's former head of research.

Hacker News Recap
March 10th, 2026 | Tony Hoare has died

Hacker News Recap

Play Episode Listen Later Mar 11, 2026 14:54


This is a recap of the top 10 posts on Hacker News on March 10, 2026. This podcast was generated by wondercraft.ai

(00:30): Tony Hoare has died
Original post: https://news.ycombinator.com/item?id=47324054&utm_source=wondercraft_ai
(01:54): Online age-verification tools for child safety are surveilling adults
Original post: https://news.ycombinator.com/item?id=47322635&utm_source=wondercraft_ai
(03:19): After outages, Amazon to make senior engineers sign off on AI-assisted changes
Original post: https://news.ycombinator.com/item?id=47323017&utm_source=wondercraft_ai
(04:44): Meta acquires Moltbook
Original post: https://news.ycombinator.com/item?id=47323900&utm_source=wondercraft_ai
(06:09): I put my whole life into a single database
Original post: https://news.ycombinator.com/item?id=47321233&utm_source=wondercraft_ai
(07:34): Yann LeCun raises $1B to build AI that understands the physical world
Original post: https://news.ycombinator.com/item?id=47320600&utm_source=wondercraft_ai
(08:59): Redox OS has adopted a Certificate of Origin policy and a strict no-LLM policy
Original post: https://news.ycombinator.com/item?id=47320661&utm_source=wondercraft_ai
(10:24): Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs
Original post: https://news.ycombinator.com/item?id=47322887&utm_source=wondercraft_ai
(11:49): Two Years of Emacs Solo
Original post: https://news.ycombinator.com/item?id=47317616&utm_source=wondercraft_ai
(13:14): Debian decides not to decide on AI-generated contributions
Original post: https://news.ycombinator.com/item?id=47324087&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

AI Inside
A Billion Dollar World Model Bet

AI Inside

Play Episode Listen Later Mar 11, 2026 78:04


This episode is sponsored by Airia. Get started today at airia.com.

Jason Howell and Jeff Jarvis invite Mike Elgan to dig into Yann LeCun's billion-dollar world models startup, Anthropic's federal lawsuit against the Department of Defense, Meta's acquisition of Moltbook and whether its AI agents were ever real, and new research on whether AI is causing worker burnout instead of relieving it.

Note: Time codes subject to change depending on dynamic ad insertion by the distributor.

CHAPTERS:
0:00:00 - Start
0:01:44 - Ex-Meta AI chief Yann LeCun's AMI raises $1.03 billion for alternative AI approach
0:07:49 - AMI page with mission
0:17:55 - Anthropic Sues Department of Defense Over Supply-Chain-Risk Designation
0:21:56 - OpenAI and Google Workers File Amicus Brief in Support of Anthropic Against the US Government
0:25:51 - time cover
0:32:16 - Exclusive: Meta hires duo behind Moltbook
0:33:29 - No, the Singularity Hasn't Arrived: The Truth About Moltbook
0:47:57 - Nvidia plans open-source AI agent platform ‘NemoClaw' for enterprises
0:49:59 - Copilot Cowork: A new way of getting work done
0:51:38 - When Using AI Leads to “Brain Fry”
1:02:58 - Amazon Wins Court Order Blocking Perplexity's AI Shopping Bots
1:03:56 - Judge blocks Perplexity's AI bot from shopping on Amazon in early test of agentic commerce
1:08:31 - Google rolls out new Gemini capabilities to Docs, Sheets, Slides, and Drive
1:09:54 - Grammarly will keep using authors' identities without permission unless they opt out

Learn more about your ad choices. Visit megaphone.fm/adchoices

Monde Numérique - Jérôme Colombain

French researcher Yann Le Cun makes a splash with a record fundraise for his Paris-based start-up Ami Labs. His goal: to develop a new form of artificial intelligence able to truly understand the world. But is he really on the right track?

A spectacular fundraise for Ami Labs
French researcher Yann Le Cun, former chief AI scientist at Meta, has just completed an exceptional fundraise for his Paris-based start-up Ami Labs. The company raised nearly 900 million euros, well beyond the 500 million initially sought. With a valuation estimated at 3 billion euros, it becomes one of the largest French artificial intelligence start-ups. For Yann Le Cun, this is not only an entrepreneurial project but a major scientific bet on the future of AI.

The “World Models” concept
To move beyond the limits of current language models, Ami Labs is working on a concept called World Models. The idea is to develop artificial intelligence systems able to learn from video, 3D environments and spatial data rather than from text alone. The goal is for the machine to build an internal representation of the physical world and to anticipate the consequences of its actions.

A scientific debate at the heart of AI research
This approach is not unanimously accepted in the scientific community, however. Some researchers believe current language models could keep improving thanks to ever-larger volumes of data and compute. In their view, these systems could end up simulating an understanding of the world realistic enough to rival more complex approaches.
Engineer and author Aymeric Roucher points in particular to the “Bitter Lesson” of researcher Richard Sutton. The idea is that the simplest methods, fed by enormous compute and data resources, often end up outperforming the more sophisticated approaches devised by researchers.

A new race toward artificial general intelligence
Beyond the scientific debate, the initiative also rekindles international competition in AI. With Ami Labs, Yann Le Cun is trying to build an alternative to the American giants that currently dominate language models. The company aims to be global, with teams in Paris as well as New York, Montreal and Singapore.
Hosted by Audiomeans. Visit audiomeans.fr/politique-de-confidentialite for more information.

Daily Tech News Show
Yann LeCun's World Models Raise $1 Billion - DTNS 5222

Daily Tech News Show

Play Episode Listen Later Mar 10, 2026 29:27


Amazon is implementing new safeguards to protect against outages related to generated code, and Google is unifying its suite of Gemini integrations inside Google Drive.
Starring Tom Merritt and Jason Howell.
Links to stories found in this episode can be found here.
Hosted on Acast. See acast.com/privacy for more information.

Tech&Co
Yann LeCun lance AMI Labs et lève un milliard – 10/03

Tech&Co

Play Episode Listen Later Mar 10, 2026 26:34


On Tuesday, March 10, François Sorel hosted Amélie Charnay, journalist at La Tribune, Michel Levy Provençal, futurist and founder of TEDxParis and the Brightness agency, and Thomas Serval, CEO of Baracoda. They discussed the billion-dollar fundraise by Yann LeCun's start-up, as well as the Google and OpenAI employees backing Anthropic, on Tech & Co, la quotidienne, on BFM Business. Catch the show Monday through Thursday and listen again as a podcast.

Le sept neuf
AMI propose "la prochaine révolution de l'IA, qui comprend le monde réel", déclare son fondateur français Yann Le Cun

Le sept neuf

Play Episode Listen Later Mar 10, 2026 9:26


duration: 00:09:26 - L'invité de 7h50 - by: Benjamin Duhamel - Yann Le Cun, former head of fundamental AI research at Meta, has raised 890 million euros for his start-up AMI (Advanced Machine Intelligence) to accelerate research into new AI models able to understand the physical world. - guests: Yann Le Cun - Yann Le Cun: Chief AI Scientist at Meta, professor at New York University. Enjoy this podcast? To listen to all the other episodes without limits, head to Radio France.

Les interviews d'Inter
AMI propose "la prochaine révolution de l'IA, qui comprend le monde réel", déclare son fondateur français Yann Le Cun

Les interviews d'Inter

Play Episode Listen Later Mar 10, 2026 9:26


duration: 00:09:26 - L'invité de 7h50 - by: Benjamin Duhamel - Yann Le Cun, former head of fundamental AI research at Meta, has raised 890 million euros for his start-up AMI (Advanced Machine Intelligence) to accelerate research into new AI models able to understand the physical world. - guests: Yann Le Cun - Yann Le Cun: Chief AI Scientist at Meta, professor at New York University. Enjoy this podcast? To listen to all the other episodes without limits, head to Radio France.

100x Entrepreneur
The First AI Market With 8 Billion Potential Users | Sudarshan Kamath, Smallest AI

100x Entrepreneur

Play Episode Listen Later Mar 6, 2026 69:25


Will smaller AI models win over large language models?

Sudarshan Kamath grew up in Mumbai, taught himself AI before most Indian companies were even hiring for it, and bought the domain "smallest.ai" for $100 in 2022, two years before the company existed. Today, he runs Smallest AI, a startup focused on real time voice AI.

He started with self-driving cars, training large models and compressing them to run on vehicle hardware in real time. That's where he first saw what small models could do: a hundredth of the size, almost no loss in accuracy.

Two years later he put in his own $150K, got some GPUs, and started training. Eighteen months later he had a seed round, a Series A, a seven-figure enterprise deal, and a $150M acquisition offer he turned down.

Most of the data that goes into large models is noise. Strip it out, train small, and you get a model that matches a giant at a fraction of the size and runs in real time. That insight is what Smallest AI is built on.

00:00 – Trailer
00:51 – Sudarshan's journey before Smallest AI
05:00 – Arjun Jain & Yann LeCun
08:20 – Why build in voice AI in 2024?
15:09 – Why move the company from India to the US?
17:25 – Hiring talent via LinkedIn and X
18:49 – What large US funds actually bring to startups
21:03 – Raising a seed round with zero revenue
26:06 – Strong intros from US VCs
28:23 – What the first enterprise customer teaches you
31:50 – Raising Series A with Seligman Ventures
32:19 – The $150M acquisition offer
34:32 – When should founders sell secondaries?
36:24 – Who are Smallest AI's customers?
38:28 – What are state space models?
40:16 – Are JEPA models closer to AGI?
41:23 – Growing 10× in three months
48:03 – This is not a winner-takes-all market
49:32 – Why this is a trillion-dollar market
50:08 – Why large AI labs are not building in voice
51:26 – What it takes to reach $100M ARR
54:21 – The biggest goal for 2026
57:11 – Voice costs 1000× more than text
01:02:04 – How Smallest AI cracked large enterprises

-------------
India's talent has built the world's tech; now it's time to lead it.
This mission goes beyond startups. It's about shifting the center of gravity in global tech to include the brilliance rising from India.

What is Neon Fund?
We invest in seed and early-stage founders from India and the diaspora building world-class Enterprise AI companies. We bring capital, conviction, and a community that's done it before.
Subscribe for real founder stories, investor perspectives, economist breakdowns, and a behind-the-scenes look at how we're doing it all at Neon.

-------------
Check us out on:
Website: https://neon.fund/
Instagram: https://www.instagram.com/theneonshoww/
LinkedIn: https://www.linkedin.com/company/beneon/
Twitter: https://x.com/TheNeonShoww
Connect with Siddhartha on:
LinkedIn: https://www.linkedin.com/in/siddharthaahluwalia/
Twitter: https://x.com/siddharthaa7

-------------
This video is for informational purposes only. The views expressed are those of the individuals quoted and do not constitute professional advice.

Monde Numérique - Jérôme Colombain

On the eve of Mobile World Congress in Barcelona, Samsung and Apple open hostilities with their new AI-boosted smartphones. Meanwhile, a futuristic memo predicts massive job destruction caused by artificial intelligence.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Editor's note: CuspAI raised a $100m Series A in September and is rumored to have reached a unicorn valuation. They have all-star advisors from Geoff Hinton to Yann LeCun and a team of deep domain experts to tackle this next frontier in AI applications.

In this episode, Max Welling traces the thread connecting quantum gravity, equivariant neural networks, diffusion models, and climate-focused materials discovery (yes, there is one!!!).

We begin with a provocative framing: experiments as computation. Welling describes the idea of a “physics processing unit”: a world in which digital models and physical experiments work together, with nature itself acting as a kind of processor. It's a grounded but ambitious vision of AI for science: not replacing chemists, but accelerating them.

Along the way, we discuss:
* Why symmetry and equivariance matter in deep learning
* The tradeoff between scale and inductive bias
* The deep mathematical links between diffusion models and stochastic thermodynamics
* Why materials, not software, may be the real bottleneck for AI and the energy transition
* What it actually takes to build an AI-driven materials platform

Max reflects on moving from curiosity-driven theoretical physics (including work with Gerard 't Hooft) toward impact-driven research in climate and energy.
The result is a conversation about convergence: physics and machine learning, digital models and laboratory experiments, long-term ambition and incremental progress.

Timestamps
* 00:00:00 – The Physics Processing Unit (PPU): Nature as the Ultimate Computer
  * Max introduces the idea of a Physics Processing Unit, using real-world experiments as computation.
* 00:00:44 – From Quantum Gravity to AI for Materials
  * Brandon frames Max's career arc: VAE pioneer → equivariant GNNs → materials startup founder.
* 00:01:34 – Curiosity vs Impact: How His Motivation Evolved
  * Max explains the shift from pure theoretical curiosity to climate-driven impact.
* 00:02:43 – Why CuspAI Exists: Technology as Climate Strategy
  * Politics struggles; technology scales. Why materials innovation became the focus.
* 00:03:39 – The Thread: Physics → Symmetry → Machine Learning
  * How gauge symmetry, group theory, and relativity informed equivariant neural networks.
* 00:06:52 – AI for Science Is Exploding (Not Emerging)
  * The funding surge and why AI-for-Science feels like a new industrial era.
* 00:07:53 – Why Now? The Two Catalysts Behind AI for Science
  * Protein folding, ML force fields, and the tipping point moment.
* 00:10:12 – How Engineers Can Enter AI for Science
  * Practical pathways: curriculum, workshops, cross-disciplinary training.
* 00:11:28 – Why Materials Matter More Than Software
  * The argument that everything, LLMs included, rests on materials innovation.
* 00:13:02 – Materials as a Search Engine
  * The vision: automated exploration of chemical space like querying Google.
* 00:14:48 – Inside CuspAI: The Platform Architecture
  * Generative models + multi-scale digital twin + experiment loop.
* 00:21:17 – Automating Chemistry: Human-in-the-Loop First
  * Start manual → modular tools → agents → increasing autonomy.
* 00:25:04 – Moonshots vs Incremental Wins
  * Balancing lighthouse materials with paid partnerships.
* 00:26:22 – Why Breakthroughs Will Still Require Humans
  * Automation is vertical-specific and iterative.
* 00:29:01 – What Is Equivariance (In Plain English)?
  * Symmetry in neural networks explained with the bottle example.
* 00:30:01 – Why Not Just Use Data Augmentation?
  * The optimization trade-off between inductive bias and data scale.
* 00:31:55 – Generative AI Meets Stochastic Thermodynamics
  * His upcoming book and the unification of diffusion models and physics.
* 00:33:44 – When the Book Drops (ICLR?)

Transcript

Max: I want to think of it as what I would call a physics processing unit, like a PPU, right? Which is you have digital processing units and then you have physics processing units. So it's basically nature doing computations for you. It's the fastest computer known, as possible even. It's a bit hard to program because you have to do all these experiments. Those are quite bulky, it's like a very large thing you have to do. But in a way it is a computation and that's the way I want to see it. You can do computations in a data center and then you can ask nature to do some computations. Your interface with nature is a bit more complicated.
But then these things will have to seamlessly work together to get to a new material that you're interested in.

[01:00:44:14 - 01:01:34:08] Brandon: Yeah, it's a pleasure to have Max Welling as a guest today. Max has done so much over his career that I've been so excited about. If you're in the deep learning community, you probably know Max for his work on variational autoencoders, which has literally stood the test of time, or officially stood the test of time. If you are a scientist, you probably know him for his, like, pioneering work on graph neural networks and on equivariance. And if you're a materials scientist, you probably know him for his new startup, CuspAI. Max has a long history doing lots of cool problems. You started in quantum gravity, which is I think very different than all of these other things you worked on. The first question for AI engineers and for scientists, what is the thread in how you think about problems? What is the thread in the type of things which excite you? And how do you decide what is the next big thing you want to work on?

[01:01:34:08 - 01:02:41:13] Max: So it has actually evolved a lot. In my young days, let's say, I would just follow what I would find super interesting. I have kind of this sensor. I think many people have, but maybe not really sort of use very much, which is like, you get this feeling about getting very excited about some problem. Like it could be, what's inside of a black hole or what's at the boundary of the universe or what is quantum mechanics actually all about. And so I followed that basically throughout my career. But I have to say that as you get older, this changes a little bit in the sense that there's a new dimension coming to it, and that is impact. Going into two-dimensional quantum gravity, you're pretty much guaranteed there's going to be no impact from what you do, relatively; maybe a few papers, but not in this world, at this energy scale.
As I get closer to retirement, which is fortunately still 10 years away or so, I do want to kind of make a positive impact in the world. And I got pretty worried about climate change.

[01:02:43:15 - 01:03:19:11] Max: I think politics seems to have a hard time solving it, especially these days. And so I thought I'd better work on it from the technology side. And that's why we started CuspAI. But there's also a lot of really interesting science problems in materials science. And so it's kind of combining both the impact you can make with it as well as the interesting science. So it's sort of these two dimensions, like working on things where you feel there's like, well, there's something very deep going on here. And on the other hand, trying to build tools that can actually make a real impact in the world.

[01:03:19:11 - 01:03:39:23] RJ: So the thread: when I look back at the different things that you worked on, some of them seem pretty connected, like the physics to equivariance and, yeah, and, uh, gravitational networks, maybe. And that seems to be somewhat related to Cusp. Do you have a thread through there?

[01:03:39:23 - 01:06:52:16] Max: Yeah. So physics is the thread. So having, you know, spent a lot of time in theoretical physics, I think there are first very fundamental and exciting questions, like things that haven't actually been figured out in quantum gravity. So that is really the frontier. There's also a lot of mathematical tools that you can use, right? For instance, in particle physics, but also in general relativity, symmetries play an enormously important role. And this goes all the way to gauge symmetries as well. And so applying these kinds of symmetries to, uh, machine learning was actually, you know, I thought of it as a very deep and interesting mathematical problem.
I did this with Taco Cohen, and Taco was the main driver behind this. It went all the way from just simple, like, rotational symmetries all the way to gauge symmetries on spheres and stuff like that. And, uh, Maurice Weiler, who's also here, um, when he was a PhD student, he was a very good student with me, you know, he wrote an entire book, which I can really recommend, about the role of symmetries in AI and machine learning. So I find this a very deep and interesting problem. More recently, I've taken a sort of different path, which is the relationship between diffusion models and the field called stochastic thermodynamics. This is basically thermodynamics, which is a theory of equilibrium, but then formulated for out-of-equilibrium systems. And it turns out that the mathematics that we use for diffusion models, but even for reinforcement learning, for Schrödinger bridges, for MCMC sampling, is the same mathematics as this physical theory of non-equilibrium systems. And that got me very excited. And actually, uh, I taught a course in, um, Muizenberg, uh, in South Africa, close to Cape Town, at the African Institute for Mathematical Sciences, AIMS. And I turned that into a book. Two years later, the book was finished. I've sent it to the publisher. And this is about the deep relationship between free energy, diffusion models, basically generative AI, and stochastic thermodynamics. So it's always some kind of, I don't know, I find physics very deep. I also think a lot about quantum mechanics and it's a completely weird theory that actually nobody really understands. And there's a very interesting story, which is maybe good to tell to connect sort of my PhD back to where I am now. So I did my PhD with a Nobel laureate, Gerard 't Hooft. He's the most brilliant man I've ever met. He was never wrong about anything as long as I've seen him.
And now he says quantum mechanics is wrong and he has a new theory of quantum mechanics. Nobody understands what he's saying, even though what he's writing down is not mathematically very complex, but he's trying to address this understandability, let's say, of quantum mechanics head on. And I find it very courageous and I'm completely fascinated by it. So I'm also trying to think about, okay, can I actually understand quantum mechanics in a more mundane way? So, you know, without all the weird multiverses and collapses and stuff like that. So the physics has always been the thread, and I'm trying to apply the physics to the machine learning to build better algorithms.

[01:06:52:16 - 01:07:05:15] Brandon: You are still very involved in understanding physics and the world. Yeah. And just like applications to machine learning or introducing new formalisms. That's really cool.

[01:07:05:15 - 01:07:18:02] Max: Yes, I would say I'm not contributing much to physics, but I'm contributing to the interface between physics and science. And that's called AI for science, or science for AI. It's actually a new discipline that's emerging.

[01:07:18:02 - 01:07:18:19] Speaker 5: Yeah.

[01:07:18:19 - 01:07:45:14] Max: And it's not just emerging, it's exploding, I would say. That's the better term, because now you go from investments in the hundreds of millions to now in the billions. So there's now actually a startup by Jeff Bezos that is at a 6.2 billion seed round. Right. Insane. I guess it's the largest startup ever, I think. And that's in this field, AI for science. It tells you something, that we are creating a new bubble here.

[01:07:46:15 - 01:07:53:28] Brandon: So why do you think it is? What has changed that has motivated people to start working on AI for science type problems?

[01:07:53:28 - 01:08:49:17] Max: So there's two reasons actually.
One is that people have been applying sort of the new tools from AI to the sciences, which is quite natural. And there's of course, I think there's two big examples. Protein folding is a big one. And the other one is machine learning force fields, or something called machine learning interatomic potentials. Both of them have been actually very successful. Both also had something to do with symmetries, which is a little cool. And sort of people in the AI sciences saw an opportunity to apply the tools that they had developed beyond advertisement placement, right, or multimedia applications, to something that could actually make a very positive impact in society, like health, drug development, materials for the energy transition, carbon capture. These are all really cool, impactful applications.

[01:08:50:19 - 01:09:42:14] Max: Besides that, the science is also very interesting. I would say the fact that these two fields are coming together and that we're now at the point that we can actually model these things effectively and move the needle on some of these sort of science methodologies is also a very unique moment, I would say. People recognize that, okay, now we're at the cusp of something new, which is what the company is named after. We're at the cusp of something new. And of course that always creates a lot of energy. It's like, okay, it's like a sort of virgin field. It's like a green field. Nobody's been there. I can rush in and I can sort of start harvesting there, right? And I think that's also what's causing a lot of sort of enthusiasm in the field.

[01:09:42:14 - 01:10:12:18] RJ: If you're an AI engineer, basically the people that listen to this podcast will be in the field, then you maybe don't have a strong science background, but are excited.
I would say most AI practitioners, be they engineers or scientists, would consider themselves scientists, and they have some background, a little bit of physics, a little bit of industry, college, maybe even graduate school, and have been working or are starting out. How does somebody who is not a scientist on a day-to-day basis, how do they get involved?

[01:10:12:18 - 01:10:14:28] Max: Well, they can read my book once it's out.

[01:10:16:07 - 01:11:05:24] Max: This is basically saying that we should create more curricula that are on this interface. So I'm not sure there is, well, we already have at some universities actual courses you can take, maybe online courses you can take. These workshops where we are now are actually very good as well. And we should probably have more tutorials before the workshop starts. Actually I've kind of proposed this at some point. It's like, maybe first have an hour of a tutorial so that people who are new can get into the field. There's a lot out there. Most of it is of course inaccessible, but I would say we will create many more books and other content that is more accessible, including this podcast, I would say. So I think it will come. And these days you can watch videos and things. There's a huge amount of content you can go and see.

[01:11:05:24 - 01:11:28:28] Brandon: So maybe a follow-up to that. How do people learn and get involved? But why should they get involved? I mean, a lot of people in our audience will be interested in AI engineering, but they may be looking for bigger impacts in the world. What opportunities does AI for science provide them to make an impact, to change the world, that working in the world of pure bits would not?

[01:11:28:28 - 01:11:40:06] Max: So my view is that underlying almost everything is a material. So we are focusing a lot on LLMs now, which is kind of the software layer.

[01:11:41:06 - 01:11:56:05] Max: I would say if you think very hard, underlying everything is a material.
So underlying an LLM is a GPU, and underlying a GPU is a wafer on which we will have to deposit materials. Do we want to wait a little bit?

[01:12:02:25 - 01:12:11:06] Max: Underlying everything is a material. So I was saying, you know, there's the LLM, and underlying the LLM is a GPU on which it runs. In order to make that GPU,

[01:12:12:08 - 01:12:43:20] Max: you have to put materials down on a wafer and sort of shine on it with EUV light in order to etch the structures in. But that's now an actual materials problem, because more or less we've reached the limits of scaling things down. And now we are trying to improve further with new materials. So that's a fundamental materials problem. We need to get through the energy transition fast if we don't want to kind of mess up this world. And so there are, for instance, batteries. That's a complete materials problem. There's fuel cells.

[01:12:44:23 - 01:13:01:16] Max: There are solar panels. They can now make solar panels with new perovskite layers on top of the silicon layers that can capture, you know, theoretically up to 50% of the light, where now we're at, I don't know, maybe 22 or something. So these are huge changes, all by material innovation.

[01:13:02:21 - 01:13:47:15] Max: And yeah, I think wherever you go, you know, I can probably dig deep enough and then tell you, well, actually, the very foundation of what you're doing is a materials problem. And so I think it's just very nice to work on this very, very foundation. And also because, I think this is maybe also something that's happening now, we can start to search through this materials space. This has never been the case, right? For scientists, the normal way of working is you read papers and then you come up with a new hypothesis. You do an experiment and you learn, et cetera. So that's a very slow process. Now we can treat this as a search engine.
Like we search the internet, we now search the space of all possible molecules, not just the ones that people have made or that are in the universe, but all of them.

[01:13:48:21 - 01:14:42:01] Max: And we can make this kind of fully automated. That's the hope, right? It becomes a tool where you type what you want and something starts spinning and some experiments get going. And then, you know, out comes a list of materials, and then you look at it and say, maybe not. And then you refine your query a little bit. And you kind of do research with this search engine where a huge amount of computation and experimentation is happening, you know, somewhere far away in some lab or some data center or something like this. I find this a very, very promising view of how we can sort of build a much better materials layer underneath almost everything. And also more sustainable materials. Our plastics are polluting the planet. What if you come up with a plastic that kind of destroys itself, you know, after, I don't know, a few weeks, right? And actually becomes a fertilizer. These are things that are not impossible at all. These things can be done, right? And we should do it.

[01:14:42:01 - 01:14:47:23] RJ: Can you tell us a little bit just generally about CuspAI, and then I have a ton of questions.

[01:14:47:23 - 01:14:48:15] Speaker 5: Yeah.

[01:14:48:15 - 01:17:49:10] Max: So CuspAI started about 20 months ago, and it was because I was worried, and I'm still worried, about climate change. And so I realized that in order to, you know, stay within two degrees, let's say, we would not only have to reduce our emissions to zero by 2050, but then, you know, spend another half century or even a century removing carbon dioxide from the atmosphere, not by reducing our emissions, but actually removing it at a rate that's about half the rate that we now emit it. And that is an unsolved problem. But if we don't solve it, two degrees is not going to happen, right?
It's going to be much more. And I don't think people quite understand how bad that can be, like four degrees, like very bad. So this technology needs to be developed. And so this was my and my co-founder Chad Edwards' motivation to start this startup. And also because, you know, we saw the technology was ready, which is also very good. So, you know, the time is right to do it. And yeah, so in the meanwhile, we've grown to about 40 people. We've collected 130 million in investment into the company, which for a European company is quite a lot. I would say it's interesting that right after that, you know, other startups got even more. So that kind of tells you how fast this is growing. But yeah, we've built the platform, of course, but it's for a series of material classes and it needs to be constantly expanded to new material classes. And it can be more automated because, you know, we're now putting LLMs in, so the whole thing gets more and more automated. And now we're moving to sort of high-throughput experimentation. So connecting the actual platform, which is computational, to the experiments, so that you can also get fast feedback from experiments. And I don't think of experiments as something you do at the end, although that's what we've been doing so far. I want to think of it as what I would call a sort of physics processing unit, like a PPU, right, which is you have digital processing units and then you have physics processing units. So it's basically nature doing computations for you. It's the fastest computer known, as possible even. It's a bit hard to program because you have to do all these experiments. Those are quite, quite bulky. It's like a very large thing you have to do. But in a way, it is a computation. And that's the way I want to see it. So you can do computations in a data center and then you can ask nature to do some computations.
Your interface with nature is a bit more complicated. But then these things will have to seamlessly work together to get to a new material that you're interested in. And that's the vision we have. We don't say superintelligence because I don't quite know what it means and I don't want to oversell it. But I do want to automate this process and give a very powerful tool into the hands of the chemists and the materials scientists.

[01:17:49:10 - 01:18:01:02] Brandon: That actually brings up a question I wanted to ask you. First of all, can you talk about your platform to, like, whatever degree, like explain kind of how it works and what your thought process was in developing it?

[01:18:01:02 - 01:20:47:22] Max: Yeah, I think, surprisingly, it's not rocket science, I would say. It's not rocket science in the sense of the design, and basically the design that, you know, I wrote down at the very beginning is still more or less the design, although you add things. Like, I wasn't thinking very much about multi-scale models, and as we iterated, it turned out that actually multi-scale is very important. And in the beginning, I wasn't thinking very much about self-driving labs. But now I think, you know, we are at the stage where we should be adding that. And so there are sort of bits and details that we're adding. But more or less, it's what you see in the slide decks here as well, which is there is a generative component that you have to train to generate candidates. And then there is a digital twin, a multi-scale, multi-fidelity digital twin, with which you walk through the steps of the ladder: you know, you do the cheap things first, you weed out everything that's obviously not useful, and then you go to more and more expensive things later. And so you narrow things down to a small number. Those go into an experiment, you know, you do the experiment, get feedback, etc. Now, things that have been added more recently are sort of more agentic parts.
You know, we have agents that search the literature, you know, actually the chemical literature, and come up with chemical suggestions for doing experiments. We have agents which sort of autonomously orchestrate all of the computations and the experiments that need to be done. You know, they're in various stages of maturity and they can be continuously improved, I would say. And so basically I don't think that part is rocket science. You know, the design of that thing is not, like, surprising. What is surprising is how hard it is to actually build it. Right. So that's where the moat is: in the data that you can get your hands on and in actually building the platform. And I would say there's two people in particular I want to call out, which is Felix Hunker, who is actually, you know, building the scientific part of the platform, and Sandra de Maria, who is building sort of the scaling side, that is, kind of the MLOps part of the platform. Yeah. And recently we also added Aron Walsh to our team, who is a very accomplished scientist from Imperial College. We're very happy about that. He's going to be chief science officer. And we also have a partnerships team that sort of seeks out all the customers, because I think this is one thing I find very important. In principle, it's so complex to actually bring a material to the real world that you must do this, you know, in collaboration with the domain experts, which are typically the companies. So we only start to invest in a direction if we find a good industrial partner to go on that journey with us.

[01:20:47:22 - 01:20:55:12] Brandon: Makes a lot of sense.
Over the evolution of the platform, did you find that human intervention, human,

[01:20:56:18 - 01:21:17:01] Brandon: I guess you could imagine two directions: you start out making everything purely automated, agentic, and so on, and then later on you find that you need to have more human input and feedback at different steps. Or maybe did you start out with having human feedback at lots of steps and then, like, kind of, yeah, figure out ways to remove, you know,

[01:21:17:01 - 01:22:39:18] Max: That is the second one. So you build tools. It's much more modular than you think. It's like, we need these tools for this application. We need these tools. So you build all these tools, and then you go through a workflow, actually in the beginning just manually. So it's first this tool, then run this tool, et cetera. So you put them in a workflow and then you figure out, oh, actually, you know, this porous material that we are trying to make actually collapses if you shake it a bit. Okay, then you add a new tool that says test for stability. Right. Yeah. And so there's more and more tools. And then you build the agent, which could be a Bayesian optimizer, or it could be an actual LLM, you know, maybe trained to be a good chemist, that will then start to use all these tools in the right way, in the right order. Yeah. Right. But in the beginning, it's like you as a chemist are putting the workflow together. And then you think about, okay, how am I going to automate this? Right. One very easy question you can ask yourself is, you know, every time somebody who is not a super expert in DFT wants to do a calculation, he has to go to somebody who knows DFT.
And so could you start to automate that away, which is like, okay, make it so user friendly that you actually do the right DFT for the right problem and for the right length of time, and you can actually assess whether it's a good outcome, etc. So you start to automate small pieces and bigger pieces, etc. And in the end, the whole thing is automated.

[01:22:39:18 - 01:22:53:25] Brandon: So your philosophy is you want to provide a set of specific tools that make it so that the scientists making decisions are better informed, and less so trying to create an automated process.

[01:22:53:25 - 01:23:22:01] Max: I think this is sort of the same as what you're saying because, yes, we want to automate, yeah, but we don't see something very soon where the chemist and the domain expert are out of the loop. Yeah, but it's iterative, right? It's like, okay, so first, you need an expert to tell you precisely how to set the parameters of the DFT calculation. Okay, maybe we can take that out. We can maybe automate that, right? And so increasingly, more of these things are going to be removed.

[01:23:22:01 - 01:23:22:19] Speaker 5: Yeah.

[01:23:22:19 - 01:24:33:25] Max: In the end, the vision is it will be a search engine where somebody, a chemist, will type things and will get candidates, but the chemist will still decide what is a good material and what is not a good material out of that list, right? And so the vision of a completely dark lab, where you can close the door and you just say, just, you know, find something interesting, and then it will just figure out what's interesting, you know, it's like, oh, I found this new material to blah, blah, blah, blah, right? That's not the vision I have. At least not for, you know, a long time. So for me, it's really empowering the domain experts that are sitting in the companies and in universities to be much faster in developing their materials.
And I should say, it's also good to be a little humble at times, because it is very complicated, you know, to make it and to bring it into the real world. And there are people that have been doing this for their entire lives. Yeah. Right. And it's like, I wonder if they scratch their heads and say, well, you know, how are you going to completely automate that away, like in the next five years? I don't think that's going to happen at all.

[01:24:35:01 - 01:24:39:24] Max: Yeah. So to me, it's an increasingly powerful tool in the hands of the chemists.

[01:24:39:24 - 01:25:04:02] RJ: I have a question. You've talked before about getting people interested based on having, you know, sort of a big breakthrough in materials versus incremental change. I'm curious what you think about the platform you have now and what you're sort of stepping towards: are you chasing the big change, or is this more incremental? They're not mutually exclusive, obviously, but what do you think about that?

[01:25:04:02 - 01:26:04:27] Max: We follow a mixed strategy. So we are definitely going after a big material. Again, we do this with a partner. I'm not going to disclose precisely what it is, but we have our own kind of long-term goal. You could call it a lighthouse or, you know, sort of a moonshot or whatever, but it is going to be a really impactful material that we want to develop as a proof point that it can be done, and that it will make it into the real world, and that AI was essential in actually making it happen. At the same time, we also are quite happy to work with companies that have more modest goals. Like I would say one is a very deep partnership where you go on a journey with a company, and that's a long-term commitment together. And the other one is like somebody says, I need a force field. Can you help me train this force field and then maybe analyze this particular problem for me? And I'll pay you a bunch of money for that. And then maybe after that we'll see.
And that's fine too. Right. But we prefer, you know, the deep partnerships where we can really change something for the good.

[01:26:04:27 - 01:26:22:02] RJ: Yeah. And do you feel like from a platform standpoint you're ready for that, or, and again, not asking you to disclose proprietary secret sauce, but what are the things, generally speaking, that need to happen to get from where we are to those big breakthroughs?

[01:26:22:02 - 01:28:40:01] Max: What I find interesting about this field is that every time you build something, it's actually immediately useful. Right. And so it's unlike quantum computing or nuclear fusion, where you work for 20, 30, 40 years and nothing, nothing, nothing, nothing. And then it has to happen. Right. And when it happens, it's huge. So it's quite different here, because every time, you go to a customer and you say, so what do you need? Right. So we work, let's say, on a problem like water filtration. We want to remove PFAS from water. Right. So we do this with a company, Kemira. So they are a deep partner for us. Right. So we're on a journey together. I think that the breakthrough will happen with a lot of humans in the loop, because there are the chemists, who have a whole lot more knowledge of their field, and it's us who will help them with training and with new methods. And in that kind of interface, in these interactions, something beautiful will happen, and that will have to happen first before this field will really take off, I think. And so in that sense it's not a bubble, let's put it that way. People see that something real is actually happening. So in the beginning, it will be, you know, with a lot of humans in the loop, I would say, and I would hope we will have this new sort of breakthrough material before, you know, everything is completely automated, because that will take a while. And also it is very vertical-specific.
So it's like, completely automating something for problem A, you know, you can probably achieve it, but then you'll sort of have to start over again for problem B because, you know, your experimental setup looks very different, and the machines with which you characterize your materials look very different. Even the models in your platform will have to be retrained and fine-tuned to the new class. So every time, you know, you have a lot of learnings to transfer, but also, you know, the problems are actually different. And so, yes, I would want that breakthrough material before it's completely automated, which I think is kind of a long-term vision. And I would say every time you move to something new, you'll have to start retraining, and humans will have to come in again and say, okay, so what does this problem look like? And now sort of, you know, point the machine again, you know, in the new direction and then use it again.

[01:28:40:01 - 01:28:47:17] RJ: For the non-scientists among us, me included, a bit of a scientist. There's a lot of terminology. You mentioned DFT,

[01:28:49:00 - 01:29:01:11] RJ: equivariance we've talked about. Can you sort of explain in engineering terms, or at the level of sophistication of an engineer, well, what is equivariance?

[01:29:01:11 - 01:29:55:01] Max: So equivariance is the infusion of symmetry in neural networks. So if I build a neural network, let's say, that needs to recognize this bottle, right, and then I rotate the bottle, it will then actually have to completely start again, because it has no idea that the input that represents a rotated bottle is actually a rotated bottle. It just doesn't understand that. Right. If you build equivariance in, basically once you've trained it in one orientation, it will understand it in any other orientation. So that means you need a lot less data to train these models. And these are constraints on the weights of the model.
So basically you have to constrain the weights such that it understands it. And you can build it in, you can hard code it in. And yeah, the symmetry groups can be, you know, translations, rotations, but also permutations. In a graph neural network, they're permutations, and then physics, of course, has many more of these groups.

[01:29:55:01 - 01:30:01:08] RJ: To play devil's advocate, why not just use data augmentation, where your bottle is in all the different orientations?

[01:30:01:08 - 01:30:58:23] Max: That's an option, it's just not exact. It's like, why would you go through the work of doing all that, where you would really need an infinite number of augmentations to get it completely right, when you can also hard code it in? Now, I have to say sometimes actually data augmentation works even better than hard coding the equivariance in. And this has something to do with the fact that if you constrain the weights before the optimization starts, the optimization surface or objective becomes more complicated. And so it's harder to find good minima. So there is also a complicated interplay, I think, between the optimization process and these constraints you put in your network. And so, yeah, you'll hear kind of contradicting claims in this field. Like some people say, for certain applications, it works just better than not doing it. And sometimes you hear other people say, if you have a lot of data and you can do data augmentation, then actually it's easier to optimize them and it actually works better than putting the equivariance in.

[01:30:58:23 - 01:31:07:16] Brandon: Do you think there's kind of a bitter lesson for mathematically founded models and strategies for doing deep learning?

[01:31:07:16 - 01:31:46:06] Max: Yeah, ultimately it's a trade-off between data and inductive bias. So if your inductive bias is not perfectly correct, you have to be careful because you put a ceiling on what you can do.
But if you know the symmetry is there, it's hard to imagine there isn't a way to actually leverage it. But yeah, so there is a bitter lesson. And one of the bitter lessons is you should always make sure your architecture scales, unless you have a tiny data set, in which case it doesn't matter. But, you know, the same bitter lessons or lessons that you can draw in LLM space are eventually going to be true in this space as well, I think.

[01:31:47:10 - 01:31:55:01] RJ: Can you talk a little bit about your upcoming book and tell the listeners, like, what's exciting about it? Yeah, I should read it.

[01:31:55:01 - 01:33:42:20] Max: So this book is called Generative AI and Stochastic Thermodynamics. It basically lays bare the fact that the mathematics that goes into both generative AI, which is the technology to generate images and videos, and this field of non-equilibrium statistical mechanics, which deals with systems of molecules that are just moving around and relaxing to the ground state, or that you can control to be in a certain state, the mathematics of these two is actually identical. And so that's fascinating. And in fact, what's interesting is that Geoff Hinton and Radford Neal already wrote down the variational free energy for machine learning a long time ago. And there's also Karl Friston's work on the free energy principle and active inference. But now we've related it to this very new field in physics, which is called stochastic thermodynamics or non-equilibrium thermodynamics, which has its own very interesting theorems, like fluctuation theorems, which we don't typically talk about, but we can learn a lot from. And I think it can sort of now start to cross-fertilize.
When we see that these things are actually the same, we can, like we did for symmetries, we can now look at this new theory that's out there, developed by these very smart physicists, and say, okay, what can we take from here that will make our algorithms better? At the same time, we can use our models to now help the scientists do better science. And so it becomes a beautiful cross-fertilization between these two fields. The book is rather technical, I would say. And it takes all sorts of things that have been done as stochastic thermodynamics, and all sorts of models that have been done in the machine learning literature, and it basically equates them to each other. And I think hopefully that sense of unification will be revealing to people.[01:33:42:20 - 01:33:44:05]RJ: Wait, and when is it out?[01:33:44:05 - 01:33:56:09]Max: Well, it depends on the publisher now. But I hope in April, I'm going to give a keynote at ICLR. And it would be very nice if they have this book in my hand. But you know, it's hard to control these kind of timelines.[01:33:56:09 - 01:33:58:19]RJ: Yeah, I'm looking forward to it. Great.[01:33:58:19 - 01:33:59:25]Max: Thank you very much. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
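The exchange above about hard-coding a symmetry versus approximating it with augmentation can be illustrated with a small sketch. This is not code from the episode or from any library discussed in it; every function name below is hypothetical, the toy "models" are plain sums, and averaging predictions over shuffled inputs stands in for training-time augmentation as a simplification. It uses the permutation group because it is finite: averaging over all n! orderings is exact but expensive, while a handful of sampled orderings is only approximate. For continuous groups such as rotations, no finite sample would be exact, which seems to be the point about needing infinitely many augmentations.

```python
import itertools
import random


def invariant_readout(xs):
    """Sum pooling over elements: permutation-invariant by construction."""
    return sum(x * x for x in xs)


def positional_readout(xs, weights):
    """Order-dependent toy model: each position gets its own weight."""
    return sum(w * x for w, x in zip(weights, xs))


def sampled_average(xs, weights, n_samples, rng):
    """Average the order-dependent model over randomly sampled orderings.

    This mimics augmentation: invariance holds only approximately and
    improves as n_samples grows.
    """
    total = 0.0
    for _ in range(n_samples):
        shuffled = list(xs)
        rng.shuffle(shuffled)
        total += positional_readout(shuffled, weights)
    return total / n_samples


def full_group_average(xs, weights):
    """Average over every permutation: exact, but costs n! evaluations."""
    perms = list(itertools.permutations(xs))
    return sum(positional_readout(p, weights) for p in perms) / len(perms)


xs = [1.0, 2.0, 3.0, 4.0]
permuted = [3.0, 1.0, 4.0, 2.0]   # same elements, different order
weights = [0.5, -1.0, 2.0, 0.25]  # arbitrary illustrative weights
rng = random.Random(0)

print("built-in invariance:", invariant_readout(xs), invariant_readout(permuted))
print("no invariance:      ", positional_readout(xs, weights), positional_readout(permuted, weights))
print("10 sampled orderings:", sampled_average(xs, weights, 10, rng), sampled_average(permuted, weights, 10, rng))
print("all 24 orderings:   ", full_group_average(xs, weights), full_group_average(permuted, weights))
```

The first and last lines print matching pairs, the second does not, and the sampled version generally lands close to but not exactly on the full-group value. The sketch says nothing about the optimization effects Max mentions, where constrained weights can make good minima harder to find.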

Crazy Wisdom
Episode #533: The Universe Doing Its Thing: AI Evolution Is Already Here

Feb 20, 2026 · 73:51


In this episode of the Crazy Wisdom podcast, host Stewart Alsop sits down with Markus Buehler, the McAfee Professor of Engineering at MIT, to explore how seemingly different systems, from proteins and music to knowledge structures and AI reasoning, share underlying patterns through hierarchy, self-organization, and scale-free networks. The conversation ranges from the limits of current AI interpolation versus true discovery (using the fire-to-fusion example), to the emergence of agent swarms and their non-linear effects, to practical questions about ontologies, knowledge graphs, and whether humans will remain necessary in the creative discovery process. Markus discusses his lab's work automating scientific discovery through AI agents that can generate hypotheses, run simulations, and even retrain themselves, while Stewart shares his own experiences building applications with AI coding agents and grapples with questions about intellectual property, material science constraints, and the future of human creativity in an AI-abundant world.

Timestamps
00:00 - Introduction to Markus Buehler's work on knowledge graphs, structural grammar across proteins, music, and AI reasoning
05:00 - Discussion of AI discovery versus interpolation, using fire and fusion as examples of fundamental versus incremental innovation
10:00 - Language models as connective glue between agents, enabling communication despite imperfect outputs and canonical averaging
15:00 - Embodiment and agency in AI systems, creating adversarial agents that challenge theories and expand world models
20:00 - Emergent properties in materials and AI, comparing dislocations in metals to behaviors in agent swarms
25:00 - Human role-playing and phase separation in society, parallels to composite materials and heterogeneity
30:00 - Physical world challenges, atom-by-atom manufacturing at MIT.nano, limitations of lithography machines
35:00 - Synthetic biology as alternative to nanotechnology, programming microorganisms for materials discovery
40:00 - Intellectual property debates, commodification of AI models, control layers more valuable than model architecture
45:00 - Automation of ontologies, agent self-testing, daughter's coding success at age 11
50:00 - Graph theory for knowledge compression, neurosymbolic approaches combining symbolic and neural methods
55:00 - Nonlinear acceleration in AI, emergence from accumulated innovations, restaurant owner embracing AI
01:00:00 - Future generations possibly rejecting AI, democratization of knowledge, social media as real-time scientific discourse

Key Insights
1. Universal Patterns Across Disciplines: Seemingly different systems in nature (proteins, music, social networks, and knowledge itself) share fundamental structural patterns including hierarchy, self-organization, and scale-free networks. This commonality allows creative thinkers to draw insights across disciplines, applying principles from one domain to solve problems in another. As an engineer and materials scientist, Buehler has leveraged these isomorphisms to advance scientific understanding by mapping the "plumbing" of different systems onto each other, revealing hidden relationships that enable extrapolation beyond what's observable in any single domain.
2. The Discovery Versus Interpolation Problem: Current AI systems, particularly large language models, excel at interpolation (recombining existing knowledge in new ways) but struggle with genuine discovery that requires fundamental rewiring of world models. Using the example of fire versus fusion, Buehler explains that an AI trained on combustion chemistry would propose bigger fires or new fuels, but couldn't conceive of fusion because that requires stepping back to more fundamental physics. True discovery demands the ability to recognize when existing theories have boundaries and to develop entirely new frameworks, something current AI architectures aren't designed to achieve due to their training objective of predicting the most likely outcome.
3. The Role of Ontologies and Knowledge Graphs: While some AI researchers argue that ontologies are unnecessary because models form internal representations, Buehler advocates for explicit knowledge graphs as essential discovery tools. External ontologies provide sharp, analytical, symbolic representations that complement the fuzzy internal representations of neural networks. They enable verification of rare connections, like obscure papers that might hold key insights, which would be averaged away in standard AI training. This neurosymbolic approach combines the generalization capabilities of neural networks with the precision of formal knowledge structures, creating more powerful discovery systems.
4. Emergent Properties and Agent Swarms: Just as materials science shows that collections of atoms exhibit properties impossible to predict from individual components, AI agent swarms demonstrate emergent behaviors beyond single models. When agents are incentivized not just to answer questions but to challenge each other adversarially, propose theories, and test hypotheses, they can spawn new copies of themselves and evolve understanding beyond their initial programming. This emergence isn't surprising from a materials science perspective: dislocations, grain boundaries, and other collective phenomena only appear at scale, fundamentally determining material behavior in ways unpredictable from studying just a few atoms.
5. The Commoditization of Intelligence: The fundamental AI models themselves are becoming commodities, as evidenced by events like the Moldbug phenomenon where people built agents using various providers interchangeably. The real value is shifting from who has the smartest model to how models are orchestrated, integrated, and deployed. This parallels historical technology adoption patterns: just as we moved past debating who makes the best electricity to focusing on applications, AI is transitioning from a horse race over model capabilities to questions of infrastructure, energy, access speed, and agent coordination at the systems level.
6. Human-AI Collaboration and Creative Control: Rather than wholesale replacement, AI enables humans to operate in an intensely creative space as orchestrators sampling from vast possibility spaces. Similar to how Buehler's 11-year-old daughter now builds sophisticated applications that would have required professional developers years ago, AI democratizes access to capabilities while humans retain the creative judgment about direction and meaning. The human role becomes curating emergence, finding rare connections, playing at the edges of knowledge, and exercising the kind of curiosity-driven exploration that AI systems lack without embodied stakes in their own survival and continuation.
7. Technology as Evolutionary Inevitability: The development of AI represents not an unnatural threat but the next stage of human evolution, an extension of our innate drive to build models of ourselves and our world. From cave paintings to partial differential equations to artificial intelligence, humans continuously create increasingly sophisticated representations and tools. Attempting to stop this technological evolution is futile; instead, the focus should be on steering it ...

Scrum Master Toolbox Podcast
BONUS The Future of Seeing—Why AI Vision Will Transform Medicine and Human Perception With Daniel Sodickson

Feb 19, 2026 · 37:18


What if the next leap in AI isn't about thinking, but about seeing? In this episode, Daniel Sodickson, physicist, medical imaging pioneer, and author of "The Future of Seeing", argues we're on the edge of a vision revolution that will change medicine, technology, and even human perception itself.

From Napkin Sketch to Parallel Imaging
"I was doodling literally on a napkin in a piano bar in Boston and came up with a way to get multiple lines at once. I ran to my mentor and said, 'Hey, I have this idea, never mind my paper.' And he said, 'Who are you again? Sure, why not.' And it worked."
Daniel's journey into imaging began with a happy accident. While studying why MRI couldn't capture the beating heart fast enough, he realized the fundamental bottleneck: MRI machines scan one line at a time, like old CRT screens. His insight, imaging in parallel to capture multiple lines simultaneously, revolutionized the field. This connection between natural vision (our eyes capture entire scenes at once) and artificial imaging systems set him on a 29-year journey exploring how we can see what was once invisible.

Upstream AI: Changing What We Measure
"Most often when we envision AI, we think of it as this downstream process. We generate our data, make our image, then let AI loose instead of our brains. To me, that's limited. Why aren't we thinking of tasks that AI can do that no human could ever do?"
Daniel introduces a crucial distinction between "downstream" and "upstream" AI. Downstream AI takes existing images and interprets them, essentially competing with human experts. Upstream AI changes the game entirely by redesigning what data we gather in the first place. If we know a machine learning system will process the output, we can build cheaper, more accessible sensors. Imagine monitoring devices built into beds or chairs that don't produce perfect images but can detect whether you've changed since your last comprehensive scan. AI fills in the gaps using learned context about how bodies and signals behave.

The Power of Context and Memory
"The world we see is a lie. Two eyes are not nearly enough to figure out exactly where everything is in space. What the brain is doing is using everything it's learned about the world (how light falls on surfaces, how big people are compared to objects) and filling in what's missing."
Our brains don't passively receive images; they actively construct reality using massive amounts of learned context. Daniel argues we can give imaging machines the same superpower. By training AI on temporal patterns (how healthy bodies change over time, what signals precede disease), we create systems with "memory" that can make sophisticated judgments from incomplete data. Today's signal, combined with your history and learned patterns from millions of others, becomes far more informative than any single pristine image could be.

From Reactive to Proactive Health
"I've started to wonder why we use these amazing MRI machines only once we already know you're sick. Why do we use them reactively rather than proactively?"
This question drove Daniel to leave academia after 29 years and join Function Health, a company focused on proactive imaging and testing to catch disease before it develops. The vision: a GPS for your health. By combining regular blood panels, MRI scans, and wearable data, AI can monitor whether you look like yourself or have changed in worrisome ways. The goal isn't replacing expert diagnosis but creating an early warning system that surfaces problems while they're still easily treatable.

Seeing How We See
"Sometimes when I'm walking along, everything I'm seeing just fades away. And what I see instead is how I'm seeing. I imagine light bouncing off of things and landing in my eye, this buzz of light zipping around as fast as anything in the universe can go."
After decades studying vision, Daniel experiences the world differently. He finds himself deconstructing his own perception: tracing sight lines, marveling at how we've evolved to turn the chaos of sensation into spatially organized information. This meta-awareness extends to his work: every new imaging modality has driven scientific discovery, from telescopes enabling the Copernican Revolution to MRI revealing the living body. We're now at another inflection point where AI doesn't just interpret images but transforms our relationship with perception itself.

In this episode, we refer to An Immense World: How Animal Senses Reveal the Hidden Realms Around Us by Ed Yong on animal perception, and A Path Towards Autonomous Machine Intelligence by Yann LeCun on building AI more like the brain.

About Daniel Sodickson
Daniel K. Sodickson is a physicist in medicine and chief medical scientist at Function Health. Previously at NYU, and a gold medalist and past president of the International Society for Magnetic Resonance in Medicine, he pioneers AI-driven imaging and is author of The Future of Seeing.

Business Pants
CEOs on ICE, the SEC kills small investors, the manchild economy, and AI navel gazers

Jan 30, 2026 · 62:36


Story of the Week (DR):
Trump's ICE tactics force CEOs to choose between staying silent and risking White House backlash (MM)
CEOs of Target and Minnesota's Biggest Companies Call for ‘De-Escalation' After Shooting
Minnesota workers pressure employers to take action against ICE operations
CEOs, long silent on Trump's immigration crackdown, seem to hit their breaking point over killing of Alex Pretti in Minnesota
Target's incoming CEO tells staff violence in Minneapolis is 'incredibly painful' – without naming Trump or ICE
Jan 28: Target Unveils Largest Spring Beauty Assortment Ever - Making Trend-Driven, Expert-Backed Beauty More Accessible
Tech's top CEOs mum after ICE killings, while leaders like Reid Hoffman, Yann LeCun speak out
'ICE is going too far': Sam Altman, Jamie Dimon, and more CEOs on the unrest in Minnesota
Reid Hoffman says business leaders are wrong to stay silent about the Trump administration
Apple's Cook says he's 'heartbroken' by Minneapolis events and has spoken with Trump
Companies reap $22bn from Trump's immigration crackdown
Meta blocks links to ICE List across Facebook, Instagram, and Threads
As Big Tech CEOs speak up about violence in Minneapolis, 1 in 3 corporate leaders think ICE tensions are ‘not relevant to their business'
How ICE Already Knows Who Minneapolis Protesters Are
Agents use facial recognition, social media monitoring and other tech tools not only to identify undocumented immigrants but also to track protesters, current and former officials said.

Freefloatanalytics data blast:
Palantir Technologies: Continues to be a primary partner. In 2025, they were awarded a $30 million contract to build "ImmigrationOS," a platform designed to provide "near real-time visibility" on individuals for the purpose of streamlining apprehensions and tracking self-deportations. Gender Influence Gap -26%
RELX: LexisNexis Risk Solutions: Provides ICE with investigative databases used to track, vet, and target individuals. Their current contract is valued at over $22 million. Gender Influence Gap -24%
Thomson Reuters: Supplies ICE with access to massive databases, including over 20 billion license plate scans. This data allows agents to track vehicle movement history and identify where individuals may be living or working. Gender Influence Gap -28%
Clearview AI: Recently signed a $3.75 million contract (September 2025) to provide facial recognition technology. While officially limited to certain types of investigations, procurement records suggest its use is expanding. Gender Influence Infinity% (no women on advisory board; Hal Lambert and Richard Schwartz as co-CEOs)

King “Bumps”
JPMorgan's Dimon sees 10.3% pay bump to $43M
Disney CEO Bob Iger's Pay Increased 11.5% to $45.8 Million in 2025
Goldman Sachs hikes CEO David Solomon's pay 21% to record $47 million
Wells Fargo CEO Charlie Scharf Gets 28% Pay Boost to $40 Million
Why Starbucks is letting Brian Niccol use the company plane for more personal travel
“Following a security review of risks, the Starbucks board of directors made the decision to enhance security measures for Brian,” a company spokesperson said. “This included a decision by the board to require Brian to use private aircraft for all travel.”
$96M in 2024; $31M in 2024, including temporary housing expenses in the amount of $371,536; and security expenses in the amount of $1,142,700; and $997,392 in expenses related to his use of Starbucks aircraft for commuting and personal use
median employee: $17,279. CEO Pay ratio 1,794 to 1 (January 1st: 10:10am)
Temporary housing expense ratio: 22:1

The docu-bribe: At ‘Melania' Premiere, the President Sees ‘Glamour' and Others See Graft
Amazon paid Melania Trump's production company $40 million for the movie and then paid another $35 million to promote it.
Guests included:
Jordan Belfort: The real-life "Wolf of Wall Street."
Director Brett Ratner, accused of rape, sexual assault, sexual harassment, and homophobic abuse by at least 9 women:
Melania Trump documentary marks a post-#MeToo comeback for its director
Brett Ratner was all but exiled from Hollywood after facing sexual misconduct allegations. Trump's win gave him an opening to return.
Tim Cook (Apple)
Andy Jassy (Amazon)
Lisa Su (AMD)
Eric Yuan (Zoom)
Lynn Martin (President of the NYSE)
Larry Culp (GE)
Sam Altman (OpenAI)
Satya Nadella (Microsoft)
Sundar Pichai (Google)
Safra Catz (Oracle)
David Brown (Victory Capital)
David Ellison (Skydance/Paramount)
Marc Benioff (Salesforce)

Goodliest of the Week (MM/DR):
DR: Diversity on Fortune 50 boards: white men haven't been a majority for 3 years in a row
Whereas about a decade ago, white men held two-thirds of the seats on the top 50 Fortune boards, in 2023, for the first time, they held fewer than 50%. In 2024, that number dropped to 48.4%, but this year it climbed back to 49.7%.
Since white men make up about 31% of the U.S. population, they still have been very much overrepresented in all three years.
DR: National Shutdown: General strike on January 30 aims to push ICE out of Minnesota. Stores closed, protests scheduled in all 50 states
MM: Delivery Robot Gets Stuck on Train Tracks, Gets Obliterated by Locomotive
MM: Judge greenlights Massachusetts offshore wind project halted by Trump administration
Vineyard Wind, which joins Revolution Wind, Empire Wind, and Coastal Virginia Offshore Wind in restarted because laws

Assholiest of the Week (MM):
WHICH ASSHOLE DO YOU BLAME: Trump's ICE tactics force CEOs to choose between staying silent and risking White House backlash
Trump/ICE
His personal military got orders to be “ethical”, but to fuck up everyone - and recruited specifically targeting Call of Duty players and lonely, angry men who wish they could call their friends “retarded” again but it isn't politically correct
Palantir and the ICE industrial complex
Alex Karp went out of his way to insist to his disgusted employees that AI and Palantir “bolsters civil liberties”
Meanwhile, Palantir employees signed a letter from tech employees pondering whether or not they are actively destroying our country and abetting oligarchs
But Palantir, while making some of the creepiest, most heinous software known to man (I mean, worse than CHINA! And we all HATE CHINA, RIGHT???), has $100m in contracts with ICE
In fact, there's a whole private infrastructure complex that's largely not politically agnostic that's made $22bn from ICE and immigration crackdowns - and it's only been a year! That's some awesome shareholder value illegally sending weeping mothers to countries they don't live in with no due process!
CEOs (Target, looking at you) DR
They managed to find a pen and craft a strongly worded letter that asked, pretty please, for “de-escalation”, calling ICE out not by NAME of course, but as a “recent challenge” that created “widespread disruption” - and named the White House only as someone they are “communicating” with. Signed by 60 Minnesota CEOs, co-signed in spirit by the Business Roundtable (though not like, officially), they managed to write a whole 199 words about the execution of a VA nurse whose crime was filming the Gestapo in action
Target's incoming CEO (obviously not the CURRENT CEO Brian Cornell, he's busy polishing his mahogany chair for board meetings where he will be Executive Chair, making as much as a CEO with none of the responsibilities) also addressed the unlawful and unwarranted arrests of Target employees in Minneapolis by thugs - oh, wait, no he didn't - he said, “The violence and loss of life in our community is incredibly painful.” - IT WAS YOUR EMPLOYEES IN THE CROSSHAIRS, SCHMUCK. Target employees are currently skipping work in Minnesota, but solid leadership.
Boards of directors
Our analysis of the boards of the Minnesota 60 showed that nearly half of them sit on each other's boards. Basically, you have a massive groupcoward problem - about 25 of the CEOs sit on some other CEOs board or overlap in some way, and the lawyers that carefully crafted the letter absolutely had to have it run through every other board and company lawyer, a task made easier when half of you are on the board with each other. No need for authenticity when you have collective ass covering.
Jeffrey Epstein
If not for those files, there wouldn't NEED TO BE MURDERS so you look somewhere else!
Investors
If not for “shareholder value”, we could pay attention to humanity and authentic real world values!

WHICH ASSHOLE DO YOU BLAME: As You Sow leads criticism of SEC's updated restrictions on smaller shareholders
Smaller investors!
For three decades, small investors have used precatory proposals either as a means to extract more data, a means to improve governance, or a means of advertising - many of the non profits use it as a fundraising tool as much as a means of change
Meanwhile, those proposals have almost entirely failed at the vote - though they HAVE succeeded in increasing our data over time (the long arc of disclosure)
Then the zone gets flooded by the anti-woke shareholders looking to de-trans companies, and now we have a massive influx of performative proposals
Now that the insiders are in charge (vs. career bureaucrats), in a six month period, virtually all rights have been revoked with threats of paperwork for non compliance
As a final cherry, they are now trying to keep EXEMPT SOLICITATIONS off the filing docket unless you have $5m in stock, so you can't even file your intent to vote directionally unless you're super rich
John Chevedden
The gadflyfather - if not for being the winningest shareholder in history with a nearly obsessive focus on improving shareholder rights, the most boring of topics, the SEC would probably have ignored the whole thing
But the data shows the SEC is taking the time to blanket ignore everyone BUT Chevedden, responding to affirmatively say no to his proposals
JC, no one likes a repeat champion dynasty
The SEC
Brian Daly at the SEC is out there suggesting maybe NO ONE should vote proxies while SEC Chair Atkins tried to gaslight the entire investment community by claiming the “government shutdown” made it too hard for the poor ole SEC to do its job, so they just gave companies immunity from proposals in lieu of doing their jobs
Meanwhile, Atkins has overseen a steep drop in enforcement of accounting irregularities and reporting while simultaneously green lighting crypto scams and Exxon's new “retail vote” capture plan (which gives management anywhere from 5-20% of the company vote depending on the company by auto voting retail that opts in)
All with Trump family in the backdrop raking in 1.4bn in the first year of the presidency from crypto token bullshit, asset seizures and sales, and pure graft - none of which will obviously be investigated despite Trump's son actively on a public board of directors
Bigger investors!
THEY NEVER REALLY CARED ABOUT VOTING ANYWAY! 96% average support for directors, 0.2% of directors globally voted out annually, and of those that are voted out (~20 a year), MORE THAN HALF STAY ON THE BOARD either by bylaw (cumulative voting) or as zombies (Jay Hoag!)
And still, NO ONE CARES!

WHICH ASSHOLE DO YOU BLAME: Marc Andreessen says the real crisis isn't AI job losses - it's what would have happened without AI
The powerless AI makers
Sam Altman: Sam Altman Says AI Will Cause Massive Deflation, Making Money Worth Vastly More - that's pretty good if you're already a billionaire, yeah?
Dario Amodei: Anthropic CEO Warns That the AI Tech He's Creating Could Ravage Human Civilization - uh, don't create it
The CEO of Microsoft Suddenly Sounds Extremely Nervous About AI
AI anxiety is so widespread that veteran Microsoft researchers are having panic attacks because they're making themselves obsolete
The VC Navel Gazing Manchild Economy
Andreessen's genius was investing in manchildren: Facebook, Roblox, AirBnB
VCs actually are giving LESS MONEY to women than the INCREDIBLY LOW AMOUNT they already gave during the AI race
YOU - you should have been a plumber or a peasant or a construction worker

Headliniest of the Week
DR: Cracker Barrel Wants Its Staff to Eat One Thing on Work Trips: Cracker Barrel
MM: The company Americans say is the best place to work in 2026 isn't who you think
Crew Carwash - washing cars is better than tech bro manbaby fests
MM: The Worst People Alive Are Obsessed With Meta's Video Recording Glasses

Who Won the Week?
DR: Resistance in Minnesota and Maine (I'm attempting to be optimistic here, give me a break)
MM: 33% of corporate leaders: As Big Tech CEOs speak up about violence in Minneapolis, 1 in 3 corporate leaders think ICE tensions are ‘not relevant to their business'

Predictions
DR: January 1st will officially be recognized by the Business Roundtable as "Equality Day", celebrating the grueling minutes it takes a CEO to earn more than their average worker for the year. Engraved badges with the exact time (10:10 for SBUX) will be created to honor the achievement.
Ok, maybe that's silly, my real one is that Target announces its "De-Escalation" Collection: a "Minneapolis-Inspired" line of high-fashion neutral-tone hoodies, specifically marketed as "non-threatening" to ICE agents and heartbroken CEOs
MM: Alex Karp, social justice warrior out for the little guy, mass fires his staff at Palantir and replaces it with an AI robot named “The Job Displacer”, does a road show claiming he's “freed” his employees using AI and now they can really have authentic jobs like “bagger at grocery store” and “guy who mixes paint”

Crazy Wisdom
Episode #524: The 500-Year Prophecy: Why Buddhism and AI Are Colliding Right Now

Jan 19, 2026 · 60:49


In this episode of the Crazy Wisdom podcast, host Stewart Alsop sits down with Kelvin Lwin for their second conversation exploring the fascinating intersection of AI and Buddhist cosmology. Lwin brings his unique perspective as both a technologist with deep Silicon Valley experience and a serious meditation practitioner who's spent decades studying Buddhist philosophy. Together, they examine how AI development fits into ancient spiritual prophecies, discuss the dangerous allure of LLMs as potentially "asura weapons" that can mislead users, and explore verification methods for enlightenment claims in our modern digital age. The conversation ranges from technical discussions about the need for better AI compilers and world models to profound questions about humanity's role in what Lwin sees as an inevitable technological crucible that will determine our collective spiritual evolution. For more information about Kelvin's work on attention training and AI, visit his website at alin.ai. You can also join Kelvin for live meditation sessions twice daily on Clubhouse at clubhouse.com/house/neowise.

Timestamps
00:00 Exploring AI and Spirituality
05:56 The Quest for Enlightenment Verification
11:58 AI's Impact on Spirituality and Reality
17:51 The 500-Year Prophecy of Buddhism
23:36 The Future of AI and Business Innovation
32:15 Exploring Language and Communication
34:54 Programming Languages and Human Interaction
36:23 AI and the Crucible of Change
39:20 World Models and Physical AI
41:27 The Role of Ontologies in AI
44:25 The Asura and Deva: A Battle for Supremacy
48:15 The Future of Humanity and AI
51:08 Persuasion and the Power of LLMs
55:29 Navigating the New Age of Technology

Key Insights
1. The Rarity of Polymath AI-Spirituality Perspectives: Kelvin argues that very few people are approaching AI through spiritual frameworks because it requires being a polymath with deep knowledge across multiple domains. Most people specialize in one field, and combining AI expertise with Buddhist cosmology requires significant time, resources, and academic background that few possess.
2. Traditional Enlightenment Verification vs. Modern Claims: There are established methods for verifying enlightenment claims in Buddhist traditions, including adherence to the five precepts and overcoming hell rebirth through karmic resolution. Many modern Western practitioners claiming enlightenment fail these traditional tests, often changing the criteria when they can't meet the original requirements.
3. The 500-Year Buddhist Prophecy and Current Timing: We are approximately 60 years into a prophesied 500-year period where enlightenment becomes possible again. This "startup phase of Buddhism revival" coincides with technological developments like the internet and AI, which are seen as integral to this spiritual renaissance rather than obstacles to it.
4. LLMs as UI Solution, Not Reasoning Engine: While LLMs have solved the user interface problem of capturing human intent, they fundamentally cannot reason or make decisions due to their token-based architecture. The technology works well enough to create an illusion of capability, leading people down an asymptotic path away from true solutions.
5. The Need for New Programming Paradigms: Current AI development caters too much to human cognitive limitations through familiar programming structures. True advancement requires moving beyond human-readable code toward agent-generated languages that prioritize efficiency over human comprehension, similar to how compilers already translate high-level code.
6. AI as Asura Weapon in Spiritual Warfare: From a Buddhist cosmological perspective, AI represents an asura (demon-realm) tool that appears helpful but is fundamentally wasteful and disruptive to human consciousness. Humanity exists as the battleground between divine and demonic forces, with AI serving as a weapon that both sides employ in this cosmic conflict.
7. 2029 as Critical Convergence Point: Multiple technological and spiritual trends point toward 2029 as when various systems will reach breaking points, forcing humanity to either transcend current limitations or be consumed by them. This timing aligns with both technological development curves and spiritual prophecies about transformation periods.

The Futurists
Why AI Needs JEPA World Models

Jan 8, 2026, 51:38


The Futurists starts 2026 with a stimulating conversation with serial entrepreneur Matt Miesnieks, a true pioneer of AR/XR and spatial computing. In his new startup venture, Primate AI, Matt is focused on a novel approach to artificial intelligence. He intends to construct spatial and dimensional concepts that replicate the way humans develop a mental model of the real world. Topics in this episode: how the limitations of LLMs create opportunities for new approaches, such as Yann LeCun's JEPA (Joint Embedding Predictive Architecture); the distinction between trying to understand the real world and trying to generate new worlds; why it is so hard to get a robot to cross a busy street safely; why 3D world models are needed; what happens when the real world is machine-readable.

This Week in Startups
2026 Starts with a bang: META AI Drama and Nvidia's $20B Groq Acquisition | E2230

Jan 6, 2026, 54:39


This Week In Startups is made possible by:
Crusoe Cloud - https://crusoe.ai/build
Uber - http://uber.com/twist
Every.io - http://every.io/

Today's show: Jason and Alex are BACK on TWiST for 2026! This holiday season was anything but calm, with deca-corn acquisitions, massive Polymarket bets, and major new startups breaking from stealth!

Jason talks about the recent Nvidia-Groq $20B acquisition, a major exit for Chamath as the lead investor back in 2017! Jason delves into how the VC fund math shakes out for pre-seed VC funds vs. Series A VC funds.

Jason and Alex delve into the drama swirling around META's AI team. Yann LeCun, META's former Chief AI Scientist, announced that he would be leaving META to become Executive Chairman at AMI Labs. LeCun left the META team in the new year, calling the new Chief AI Scientist, Alexandr Wang, inexperienced. LeCun now looks to move AI beyond the era of LLMs at AMI Labs.

PLUS Jason and Alex talk about the new social media app Tangle, from Biz Stone, co-founder of Twitter, and Evan Sharp, co-founder of Pinterest. Their startup, West Co, launched Tangle, which seeks to become an “intentional living” app. The two look to improve how humans interact with modern tech. Jason points out that very few news products have worked, but is eager to see how two industry veterans build in the space.

Timestamps:
(00:00) Why Restaurants are OVER - Peptides and other self medications
(06:41) Nvidia Acqui-Hires Groq for $20 BILLION
(9:48) Crusoe Cloud: Crusoe is the AI factory company. Reliable infrastructure and expert support. Visit https://crusoe.ai/build to reserve your capacity for the latest GPUs today.
(11:00) The VC fund math between seed vs. Series A funds
(15:00) META buys TWiST 500 Company, Manus! Why it matters.
(20:20) Uber AI Solutions: Your trusted partner to get AI to work in the real world. Book a demo with them TODAY at http://uber.com/twist
(21:24) Why Yann LeCun left META, and what could be behind it
(25:27) Producer Claude on the Gondola Crash in Zurich
(29:13) Jason's Request for Augmented human intelligence
(30:11) Every.io - For all of your incorporation, banking, payroll, benefits, accounting, taxes or other back-office administration needs, visit http://every.io/
(32:04) How one Trader made $436.8k on one bet on Polymarket!
(36:05) Jason's Predictions for 2026 IPOs
(40:01) Is news broken? How Tangle is tackling it.
(45:53) How much should a startup incur in legal expenses? Should founders try to use AI to avoid costs?
(50:59) Why Google should let NotebookLM cook, make it a standalone brand!

Subscribe to the TWiST500 newsletter: https://ticker.thisweekinstartups.com/
Check out the TWIST500: https://twist500.com
Subscribe to This Week in Startups on Apple: https://rb.gy/v19fcp

Follow Lon:
X: https://x.com/lons

Follow Alex:
X: https://x.com/alex
LinkedIn: https://www.linkedin.com/in/alexwilhelm/

Follow Jason:
X: https://twitter.com/Jason
LinkedIn: https://www.linkedin.com/in/jasoncalacanis/

The AI Breakdown: Daily Artificial Intelligence News and Discussions

Why “context graphs” have suddenly become one of the most important ideas in enterprise AI, and what they reveal about why agents fail or succeed at real work. This episode explains the core idea behind context graphs, how they differ from systems of record and knowledge graphs, and why capturing decision traces (the why, not just the what) may be the key to scalable autonomy inside organizations. In the headlines: AI wearables make another run at relevance, China reports early success using AI for cancer detection, X faces global backlash over Grok moderation failures, and Yann LeCun publicly breaks with Meta's AI strategy.

Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Zencoder - From vibe coding to AI-first engineering - http://zencoder.ai/zenflow
Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai