The great yellow gentian (Gentiana lutea) is an emblematic plant of the wooded pastures of the Jura. Known for its digestive and medicinal properties, it is also at the heart of a walk offered by the Val-de-Ruz network of "Chemins chouettes", a collection of ecotourism walks run by the Neuchâtel association Espace Val-de-Ruz. The route runs from the Vue-des-Alpes to the Repère forest area, to discover this queen of the heights of our region. Our guide is Frédéric Cuche, a retired biologist.
Duration: 00:03:56 - Jacques Pédehontaà, mayor of Laàs since 1983, was questioned in police custody (garde à vue) on Tuesday, 9 June, following a complaint filed by an anti-corruption association. In Laàs, the residents we met had seen it coming. Enjoy this podcast? To listen to all episodes without limits, visit Radio France.
Duration: 00:53:19 - Les informés de franceinfo - Every evening, the "informés" of franceinfo debate the news with Victor Matet.
We're announcing AIEWF speakers this week! Take the AI Engineering Survey!

Today's guest Ethan first joined us for the LS Paper Club as the lead on NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months.

He comes back on Latent Space with some nuclear hot takes: that video models primarily get their intelligence from LLMs, not from training on video data, and that the next frontier for truly interactive, real-time, long-horizon world models is to work on LLMs (perhaps Interaction Models as well…).

Put it this way: in the near term, the next Sora won't be a better video model, but a video agent.

Generative media may more closely follow the evolution of AI coding, which went from focusing on one-shot output performance and cost to multi-turn reasoning and planning models for agents and systems that can plan, edit, test, debug, and submit PRs.

At a certain point, coding models got so good that the only significant next step to improve performance was handling the orchestration of these models.

Now, as video models improve significantly in realism, consistency, and prompt adherence while becoming more cost-efficient, the next evolution of video generation may also be systems that can plan, generate, edit, critique, and iterate across an entire creative task. In this episode, Ethan joins swyx and Vibhu to unpack what it actually takes to build frontier image and video systems: data, VAEs, diffusion transformers, audio-video alignment, inference speedups, and the hidden cost of storing and moving massive video datasets.
From building NVIDIA's Cosmos world model to joining xAI as Grok Imagine was being built from zero to one, Ethan He has been at the center of some of the most important work in video generation, multimodal models, and real-time world models.

We go deep on Grok Imagine: how a small xAI team shipped its first multimodal video model in three months, why iteration speed matters more than almost anything in model development, and why many of the biggest gains come from fixing tiny bugs in data and training pipelines.

Flipbook: The future of Videomaxxing

Video agents are almost a sure bet to be the trend in the coming year. We end with a glance at what's beyond video agents: Flipbook caused a minor sensation this year when it was released, but most treat it as a fun demo. Ethan takes it very seriously: with the speed and cost of inference coming down every year, the future of custom, just-in-time (JIT) video UI is closer than you think. We talked about why videogen models may become the front end of AI, how generative UI could replace traditional HTML/CSS, why world models need to be real-time, interactive, and long-horizon, and why the future of video generation may depend more on language models and agents than on diffusion alone.

We discuss:

* Why fast iteration mattered more than meetings
* Why small training bugs can drive huge model quality gains
* Why coding models may make compute the bottleneck again
* How image and video models are trained with synthetic captions
* The role of VAEs and latent space in frontier video models
* Why image models are the foundation for video models
* The tradeoff between temporal compression and real-time interactivity
* Flipbook, Neural OS, and the future of generative UI
* Why future interfaces may go from user intent to pixels
* The hidden cost of training video models: storage, egress, and GPU hours
* How step distillation and consistency models (like OpenAI sCM) make video inference orders of magnitude faster
* Grok Imagine 0.9 and large-scale audio-video generation
* Why audio-video alignment is harder than text-video alignment
* Ethan's definition of world models
* Reference-to-video, video extension, and long-context video generation
* Why xAI's research communication undersells Grok Imagine
* How xAI culture shaped the speed of development
* AI watermarking, SynthID, and detecting generated media
* Why prompt rewriting matters for video models
* Grok Imagine Agent and the rise of video agents
* Why language models may unlock better video generation
* Robotics, physical AI, and embodied world models
* Why Ethan left xAI and shifted focus toward LLMs
* Self-managed context, memory, and the next frontier for language models

Ethan He

* LinkedIn: https://www.linkedin.com/in/ethanhe42
* X: https://x.com/EthanHe_42

Timestamps

00:00:00 Introduction
00:01:25 From NVIDIA Cosmos to xAI
00:03:24 Building Grok Imagine from Zero to One
00:10:07 How Image and Video Models Are Trained
00:18:53 Video Compression, VAEs, and Real-Time Tradeoffs
00:22:10 Generative UI, Flipbook, and Neural OS
00:32:10 The Cost of Training Large Video Models
00:37:04 Distillation, GANs, and Fast Video Inference
00:41:21 Audio-Video Generation and Grok Imagine 0.9
00:48:34 What Makes a World Model?
00:55:51 Reference Videos, Long Context, and Video Memory
01:00:11 xAI Culture, Research, and First-Principles Building
01:09:45 AI Safety, Watermarking, and Prompt Rewriting
01:13:10 Video Agents and AI-Assisted Creation
01:27:32 Why Language Models Unlock Better Video
01:31:15 Robotics, Physical AI, and Embodied World Models
01:32:38 Why Ethan Left xAI
01:34:16 Self-Managed Context and the Future of LLMs
01:38:43 Ethan's Career Path and Closing Thoughts

Transcript

Introduction: Ethan He, Latent Space, and the Path to xAI
Swyx [00:00:00]: We're here in the studio with Ethan He, most recently of xAI. Welcome.
Ethan [00:00:10]: Thank you. Glad to be here.
Swyx [00:00:11]: We're also here with Vibhu.
You first came to us, or joined the Latent Space world, because you were working on Cosmos at NVIDIA, and you did a paper. We loved it. You presented it as well, so thank you for doing that.
Ethan [00:00:23]: Actually, I also presented the MoEs twice at Latent Space.
Swyx [00:00:29]: How did you actually hear about us? Did we reach out to you? Is that how it worked?
Ethan [00:00:33]: No, actually, I found the community. I realized, oh, there is this online community where people talk about AI and learn from each other through papers every week in the Paper Club. It's very nice.
Ethan [00:00:49]: I learned a lot.
Swyx [00:00:49]: I think it's been three years nonstop. We haven't stopped even on Christmas and New Year's. Many weeks I want to stop, but it keeps going.
Vibhu [00:00:58]: No, that was good. I think you had posted that you worked on a paper, and I was like, "Oh, very cool. We have Paper Club. Present then."
Vibhu [00:01:04]: But I might have reached out to you after.
Swyx [00:01:05]: Because it's an amateur club, right?
Swyx [00:01:08]: So it's very unusual, but sometimes paper authors come by and actually explain the paper. Today we just did the poolside paper, which was apparently very good.
Vibhu [00:01:18]: It came out yesterday.
Vibhu [00:01:19]: Pretty interesting, right? Fully open. They talk about everything, including systems. So it's a good one. We'd recommend people read it.
Swyx [00:01:25]: Bring us up to speed on your transition to xAI, 'cause I actually don't even know when you joined. Just tell the story of the transition.
From NVIDIA Cosmos to xAI: Scaling Video and World Models
Ethan [00:01:34]: Before xAI, I was working on the Cosmos world model at NVIDIA. Cosmos is a giant video foundation model that aims to simulate the world, and it serves as a foundation for all of the roboticists to build on top of.
There, once I built Cosmos, I realized that since this thing also has a scaling law similar to language models, we needed to scale up the video models further. That's why I realized I needed to move somewhere with much more compute. That's how I...
Swyx [00:02:13]: Than NVIDIA?
Vibhu [00:02:14]: The GPU rich came themselves.
Vibhu [00:02:19]: And timeline-wise, when was Cosmos? It was pretty early, right? It was an open world model, open paper, everything.
Ethan [00:02:25]: It was the end of 2024.
Vibhu [00:02:28]: End of 2024.
Ethan [00:02:30]: Then in mid-2025, I moved to xAI. I joined around the time xAI was about to build video models and multimodal models. There was no infra, no data, and no model, and as a few engineers, we built it in three months and released the first model, Grok Imagine 0.9.
Ethan [00:02:55]: Since then, I kept working on video models and moved more from training to post-training of the video models: for example, reference-to-video, kind of like the cameo feature, and video extension. Before I left, I worked on a world model, leading a small team focused on real-time, long-horizon video generation.
Building Grok Imagine From Scratch in Three Months
Swyx [00:03:24]: Can you give a rough roadmap? Okay, you're on a brand-new team. Grok previously was only text, or they partnered with BFL for their image gen stuff. What are the building blocks, right? You have compute; data you can procure somewhere. What's the sequence of things people should think about when setting up a new team?
Vibhu [00:03:43]: Actually, even deeper: not just data you can procure. You guys had to go through getting the data too, right?
So you shipped it pretty fast, but yeah...
Swyx [00:03:51]: Three months is like...
Vibhu [00:03:52]: From everything.
Swyx [00:03:52]: ...actually very surprisingly fast.
Ethan [00:03:55]: One thing, I'd say, is thanks to my experience at NVIDIA. The first time, when we were building Cosmos together, we built it for about a year. So this was the second time I did it, and I roughly had an idea what to do. I'd say the most important thing is the talent. Everyone was very strong and clever, and very close with each other, working towards a common goal. That sped things up a lot: you reduce the communication bandwidth among people, and everyone can work towards the same goal. Every day there weren't that many meetings on the calendar, maybe one sync a day, and after that it was just all building. It was pretty fun at that time.
Ethan [00:04:47]: Another thing is that xAI has very strong foundations in data infra and model inference, and that support can help model development a lot. When I look at training models, actually the most important thing is how many iterations you can do per day. The more iterations you can do, the faster you can train the model. So if you have very strong infra and a lot of compute, you can train these models in a very short period of time. That gives you a much larger buffer for errors, and it also gives you the opportunity to spot more bugs.
Iteration Speed, Compute, and Debugging Model Pipelines
Swyx [00:05:46]: What is an iteration? Is it like a few hundred steps, or what are you...
Ethan [00:05:50]: Let's say just training the model: from acquiring new data, and maybe designing new algorithms, to training a new model, maybe at smaller scale, or...
Swyx [00:06:01]: So cycle time for any hyperparam that you're searching.
Ethan [00:06:04]: Cycle time, and then to eval this model.
Is this model better than my previous iteration?
Ethan [00:06:11]: So...
Swyx [00:06:11]: So before you, someone had already set this up so that you can iterate very quickly.
Ethan [00:06:15]: I think the foundation there is extremely good for developing and researching models.
Ethan [00:06:23]: And what I often find, and this is kind of boring, is that a lot of the improvements don't come from new algorithms. They come from finding small bugs here and there in the data pipeline and in the model training pipeline. Those give the biggest boost to model quality.
Vibhu [00:06:46]: It's interesting, right? You say it's a small team with less communication bandwidth, but also a lot of the quality comes from finding little bugs. It seems counterintuitive: with a lot of people, you could iron out more of those. But it's interesting to see the other side.
Swyx [00:07:00]: I also wonder, do you try using LLMs to look for bugs? I don't know.
Ethan [00:07:05]: I remember at that time, mid-2025, the coding models weren't quite there yet. I remember by December 2025, they were extremely good, and I've been using them since then. It's helpful. Sometimes it produces code that is kind of difficult to maintain: the first time, it built something extremely fast, but it gave me spaghetti code, thousands of lines that I couldn't maintain, and the LLM itself couldn't figure out what was wrong or how to improve on top of it. But now I find it much better. I want to bring up another point here: now that coding models are much more efficient and can help us implement stuff much faster, compute might become a bottleneck again. Previously, if you wanted to train a new model, say you wanted to generate new synthetic data or write a new algorithm, it might take a few weeks.
And during that period, you might not have experiments to run. But now you can build that thing within a few hours, and then you can immediately train a model.
Ethan [00:08:24]: Now you have to have enough compute to try all of the ideas. So compute might become the bottleneck on iteration speed again.
Swyx [00:08:36]: Yeah, honestly, I think it's kind of a stressful job, because you're like, "Well, I should be trying everything, and if I'm not, then I'm not doing my job well."
Vibhu [00:08:48]: There's also the stress that you're burning thousands of GPU-hours, which is very expensive, and that compute could go to other researchers.
Swyx [00:08:56]: You got daddy Elon to...
Vibhu [00:08:57]: You got daddy Elon.
Ethan [00:08:59]: It was...
Vibhu [00:09:00]: But there's still a finite amount of compute. You want to use it, you want to use it well, you want more of it.
Ethan [00:09:06]: That was quite stressful indeed. One thing is that with coding models now, a lot of these jobs can be automated, which is much better. Second, it's a marathon, so you've got to maintain good health and a regular schedule.
Vibhu [00:09:28]: It's hard to hear that when you ship from zero to one in two months.
Swyx [00:09:32]: And obviously the culture at xAI is very famously one where people work very hard. One thing I did want to dive into: in the notes that you sent ahead of time, you had specific comments about the cost of video gen training. Presumably this is on Colossus-1, right? The two-hundred-megawatt cluster. Share whatever you want on that.
Vibhu [00:09:54]: I think there are three things we're talking about, right? There's video gen, and there's also the image gen model that you put out. Do you want to complete the... okay, so zero to one, you have a few months.
Just, what are the stages of creating an image gen model?
Swyx [00:10:06]: Oh, yeah, maybe I got distracted.
How Image and Video Models Are Trained: Synthetic Captions, Tokenizers, and VAEs
Vibhu [00:10:07]: Sorry. And then from there, there's video gen, there's audio gen. Would love to get into those next. But what are those first few months like? Small team, a lot of bugs, iterations, but what does it look like? Do you take something off the shelf? Do you just get data and compute? How do you get to a state-of-the-art image gen model? How do you even start?
Ethan [00:10:28]: I cannot comment specifically on how xAI did it, but it's quite a standard process. I can draw some examples from Cosmos. Mainly, to build a video model, you actually need to build an image model first. To build these two models, the data you need is a hundred percent synthetic pairs of language and image, or language and video. Because on the internet, videos don't naturally come associated with text. You might say, oh, on YouTube you have the title and the description and the comments...
Swyx [00:11:11]: Title.
Ethan [00:11:11]: ...of a video, but usually they're not relevant to the video itself. Say the video is a natural scene of mountains or something, and the title is "I'm so happy today."
Ethan [00:11:26]: They have no correlation at all. So the first step is that you have to generate synthetic pairs of language and videos. You gather videos from the internet, and you use a VLM to caption the videos. Here's a question: how do you get a VLM to begin with? If there's no...
Swyx [00:11:55]: So you fuse the model, right? Like...
Ethan [00:11:57]: Say no VLM exists. How do you generate the text in the beginning, right?
It's impossible.
Swyx [00:12:04]: I see.
Ethan [00:12:05]: In the beginning, you ask humans to describe the video in as much detail as possible. For example, you ask them to describe everything: all objects, all characters, and all interactions and dialogue in the videos. That was in the protocol of Cosmos labeling. The objective we gave the labelers was that they had to describe the video in so much detail that a blind person hearing the blob of text could reconstruct what the video is like in their head.
Swyx [00:12:43]: Video or image? You're talking about images.
Ethan [00:12:44]: Video or image, either one of them.
Vibhu [00:12:47]: This was pretty common back with CLIP and DALL-E, right?
Vibhu [00:12:51]: It's all training on really detailed captions of images. The same is applied to video, but instead...
Ethan [00:12:57]: Same applied.
Vibhu [00:12:57]: ...of using a multimodal model to pass in video images and write rich descriptions, you can also...
Swyx [00:13:04]: I think there's this traditional perspective of supervised, very highly human-curated data. I feel like there's an unlock with unsupervised, right? Where you have enough to bootstrap that you can just throw a common corpus at it, or whatever: unsupervised vision and language pairing, where you just have interspersed image and text and it just learns. To me, that is the VLM breakthrough that is different from the CLIP era and from the LM era.
Ethan [00:13:36]: It's interesting to see that you kind of need both kinds of data.
Ethan [00:13:41]: For example, for the...
Swyx [00:13:41]: You need it to bootstrap it up. Yeah.
Ethan [00:13:43]: ...for generative model training, there's also usually a small percentage of unlabeled data. The model is instructed to generate a video without any text instruction. That can also help the model generalize.
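The data recipe described here (caption every clip, then keep a small share of samples with no text at all) can be sketched as a dataset-building step. This is a minimal Python sketch under stated assumptions: `caption_fn` is a hypothetical stand-in for whatever VLM or human labeler writes the caption, and `p_uncond = 0.1` is an illustrative value, not a figure from the episode. Blanking the caption for a fraction of samples is also the usual way diffusion models are trained so that classifier-free guidance can be used at inference.

```python
import random
from typing import Callable, List, Tuple

def build_caption_pairs(
    clips: List[str],
    caption_fn: Callable[[str], str],
    p_uncond: float = 0.1,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pair each clip with a synthetic caption; blank a fraction of captions.

    caption_fn is a hypothetical stand-in for a VLM (or human labeler)
    that writes a dense description of the clip.
    """
    if not 0.0 <= p_uncond <= 1.0:
        raise ValueError("p_uncond must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the split is reproducible
    pairs = []
    for clip in clips:
        caption = caption_fn(clip)
        # With probability p_uncond, this sample is trained with no text,
        # so the model also learns unconditional generation.
        if rng.random() < p_uncond:
            caption = ""
        pairs.append((clip, caption))
    return pairs


# Usage with a toy captioner in place of a real VLM.
clips = [f"clip_{i:03d}.mp4" for i in range(1000)]
pairs = build_caption_pairs(clips, lambda c: f"A detailed description of {c}", p_uncond=0.1)
blank = sum(1 for _, cap in pairs if cap == "")
print(f"{blank} of {len(pairs)} samples are unconditional")
```

In a real pipeline the captioning pass is the expensive part and would run offline; the dropout is cheap enough to apply on the fly in the data loader instead.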
So after this stage of generating synthetic pairs, one important common step is to train a compressor, or tokenizer, for the images or videos. Technically, you can theoretically train image or video models on pure pixels, but the problem is that it's a lot of tokens. One image at a thousand by a thousand is like one million tokens, one million pixels. It's impossible to train a transformer on that. So you need to train a tokenizer, which can go from image to latent space and from latent space back to image.
Swyx [00:14:45]: That's why we named the podcast.
Swyx [00:14:48]: But basically, you're talking about vocabulary size.
Ethan [00:14:50]: So, vocab.
Swyx [00:14:51]: And so what is... like, a million is impossible?
Ethan [00:14:54]: In generative models, the vocab is continuous. It's a continuous space. You can think of it as mapping an image to a vector, a fixed-length vector of sixteen or forty-eight, something like that. And then you map that vector back to image space. The mapping is patch-based. Say you have...
Ethan [00:15:22]: ...a sixteen-by-sixteen patch, and you map that patch of pixels into this latent space.
Swyx [00:15:29]: We've covered this...
Vibhu [00:15:30]: This is like the vision transformers...
Swyx [00:15:32]: VAEs.
Ethan [00:15:33]: VAEs.
Vibhu [00:15:34]: You basically compress your input, do your generation, do all that generation in a smaller dimension, and then project back out.
Swyx [00:15:43]: A VAE is a form of compression, but for me, the patching thing is from ViT, right?
Ethan [00:15:48]: You can make those.
Swyx [00:15:49]: Literally, the paper is titled like "sixteen by sixteen is all you need," something like that.
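The token-count argument can be checked with a little arithmetic. The sketch below, in plain Python, cuts a grayscale image into non-overlapping 16×16 patches (the ViT-style patching mentioned here) and counts how many tokens a transformer would see with one token per pixel versus one token per patch. The 1024×1024 size and the toy image are assumptions chosen so the patches divide evenly; a real tokenizer would also map each flattened patch to a learned latent vector, which this sketch omits.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale pixel values

def patchify(img: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles,
    flattening each tile row-major into one vector (one token)."""
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [img[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            tokens.append(tile)
    return tokens

def token_count(h: int, w: int, patch: int) -> int:
    """Tokens seen by the transformer when each patch becomes one token."""
    return (h // patch) * (w // patch)

# One token per pixel vs. one token per 16x16 patch at 1024x1024.
pixel_tokens = token_count(1024, 1024, 1)    # 1,048,576
patch_tokens = token_count(1024, 1024, 16)   # 4,096
print(pixel_tokens, patch_tokens, pixel_tokens // patch_tokens)

# Sanity check on a toy 32x32 image: 4 patches of 256 values each.
toy = [[float(r * 32 + c) for c in range(32)] for r in range(32)]
toks = patchify(toy, 16)
print(len(toks), len(toks[0]))
```

A 256× reduction in sequence length is what makes attention over a whole image tractable; video models then compress further along time.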
And I think people also make a lot of comparisons between this kind of patching and convolutions.
Swyx [00:16:02]: Which is you're kind of reconstructing the old paradigm with the new.
Ethan [00:16:05]: Actually, in VAEs, there are both convolutional networks and transformers. You can actually do both.
Ethan [00:16:14]: After the VAE, what you've got is latent-space tokens and language tokens. Now comes the training of the diffusion transformer; generative models usually use diffusion transformers. It's actually quite standard, very similar to how you train language transformer models. There's not that much difference: it's just visual tokens in, visual tokens out. The only difference is that there's a denoising process. You add random noise to the visual tokens, and then you train the model to remove that noise to recover the clean tokens. At inference, the model iteratively removes noise, starting from one hundred percent noise.
Swyx [00:17:12]: And then, to speed things along on the tech tree of diffusion, there's CFG, and there's also latent diffusion. Somewhere along the line, obviously, Stability and all these other guys pioneered a lot of this architecture. I don't know if you want to get into that or do the video side. Up to you.
Bootstrapping Video from Image Models and Temporal Compression
Ethan [00:17:37]: After you train such an image model, the reason it's a foundation for video models is that image models are cheaper to train, and they have a much denser connection between language and text. Sorry, language and images. For example, you train on a billion images, and there's a mapping from the text to the image.
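The denoising recipe described above can be written down in a few lines. This is a minimal Python sketch assuming a DDPM-style variance-preserving forward process with a linear beta schedule (a common textbook choice; the episode does not say which schedule or parameterization any particular model uses) and deterministic DDIM-style sampling. To keep it runnable without a trained network, the noise predictor is replaced by a hypothetical `oracle` that knows the clean token, so the loop only demonstrates the mechanics: noise is added at training time, then removed step by step at inference starting from pure noise. The `cfg` helper shows the standard classifier-free guidance combination, eps_uncond + w * (eps_cond - eps_uncond).

```python
import math
import random
from typing import Callable, List

Vec = List[float]

def make_alpha_bars(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def add_noise(x0: Vec, eps: Vec, ab: float) -> Vec:
    """Forward process: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    return [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(x0, eps)]

def training_loss(eps_pred: Vec, eps: Vec) -> float:
    """The network is trained to predict the added noise (mean squared error)."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)

def cfg(eps_uncond: Vec, eps_cond: Vec, w: float) -> Vec:
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def ddim_sample(predict_eps: Callable[[Vec, int], Vec], dim: int,
                alpha_bars: List[float], steps: int = 50, seed: int = 0) -> Vec:
    """Deterministic DDIM-style sampling from pure noise down to t = 0."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from 100% noise
    T = len(alpha_bars)
    ts = [round(T - 1 - i * (T - 1) / (steps - 1)) for i in range(steps)]  # T-1 ... 0
    for i, t in enumerate(ts):
        ab = alpha_bars[t]
        eps = predict_eps(x, t)
        # Estimate the clean token, then re-noise it to the next (lower) level.
        x0_hat = [(xi - math.sqrt(1.0 - ab) * ei) / math.sqrt(ab) for xi, ei in zip(x, eps)]
        ab_prev = alpha_bars[ts[i + 1]] if i + 1 < steps else 1.0
        x = [math.sqrt(ab_prev) * a + math.sqrt(1.0 - ab_prev) * e for a, e in zip(x0_hat, eps)]
    return x

alpha_bars = make_alpha_bars()
clean = [0.5, -1.0, 2.0]  # a toy "visual token" (illustrative values)

# Training-side view: noise the clean token at timestep 500.
rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in clean]
x_t = add_noise(clean, eps, alpha_bars[500])

# Hypothetical oracle in place of a trained network: it returns the exact noise.
def oracle(x: Vec, t: int) -> Vec:
    ab = alpha_bars[t]
    return [(xi - math.sqrt(ab) * c) / math.sqrt(1.0 - ab) for xi, c in zip(x, clean)]

print(training_loss(oracle(x_t, 500), eps))          # effectively zero
print(ddim_sample(oracle, len(clean), alpha_bars))   # close to [0.5, -1.0, 2.0]
```

With a real model, `predict_eps` would be the diffusion transformer conditioned on text tokens, and with CFG it would be called twice per step (with and without the caption) and combined through `cfg`.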
And training the same mapping, a billion texts to a billion videos, is much more expensive, because videos naturally have more tokens than images. A diffusion model's understanding of language comes purely from this mapping. If you don't have enough of it, say you only train on ten million videos or so, the model might not see enough language tokens in training, so it won't understand human intention well enough. That's why you first train an image diffusion model and then bootstrap the video model from there.
Swyx [00:18:53]: One thing I did want to ask, because I think you're the first video model person I've ever talked to, I think. We've talked to Luma and all those folks. There are all these tricks in video compression where, frame by frame, there's not that much difference, so you don't have to regenerate or save the whole frame, right? Like MP4 compression or something like that.
Swyx [00:19:16]: Is it tempting to use that? Or, as far as I can tell, does everyone just treat it as "No, we just generate every frame"? Is that roughly the state of the art?
Ethan [00:19:27]: There are a few different approaches. First, say you want to directly use MP4 compression and use that as the tokens for the transformer to train on, right? People have actually tried that, but the main challenge is that the latent space of MP4 tokens isn't very comprehensible to the models. It's extremely hard to train on that. And there's a...
Ethan [00:20:01]: That's why people created VAEs, which create a more continuous latent space, so the models can understand that latent space and learn from it much more easily. Even within VAEs, latent spaces differ in how hard they are to learn.
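To see why temporal compression matters for both context length and responsiveness, here is some back-of-the-envelope arithmetic for the 8×8×4 versus 8×8×1 comparison Ethan makes next. It is a minimal Python sketch that assumes a 5-second, 24 fps, 720×1280 clip; those numbers are illustrative, and real causal video VAEs often encode the first frame separately, which this simple ceiling-division count ignores. "Chunk latency" here means only the stretch of video one latent frame covers, i.e. the minimum output the model commits to before it can react; compute time is not included.

```python
import math

def latent_positions(frames: int, height: int, width: int,
                     t_down: int, s_down: int) -> int:
    """Latent positions after a VAE downsamples time by t_down and each
    spatial axis by s_down (ceil so partial chunks still count)."""
    return (math.ceil(frames / t_down)
            * math.ceil(height / s_down)
            * math.ceil(width / s_down))

def chunk_latency_ms(t_down: int, fps: float) -> float:
    """Milliseconds of video covered by one latent frame (compute ignored)."""
    return 1000.0 * t_down / fps

fps, seconds, h, w = 24, 5, 720, 1280
frames = fps * seconds  # 120 frames

temporal = latent_positions(frames, h, w, t_down=4, s_down=8)   # 30 * 90 * 160 = 432,000
per_frame = latent_positions(frames, h, w, t_down=1, s_down=8)  # 120 * 90 * 160 = 1,728,000
print(temporal, per_frame, per_frame / temporal)

# Per-frame latents let a streaming model react every frame instead of every 4 frames.
print(chunk_latency_ms(4, fps), chunk_latency_ms(1, fps))
```

The tradeoff is the one described in the conversation: the temporal VAE gives a 4× shorter context, while the per-frame VAE lets the model react roughly every 42 ms instead of every 167 ms at 24 fps.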
So imagine the simplest, most naive VAE: you have an image, and you just flatten all of its pixels into a vector. You don't need to train any VAE, right? But that latent space is extremely hard for models to train on top of. That's why there's some debate about how to compress the tokens. You mentioned that you can compress frame by frame. You can also compress the temporal dimension.
Ethan [00:20:52]: The difference is that if you compress the temporal dimension, you get a much higher compression rate, because there's temporal redundancy between frames: this frame and the last frame are likely mostly similar, with only some small differences. For example, I think the Wan 2.1 VAE has an eight-by-eight-by-four compression rate, so four frames along the time axis are compressed into one latent frame. That can save a lot of context length. If you do it frame by frame, you might have eight by eight by one, and your context length will be four times larger. That being said, the benefit of per-frame compression, and we might come back to this later, is real-timeness and interactivity. If you stream the output of the model frame by frame, the model can respond to any user request immediately. If you have four-times temporal compression, then...
Swyx [00:22:06]: It might be laggy.
Ethan [00:22:07]: ...there's an inherent lag there.
Swyx [00:22:10]: So you're very pilled on this. Let's just go ahead and bring it up, 'cause we have the visual prepared anyway. There are some frontier applications of real-time video gen. Flipbook is one of the examples that went viral recently, right? What is Flipbook?
Real-Time Generative UI: Flipbook, Neural OS, and Diffusion Front Ends
Ethan [00:22:23]: Flipbook is kind of like a web browser. You can see it has the web browser UI on top.
The difference is that all of the UI is generated by a generative image model in real time, and everything here is fake. But you can explore inside this imaginary world. Say here we have "engineering the Great Pyramid." The model generates this for us to understand how it works, and if we want to navigate around and understand further, we can click on some of the descriptions here, and the model will generate a new subpage describing the details we want to know about.
Swyx [00:23:14]: So it's basically like we're playing a video, but it pauses for our next interaction, and then it plays the next thing based on our interaction.
Swyx [00:23:23]: Which is kind of cool.
Vibhu [00:23:25]: And you kind of decide your story. This one was how do you make a pyramid; the levering technique seemed interesting, right? It shows, okay, I want to know what this is...
Swyx [00:23:35]: The demo tweet had more animation between frames.
Vibhu [00:23:38]: I think it's just skipping.
Swyx [00:23:39]: Oh, it's just skipping a lot of frames.
Ethan [00:23:40]: They also have a video mode...
Vibhu [00:23:42]: It takes a lot. There are a lot of people...
Ethan [00:23:42]: ...but a lot of people are using it.
Ethan [00:23:45]: So it's not available.
Vibhu [00:23:46]: There's a live video stream. We can try.
Swyx [00:23:50]: So this is an example of the kind of future that you see at the extreme. We're obviously not in it today.
Swyx [00:23:56]: But in a world where inference is completely free, is this better than generating code and text?
Ethan [00:24:02]: This is the final state of where video will be for world models, I think. Imagine the internet doesn't exist, and you type in google.com. What should a model show you? The model can imagine something, and this is what the model imagines. These web pages completely do not exist.
So I think as inference costs come down, we are going to have generative UI for everything. Think about how coding models work: they write code for a web page, the code might be compiled into binary, and the binary renders pixels on the screen. In machine learning, every time we have a breakthrough, it's obviously more intuitive. So why don't we go from user instruction to pixels directly? Generative UI will be user intention to pixels directly. Say I want email: everyone has the same interface, but I want it slightly different. I want my email shown to me like TikTok, so I can swipe left and right through the emails. Or maybe you want something else; we can have completely different things. Or I'm looking at Instagram stories, and I don't like the Like button because I always mis-click it. Generative UI resolves that. So it's going to be a revolutionary replacement of the interface. In the future, we might have much more powerful...
Ethan [00:25:50]: ...LLMs and coding models running behind the scenes, and on the front end, the diffusion model will actually be the front end that shows stuff to you. That's how I imagine it.
Swyx [00:26:02]: Diffusion front end, deterministic back end.
Swyx [00:26:04]: Something like that. I find that very expensive, but...
Vibhu [00:26:08]: I find it interesting you called LLMs writing code on the back end deterministic, but okay.
Swyx [00:26:14]: You write it once...
Vibhu [00:26:15]: Compared to...
Swyx [00:26:16]: ...and then you execute.
Ethan [00:26:17]: If you think about the cost, say an H100 costs $1 per hour, and you use it eight hours a day for thirty days: every month you're paying $240. You'd actually not want to pay for that. That's even more expensive than Claude Code Max.
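Ethan's monthly estimate is easy to reproduce, and the same arithmetic can project how long it takes such a bill to become cheap if costs fall at a steady rate. This sketch uses his assumed $1/hour H100 price, 8 hours a day, and 30 days; the $20 target and the constant yearly decline factors (2× and, for comparison, 10×) are illustrative assumptions, not figures from the episode, and a constant rate is itself a simplification.

```python
def monthly_cost(price_per_hour: float, hours_per_day: float, days: int) -> float:
    """Monthly bill for one GPU rented a fixed number of hours per day."""
    return price_per_hour * hours_per_day * days

def years_until_below(cost: float, target: float, yearly_factor: float) -> int:
    """Whole years until cost falls to target or below, assuming it is divided
    by a constant yearly_factor each year (2.0 means it halves every year)."""
    if yearly_factor <= 1.0:
        raise ValueError("yearly_factor must be > 1 for costs to fall")
    years = 0
    while cost > target:
        cost /= yearly_factor
        years += 1
    return years

bill = monthly_cost(1.0, 8, 30)
print(bill)  # 240.0

for factor in (2.0, 10.0):
    print(factor, years_until_below(bill, target=20.0, yearly_factor=factor))
```

At a 2× yearly decline the $240 bill drops below $20 after 4 years; at 10× it takes 2, which is why the assumed rate matters so much to the "within a few years" claim.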
But if you think about compute costs coming down about two times every year, I think that future will likely arrive within a few years.
Vibhu [00:26:49]: It's everything, right? Compute cost comes down, compute gets faster, models get smarter...
Ethan [00:26:54]: More efficient.
Vibhu [00:26:54]: ...models get smaller.
Swyx [00:26:55]: I don't know why you say two times, 'cause I think it's like 100 times. In language models, it is roughly one hundred to a thousand times every twelve to eighteen months, for the same given level of LMSys Elo.
Vibhu [00:27:08]: That's a net of everything, right? That's model performance alongside compute, which is different from compute costs alone coming down. But a very interesting future.
Swyx [00:27:19]: Web designers will have to shout out that accessibility is an issue, right? How do you deal with screen readers or whatever? But yes, this is higher-bandwidth storytelling than anything you can possibly generate with code. I think that's the rough idea.
Ethan [00:27:34]: I'd like to add a little bit: humans naturally have maximum input bandwidth when we are looking at things, at videos, and maximum output bandwidth when we are talking. So in the future, it might be something like we talk to AI models, and the AI model responds with a generative UI. That would be the maximum input and output bandwidth for interacting with AI models before Neuralink happens.
Vibhu [00:28:06]: And it's also very custom, right? Some people are very visual; some people are not as visual and prefer text. But the best thing about generative UI is that it can also be text.
Swyx [00:28:17]: There's another project that we wanted to highlight, which is Neural OS. Kind of a similar idea, but here you're literally simulating an operating system with a video model.
Swyx [00:28:27]: And you can play Doom, you can use Firefox.
I find this mildly less impressive, obviously, because it's an OS that I can run.
Swyx [00:28:37]: But here everything is imagined.
Vibhu [00:28:40]: I was used to Command+W to close the Firefox tab. It didn't crash. That's why I said
Swyx [00:28:45]: It's too immersive.
Vibhu [00:28:46]: It's too immersive for me.
Swyx [00:28:47]: Too immersive.
Vibhu [00:28:48]: I wanted to close the tab.
Vibhu [00:28:49]: But yes, I can play generated diffusion.
Swyx [00:28:51]: This is shockingly fast.
Swyx [00:28:54]: Because I remember there was a demo maybe one to two years ago where someone tried to do a first-person shooter with an image model. There was no consistency, and it was very slow. But here it looks realistic; this is Doom.
Vibhu [00:29:07]: I think there are two sides to that, right? What does running a game involve? The heavy part is actually the game engine: all the lighting, all that stuff, the graphics. This is just kind of video, right? Like we've solved consistency. This still looks like image generation from a few years ago. There's some temporal consistency, but it's kind of just images stitched together as frames of video. But it's a good visual representation to picture the future you want to see, right? That's what I see in these, more so.
Ethan [00:29:38]: This reminds me of how video models get better and better. If you just look at it, Neural OS feels like a crappy version of the Windows we already have, right? But the difference is that this model is overfitted on existing operating systems; it can generate nothing different from them. That's actually also similar to video models. When we train these video models and image models, we train them on the internet. There's no imaginary supernatural stuff on the internet. But once we train the model, you can prompt it to generate something supernatural that has never existed in the data set.
So if you train your Neural OS or neural computer on standard screen recordings from the entire internet, the model can imagine a completely new interface for interacting with the computer.
Swyx [00:30:43]: This is one of those things that is magical to me. Usually generalizing out of distribution is bad, but somehow we have learned some kind of internal world model where you can say, "this, but it looks like rainbows and butterflies," and it'll do it, and it will kind of make sense.
Swyx [00:31:03]: So yeah, that's kind of cool. I don't know if there's any more comment on that. I did want to touch a little more on the model architecture stuff, which I think you were getting at. It's really fascinating, and we don't get a chance to talk about this enough. One of the papers that we covered: we've covered every annual Segment Anything release, and I don't know if you follow it; you're a computer vision guy, so you
Ethan [00:31:26]: I know
Swyx [00:31:27]: So they did memory attention, which is kind of interesting. I always think anything where you can keep some consistency across the temporal dimension is very fascinating. Basically, the CV side bleeding into the video-gen side is, I think, underexplored, right? We talk about it for labeling, but actually you can borrow the architecture itself.
Ethan [00:31:50]: There are also completely different approaches, right? You brought up the term world model, so we went from video model to world model. There is diffusion, but there are also other approaches that people are pursuing. So maybe we get into those afterwards as well?
Swyx [00:32:03]: He has a whole definition of world models and stuff. I feel like we threw a lot at you.
Whatever you want to comment on.
Why Video Models Are Expensive: Storage, I/O, and Training Scale
Ethan [00:32:10]: I think one thing we should actually come back to: we were talking about the steps to train an image-gen model into a video model. One thing we don't see as much of is... you brought up the delta in training data, right? So
Ethan [00:32:24]: you won't have as much, and a video model might not generalize, but what is the cost of training a large video model? For LLMs we roughly know. Even the Poolside thing that came out today, right? It's a Gemma-level model trained on roughly forty trillion tokens on this many H200s over this much time, so you can work out the exact cost: GPU hours times the H200 price. So how do we do the same back-of-envelope math for video models and image models? How do you break that down? I can share some back-of-envelope calculations. Surprisingly, the cost of video models is comparable to language models; obviously the largest scale is language models, but it's maybe comparable to medium-scale language models. Just storing the videos alone costs a lot. You can look it up on AWS or something.
Ethan [00:33:20]: Say you have a billion videos, and let's say each video is five megabytes; then you need five petabytes just to store those videos. Also remember we talked about using a VAE to compress the videos; typically you also need to store those continuous features, and they are comparable in size to the videos themselves. So just storing these videos and the features is tens of petabytes alone. And,
Swyx [00:33:58]: I just looked up the calculation.
Five petabytes on S3 Standard is one hundred K per month.
Ethan [00:34:05]: And
Swyx [00:34:05]: It's comparable
Ethan [00:34:05]: and you need
Swyx [00:34:06]: And
Ethan [00:34:06]: And then for tens of petabytes, two hundred K. And even more expensive is the ingress and egress.
Swyx [00:34:13]: Oh, yeah.
Ethan [00:34:14]: Like, through the internet. Just downloading those videos is, I believe, more expensive on AWS than storing them.
Swyx [00:34:25]: Storing, yeah.
Ethan [00:34:25]: And each training run, you probably need to pull them once. If you train multiple times, it's even more than that. So just the storage and network costs would be a few million per month, not to mention the GPU cost.
Ethan [00:34:45]: And
Swyx [00:34:45]: My side tangent: compute rental, like GPU rental, is very efficient. On one side, okay, you can be xAI and build your own data center. Should we not just build our own storage as well? Like
Ethan [00:34:57]: Of course
Swyx [00:34:57]: cloud cost compared to just
Ethan [00:34:59]: You save so much
Swyx [00:35:00]: storing it yourself. Yeah, exactly.
Swyx [00:35:01]: Especially with egress and stuff.
Ethan [00:35:04]: That's a good idea, but it also comes with its own challenges.
Swyx [00:35:09]: Of course, of course.
Ethan [00:35:10]: People who build GPU data centers might not expect this much storage. And people who build storage typically just build it somewhere with only CPUs.
Swyx [00:35:23]: I just looked it up. AWS only charges for egress, not ingress. Tier five for five petabytes is two hundred and thirty K.
Ethan [00:35:32]: Even more expensive than the storage.
Swyx [00:35:34]: But storing is per month, right? You check in, then you cannot check out. So it's so cool. It's okay.
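The storage side of Ethan's estimate can be reproduced in a few lines. This is a minimal sketch under the conversation's assumptions: one billion clips at about 5 MB each, cached VAE latents roughly as large as the videos, and a price per petabyte derived from Swyx's single "$100K per month for 5 PB on S3 Standard" lookup. Real S3 pricing is tiered, so the per-PB rate is an approximation, and the helper names are hypothetical.

```python
# Back-of-envelope monthly storage bill for a video training corpus.
MB = 10**6
PB = 10**15

def corpus_bytes(num_videos: int, mb_per_video: float, latent_ratio: float) -> float:
    """Raw videos plus cached VAE latents (latent_ratio = latent size / video size)."""
    raw = num_videos * mb_per_video * MB
    return raw * (1 + latent_ratio)

def monthly_storage_usd(total_bytes: float, usd_per_pb_month: float) -> float:
    """Storage cost at a flat per-petabyte monthly rate."""
    return total_bytes / PB * usd_per_pb_month

# Flat rate derived from the single quoted data point ($100K for 5 PB); an approximation.
USD_PER_PB_MONTH = 100_000 / 5

raw_only = corpus_bytes(10**9, 5, latent_ratio=0.0)
with_latents = corpus_bytes(10**9, 5, latent_ratio=1.0)
print(f"raw videos:    {raw_only / PB:.0f} PB -> ${monthly_storage_usd(raw_only, USD_PER_PB_MONTH):,.0f}/month")
print(f"+ VAE latents: {with_latents / PB:.0f} PB -> ${monthly_storage_usd(with_latents, USD_PER_PB_MONTH):,.0f}/month")
```

Egress comes on top of this: Swyx's lookup put transferring 5 PB out of AWS at roughly $230K, and Ethan notes that each training run may need to read the corpus again.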
So there's that side.
Ethan [00:35:41]: So the TL;DR, my back-of-envelope math
Swyx [00:35:42]: Data is larger than you think. Yes.
Ethan [00:35:44]: my back-of-envelope math of GPU hours times GPU cost is very much missing the storage.
Swyx [00:35:49]: You're also basically more I/O-bound than in normal training.
Swyx [00:35:55]: Yes. Because of data loading, caching everything becomes super important.
Ethan [00:36:00]: So in Cosmos, we did a lot of optimizations to make it not I/O-bound. Speaking of actually training the model and the GPU cost: if you look up how big the open-source video models are, I think LTX has 19B parameters, and that's a dense model. People are also exploring MoEs, so it might be 20B active and hundreds of billions total. That's a similar size to medium-sized LLMs. And if you look at the number of tokens, we disclosed that in Cosmos: it's also tens of trillions of visual tokens. Putting this together, the cost of training these video models is actually comparable with LLMs. Not to mention, the infra is slightly different from LLM infra, so it might be less efficient to train these models.
Inference Speedups: Step Distillation, Consistency Models, and GANs
Swyx [00:37:04]: Do you get the benefits of traditional diffusion speed-ups? For images, there are LCM and LoRAs for fine-tuning. There's a lot of stuff that's been
Ethan [00:37:15]: Flow matching.
Swyx [00:37:16]: There's flow matching. There's a lot of stuff that's been done. Is there some overlap that applies to diffusion on the inference side, or?
Ethan [00:37:23]: The inference side is a completely different story.
Ethan [00:37:28]: I think on the training side, it might be a little hard to reduce that cost. On the inference side, the biggest gain is from distilling these models.
It's called step distillation, which is slightly different from knowledge distillation in LLMs. Typically, flow-matching models need something like 100 steps, and a diffusion model needs even more, like 1,000 steps, to generate a good image or video. Step distillation tries to learn to generate in fewer steps from the model itself. You use the full model to generate in 100 steps, then take a model that only generates in 10 steps and let it learn from the 100-step one.
Ethan [00:38:25]: Why this works
Swyx [00:38:27]: Strong to weak, seemingly.
Ethan [00:38:28]: It is. It's kind of
Swyx [00:38:29]: Distillation
Ethan [00:38:29]: kind of like strong to weak. From the modeling perspective, the strong model, the teacher, is trying to model the images and videos of the internet, and that distribution is extremely complex. But the step-distilled model is just trying to learn from the teacher. The teacher is a model of fixed size, so its distribution is much simpler than the whole internet. That's my intuition for why step distillation works. So the models served in production usually run in only a few steps. In Cosmos, I believe we have four-step and eight-step versions. For simpler tasks, like image-to-image translation, it can run in even fewer steps, like one step in Cosmos Transfer.
Swyx [00:39:22]: I think this is the same intuition that guides a lot of the consistency model work. I sent you a link for sCM; I don't know if you covered that. To me, that was actually one of the most impressive papers I've ever seen from OpenAI.
Swyx [00:39:34]: It's the unifying grand concept of consistency models. I don't know if you have any comments on this.
Ethan [00:39:41]: So there are a few different approaches,
Swyx [00:39:46]: Oh, yeah. Here it is.
Swyx [00:39:47]: Two steps versus twenty or 100 steps, whatever.
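A toy version of step distillation makes the teacher/student intuition concrete. This is an assumption-laden 1-D sketch, not how Cosmos or Grok Imagine implement it: the "data" is a hypothetical Gaussian N(3, 0.5²), the teacher integrates the exact flow-matching ODE for that Gaussian with 100 Euler steps, and a one-step student is fit by least squares to reproduce the teacher's output from the same noise. That regression is the simplest form of step distillation; progressive distillation, consistency models, and GAN-based losses are more elaborate variants of the same goal.

```python
import random

MU, S = 3.0, 0.5  # hypothetical 1-D "data" distribution N(MU, S^2)

def velocity(x: float, t: float) -> float:
    """Exact marginal flow-matching velocity for the path x_t = (1 - t) * z + t * x_data,
    with z ~ N(0, 1) and x_data ~ N(MU, S^2)."""
    var_t = (1 - t) ** 2 + (t * S) ** 2
    return MU + (t * S * S - (1 - t)) / var_t * (x - t * MU)

def teacher_sample(z: float, steps: int) -> float:
    """Euler-integrate the ODE from noise (t = 0) to data (t = 1) in `steps` steps."""
    x, dt = z, 1.0 / steps
    for i in range(steps):
        x += velocity(x, i * dt) * dt
    return x

def distill_one_step(noises: list[float], teacher_steps: int = 100) -> tuple[float, float]:
    """Fit a one-step student x = a * z + b to the teacher's multi-step outputs
    by ordinary least squares on (noise, teacher output) pairs."""
    targets = [teacher_sample(z, teacher_steps) for z in noises]
    n = len(noises)
    mean_z = sum(noises) / n
    mean_x = sum(targets) / n
    cov = sum((z - mean_z) * (x - mean_x) for z, x in zip(noises, targets))
    var = sum((z - mean_z) ** 2 for z in noises)
    a = cov / var
    return a, mean_x - a * mean_z

rng = random.Random(0)
a, b = distill_one_step([rng.gauss(0.0, 1.0) for _ in range(2000)])
for z in [-1.0, 0.0, 1.0]:
    print(f"z={z:+.1f}  teacher(100 steps)={teacher_sample(z, 100):.3f}  "
          f"student(1 step)={a * z + b:.3f}  naive(1 Euler step)={teacher_sample(z, 1):.3f}")
```

Running the teacher itself with a single Euler step maps every noise sample to the mean, losing all variance, while the distilled one-step student recovers the teacher's spread. Because this toy teacher is affine in the noise, an affine student matches it exactly; real students are neural networks and only approximate their teachers.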
It's already done.
Ethan [00:39:52]: So there are a few different approaches, for example, consistency models, and there's also... actually, we shouldn't forget GANs. GANs were actually the OG of
Swyx [00:40:05]: OG
Ethan [00:40:05]: step distillation, because they were trained for just one step to begin with. For example, there's distribution matching distillation, which uses a GAN as one of the losses for distillation. A GAN just tells you, "Hey, generate an image," and then
Ethan [00:40:31]: it has a discriminator to tell whether this image is real or not. So the model just needs to learn one mode of the distribution, not the full distribution. In regular training, the model is asked to reconstruct the ground-truth image from the internet, which is extremely hard. When you're training a GAN, it's a one-step process: "Hey, you generate an image. Does this image look as real as the images from the internet?" That's a much simpler task. People typically combine a lot of these approaches, like consistency models, distribution matching, and GANs, and that's how we get these few-step models.
Audio-Video Generation and Time Alignment
Swyx [00:41:21]: Then there's one step I wanted to add, which is audio and video.
Ethan [00:41:26]: So Grok Imagine 0.9, I believe, was the first audio-video trimodal model deployed at a large scale. So
Swyx [00:41:39]: And that was your first model?
Ethan [00:41:40]: That was Grok Imagine's first model. It's joint audio-video generation. I think the hard part is the modality alignment, because before this trimodal model, we had text-to-video alignment: a correspondence between text and video. Typically, most VLMs understand images and videos. Video's very rare, and mostly they don't understand audio.
And if you look at audio generation on the LLM side, you can talk to them perfectly fine, but if you ask them to sing a song or something, it typically is not very good. They don't do music either. The hard part is that audio actually has two components: a discrete component and a continuous component. The discrete component is like language.
Ethan [00:42:44]: So when we speak, it's just some
Swyx [00:42:47]: It's an ASR issue, yeah.
Ethan [00:42:49]: It's text tokens with some characteristics, I would say.
Ethan [00:42:54]: But music
Swyx [00:42:56]: I think the speech guys would disagree with this.
Swyx [00:42:57]: Like disfluencies, and then
Vibhu [00:43:00]: There are tones; you can get angry.
Ethan [00:43:01]: Well, I say largely.
Ethan [00:43:03]: But music is completely different. It's very continuous, and you cannot model it like discrete tokens in language models. That's the hard part for models, not to mention we have to align text, video, and audio together.
Ethan [00:43:26]: So
Vibhu [00:43:26]: How?
Ethan [00:43:28]: So, some significant challenges: first, as we talked about, most VLMs cannot understand audio.
Ethan [00:43:39]: So you need some way to do synthetic data generation for audio. You have to caption the data, and that involves a lot of synthetic data and human data effort. And, not surprisingly, most LLMs are very bad at recognizing the beat, tone, and details of music. They can give a general prediction of which song it is, but it's very hard for them to describe the details of the music. Like we mentioned for image generation, you have to describe the image in as much detail as possible, so that someone blind could reconstruct it.
So here it's like someone
Vibhu [00:44:32]: Deaf
Ethan [00:44:32]: someone deaf can reconstruct what the music sounds like without actually listening to it. Maybe you can think of it as needing a, what do they call it, a script.
Vibhu [00:44:49]: Subtitles, yeah.
Ethan [00:44:49]: You've got to have all the details of the music and the dialogue.
Vibhu [00:44:55]: So is the challenge there typically stuff like music and audio? Is there a baseline, like, okay, there's enough data that we can understand narration and conversation, but there are nuances in audio, and that's where you hit all the data issues? Or do you just do it all from stage zero?
Ethan [00:45:15]: One important thing is the alignment. The model has to have a time-based alignment between video and audio: at which time step the video and audio tokens correspond to each other. But we actually don't have this kind of alignment for most other modalities. If you think about text and image, or text and video, they are loosely aligned. You can have a description of what's going on in the video, but you typically don't have an exact description of what happened at the one-second time step
Vibhu [00:46:02]: It's very
Ethan [00:46:03]: and what happened at two seconds
Vibhu [00:46:03]: coarse. Yeah.
Swyx [00:46:05]: So what was the ideal time step? You have to ablate it, and then it's like four seconds or something.
Ethan [00:46:09]: So that comes down to how you design the model to be aware of time as a modality, so that the model is time-aware. And that's something pretty unique if you think about LLMs. If you ask an LLM to complete a task, it will say, "Oh, this task will probably take twelve hours to complete," and it comes back in one hour.
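One simple way to make that time-based alignment explicit is to put every video latent and every audio latent on a shared clock and pair each audio token with the video latent whose time window contains it. The sketch below is a hypothetical illustration, not Grok Imagine's design: the 24 fps frame rate, 8x temporal compression (each video latent assumed to cover exactly 8 frames), and 25 audio latents per second are made-up numbers, and exact fractions are used so window boundaries do not drift with floating-point error.

```python
from fractions import Fraction

# Assumed (hypothetical) rates, not taken from any specific model:
VIDEO_FPS = 24     # raw video frames per second
VIDEO_STRIDE = 8   # frames folded into one video latent by temporal compression
AUDIO_RATE = 25    # audio latents per second

def video_latent_start(i: int) -> Fraction:
    """Start time (seconds) of video latent i."""
    return Fraction(i * VIDEO_STRIDE, VIDEO_FPS)

def audio_latent_start(j: int) -> Fraction:
    """Start time (seconds) of audio latent j."""
    return Fraction(j, AUDIO_RATE)

def video_index_for_audio(j: int) -> int:
    """Index of the video latent whose time window contains the start of audio latent j."""
    return int(audio_latent_start(j) * VIDEO_FPS // VIDEO_STRIDE)

def align(num_audio: int) -> dict[int, list[int]]:
    """Group audio latent indices by the video latent they fall inside."""
    groups: dict[int, list[int]] = {}
    for j in range(num_audio):
        groups.setdefault(video_index_for_audio(j), []).append(j)
    return groups

groups = align(50)  # two seconds of audio under the assumed rates
for i, audio_ids in sorted(groups.items()):
    print(f"video latent {i} @ {float(video_latent_start(i)):.3f}s <- audio {audio_ids[0]}..{audio_ids[-1]}")
```

With these rates each video latent spans a third of a second and collects eight or nine audio latents. In a real model, shared timestamps like these could feed a time-aware positional encoding so attention can relate co-occurring audio and video tokens; that part is not shown.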
Then it says, "I've already spent two days on this and I've exhausted everything."
Ethan [00:46:47]: So LLMs themselves don't have a sense of time.
Vibhu [00:46:53]: I actually don't think that's just them not having a sense of time. I think it's somewhat grounded, right?
Vibhu [00:46:58]: If you tell someone, "Okay, go work on this feature. Go implement this," there's a general understanding of how long that would take without LLMs working at LLM speed, right? Think back two years: if I told you to build a new front end for Latent Space with a search bar and all this, you'd estimate that it would take a few days, right?
Vibhu [00:47:19]: So you tell an LLM, "Go build this," and it says it'll take a few days. I think that's somewhat grounded, as opposed to them having no sense of time at all. I'm not saying they have a great understanding, but in that example you can see where it comes from, right? They're trained on all of this text.
Swyx [00:47:35]: They're trying to estimate what a human would say.
Vibhu [00:47:37]: Because that's what the data represents. It's not them
Ethan [00:47:41]: It came from the corpus on the internet. People have an estimate of how much time things take.
Vibhu [00:47:45]: And not even just in direct training samples, right? Just your world understanding, in tokens, of how long stuff takes. Go read a book; it'll take you a while, right?
Vibhu [00:47:56]: Even if you do nothing but read, a book takes a few days. So an LLM says, "I read it, it took me a few hours."
Vibhu [00:48:01]: "It'll take me a few hours to go through this research." But this is a tangent.
Swyx [00:48:05]: Somewhat, yeah.
Swyx [00:48:06]: This is a train of thought I haven't really expressed until now, which is basically that a full world model must also be recursive, meaning that the participant in the world model must also be aware that they have a world model.
It's this whole recursive thing down the line. But yes, and the world model can be wrong, so they need to update it, and so on. We've argued this in the newsletter as well: there need to be recursive or adversarial world models.
World Models: Real-Time, Long-Horizon, Interactive Video
Vibhu [00:48:34]: Just to ask, how do you define a world model?
Swyx [00:48:38]: Oh, yeah, let's go there.
Ethan [00:48:40]: So
Vibhu [00:48:40]: Just for context, we talked about video generation, and if you say there's a distinction with world models, what's your definition? How do you see the two?
Ethan [00:48:53]: So, disclaimer: I'm not going to debate what a world model is. There are many definitions, so I'll just talk about mine. Since I came from the multimodal domain, I'm mainly talking from the video side. A world model is real-time, interactive, long-horizon video. So there are three parts; let's talk about them one by one. First, interaction. We just looked at Facebook and the neural computer. For the interaction part, a world model should allow you to interact with it through keyboard, mouse, and maybe also voice. These are all modalities you can use to interact with the model, and the model should respond reasonably. The second part is real time. Say you move your mouse and the world model is generating a game: how fast can the game respond? If you're a professional CS:GO player, you might say, oh, you have to respond- He's a beginner. Within sub-ten milliseconds or- Yeah, even less. So that's not most of the- No, sixty FPS. Let's go. Oh, three hundred FPS. Oh, five hundred FPS. Wait. Okay, yeah. I didn't do the math, but yeah, okay. Uh- Yeah, three hundred FPS, that's about three milliseconds. So you have to respond- Oh, s**t. Okay. Yeah
Ethan [00:50:29]: within a millisecond. Most video models cannot do that.
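The frame-rate arithmetic in that exchange is just the reciprocal of the frame rate. The short sketch below tabulates it and compares each per-frame budget with the roughly 200 ms voice-interaction budget Ethan mentions for digital humans; the numbers are illustrative, and the per-frame budget ignores input capture and display latency.

```python
def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available to produce one frame at `fps`, in milliseconds."""
    return 1000.0 / fps

VOICE_RESPONSE_MS = 200.0  # rough real-time voice latency mentioned in the conversation

for fps in (60, 300, 500):
    budget = frame_budget_ms(fps)
    print(f"{fps:>3} FPS: {budget:5.2f} ms per frame; "
          f"a {VOICE_RESPONSE_MS:.0f} ms budget is {VOICE_RESPONSE_MS / budget:.0f}x looser")
```

At 300 FPS the model has about 3.3 ms per frame, sixty times tighter than a 200 ms voice-style response.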
Yeah. But if you have a video model that is, say, a digital human, the response time might be more generous. Typically, for real-time voice interaction, it's about two hundred milliseconds. That's much more generous. But even two hundred milliseconds is pretty tricky, because remember we mentioned
Ethan [00:51:01]: you have temporal compression coming from the VAE. If you don't compress the temporal dimension, your sequence length is going to explode. So if you want real-timeness in your model, you have to solve this context problem. And the third part is long horizon, because you're not going to play video games for just a few seconds, and most video models only generate a few seconds. We're going to play for minutes or hours, so the model has to be able to generate long-form content.
Ethan [00:51:42]: Putting these three together, it's real-time, long-horizon, interactive video. I think the final state will be, for example, a video version of Playbook, where you can interact with a neural computer. You move your mouse, click on the generative interface, and it replies to you through pixels generated in real time. But it's a very long way to get there. One of the first steps at Grok Imagine, where I led a small world model team, was to build video extension. So, video extension- it's the first step of interactivity. Yeah. It's the first step. Yeah. So it's the first step- You have it here, video editing, yeah. Yeah. Yeah. It's the first step because it unlocks long-horizon videos. Typically, for most video generation models, you give a prompt or an image as the initial frame, you generate a video, and that's it: one time, done. Some creators would try to use the last frame as the first frame of the second video.
Sometimes it works, but if you do it a few times, the quality decreases. And- It doesn't have that context- Yeah over the full video, so the temporal- Yeah, exactly. Yeah, because you only gave it the last frame, of course, right? Yeah. Exactly. And- it's actually a pretty fun hack, if you've seen like- Oh, no, he's saying something better. Yeah. For example, Veo: I remember Veo 3 has about one second of context from the previous video. It's slightly better than using the last frame, but it has a similar problem: the quality decreases. If you extend a few times to one minute, the video quality looks much worse than the first video. Another problem is that the model doesn't have long-range knowledge of what happened before. Say it generates some dialogue with two people speaking: their voices might change over time, especially since the one-second conditioning does not cover the earlier context. So these are the core challenges. Grok Imagine's video extension has the historical context of all of the previously generated videos. It has the context of who is speaking, what objects have appeared, and everything, and uses that to generate the next video. If we did this naively, you can imagine just putting all of the previous video tokens into the context. The context length would easily explode. Especially for video models, it can be a few million tokens of context, I would imagine. Yes. Yeah.
Swyx [00:54:58]: Let's run with that.
Ethan [00:54:59]: For example, in Cosmos, I think just five seconds of video is fifty K or sixty K tokens. So if you do fifty seconds, that's five hundred K tokens. If you go longer than that, it easily explodes. This long-horizon problem was the first step we tried to solve toward world models. It turns out people love video extension.
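The token counts Ethan quotes can be reproduced with a small calculator. This sketch assumes a causal video VAE that encodes the first frame on its own and then folds every `t_stride` frames into one latent frame, followed by `s_stride` spatial compression and a `patch` x `patch` patchify step. The 704x1280 resolution, 24 fps, 8x temporal and 8x spatial compression, and 2x2 patches are values chosen for illustration, not taken from Cosmos, though they land in the same range as his figures.

```python
def video_tokens(seconds: float, fps: int = 24, height: int = 704, width: int = 1280,
                 t_stride: int = 8, s_stride: int = 8, patch: int = 2) -> int:
    """Visual tokens for a clip under an assumed causal-VAE + patchify pipeline.

    The first frame gets its own latent frame; each later group of t_stride frames
    shares one latent frame. Each latent frame is (H / s_stride) x (W / s_stride)
    and is cut into patch x patch tokens.
    """
    frames = int(seconds * fps) + 1               # e.g. 5 s at 24 fps -> 121 frames
    latent_frames = 1 + (frames - 1) // t_stride
    tokens_per_latent = (height // s_stride // patch) * (width // s_stride // patch)
    return latent_frames * tokens_per_latent

for secs in (5, 50, 300):
    print(f"{secs:>3} s: {video_tokens(secs):>10,} tokens with 8x temporal compression, "
          f"{video_tokens(secs, t_stride=1):>12,} without")
```

Under these assumptions, five seconds is about 56K tokens, fifty seconds about 530K, and five minutes of history already exceeds three million, which is why naively keeping every previous clip in context does not scale. Dropping temporal compression (`t_stride=1`) multiplies each figure by roughly eight.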
A lot of creators love using video extension to create longer-form videos. This is the part I liked very much: you have an intermediate step toward the final goal instead of just a straight shot to the final version.
Swyx [00:55:48]: But I can see you have a strong vision of where we want to end up.
Long Context, Redundancy, and Efficient Interactive Video
Vibhu [00:55:51]: Does it seem like it's an efficiency issue? Okay, we're at a few million tokens of context. If you draw the parallel to language models, we had very short context, two thousand, eight thousand, and then we scaled it up to one million, ten million. Sure, there's effective context, but at the end of the day, what's it worth? Sure, there's the whole training data side. In video, it might be slightly easier because we have hundred-million-token videos, right? Just take a movie with the full context. Is this an efficiency issue from an inference standpoint, where it's expensive but we know how to solve it? Or why is this not the approach? My broader point was about your second point on world models: you say it needs to be interactive and live, right? You should be able to play a game and see the interaction live. One thing I see with research is that a lot of what you actually serve is different from what you build, right? We talked about distillation. You train a big model, you distill it, you do quantization, speculative decoding. We do all this stuff to serve it efficiently. Should we not first have a solution, like a world model that can interact well, and do the inference optimization, serving, and distillation second, making it real time after we solve it? Another parallel is continual learning, right? What we need is someone to solve it and show it works, even inefficiently. Give it a few years, and people will make it efficient. Same thing with regular attention, right? It worked.
Over a few years, people came up with different forms of attention, and we've scaled it to be efficient at long context, right? So two things there. One, it seems like it works; you've scaled it. Can we not just make it a lot more efficient over time? Do we need a separate approach if this works? And the same thing with interaction, right? If we can solve it in some way that works, we can make it more efficient from an inference standpoint later.
Ethan [00:57:53]: That's actually a very good point. In videos, there are actually a lot of redundancies. We remove a lot of the pixel redundancy with the VAE, but there's more redundancy in long-range, long-horizon videos. Say a character appears in the first clip, then disappears, and only reappears at the end of the video: you probably don't need that context in the middle of the generation. You only need that character where you need it. That's why I helped build another feature: reference-to-video.
Vibhu [00:58:36]: Is it here?
Swyx [00:58:36]: Is it the same model release or a different one?
Ethan [00:58:39]: It's a different one.
Ethan [00:58:41]: You probably need to search on
Swyx [00:58:43]: I'll find it
Ethan [00:58:43]: X for "reference to video".
Ethan [00:58:46]: Reference-to-video allows you to upload up to seven images as conditions and generate the video. They can be characters, objects, or even scenes. Say I want to condition on Sean's selfie holding a blade
Swyx [00:59:07]: We have a dog
Ethan [00:59:08]: or whatever.
Swyx [00:59:08]: We put the dog in the thing.
Ethan [00:59:09]: You can put them there, and the video model will generate the video and copy the context over. That can solve a lot of the problems, like the long-context problem; it doesn't need a very long context. But I feel like it's an intermediate solution.
The model
Swyx [00:59:29]: It's cheating.
Ethan [00:59:30]: The model should be able to selectively know where to draw the references from. Say I want to generate a movie, and I generate it autoregressively, like ten seconds at a time. When a character appears, the model can look back to where it first appeared and bring that back. Yeah, this one, I put in the references. Yeah, that's Optimus, Einstein, myself, and Annie.
Vibhu [01:00:02]: Oddly enough, I used Grok Search to find it, and it pulled your LinkedIn post. But yeah, we found it.
Ethan [01:00:08]: Interesting.
Vibhu [01:00:10]: But
xAI's Underrated Work, Culture, and Watermarking
Swyx [01:00:11]: This is a problem. It's not your fault, but xAI doesn't communicate all this work you do very well, because they just do the model release and that's it. But actually, these details are very good.
Swyx [01:00:22]: As far as I understand, everything you just described is state of the art; no one else has done it.
Vibhu [01:00:30]: A lot of- yeah, I have a lot more
Swyx [01:00:32]: And then they just put out this blog post with the cookies. I'm like, this is not enough, right?
Swyx [01:00:37]: But obviously these are the high-level numbers that people want to know. But no, okay, so
Vibhu [01:00:42]: And I wonder if part of that is also that some labs don't share research into what happens. And if
Swyx [01:00:50]: No, but this is literally bragging about how good they are, right?
Swyx [01:00:54]: Why would you not say that you are capable of extending with full context? This is not a secret sauce. This is "we did the work." Yeah, I don't know.
Ethan [01:01:02]: Different labs have slightly different communication styles.
Swyx [01:01:07]: Anyway, if anyone from xAI is listening, we are always happy to help you tell your story. Okay, so you did references, and I think the point you're making is that it is sort of a kludge, right?
You can do seven, but what about 100?
Swyx [01:01:23]: Right? Then you need a completely different thing.
Ethan [01:01:26]: So I think this is a mechanism to select context from the history, so you might not put the entire history into the context. For example, there's a paper called FramePack, which has
Ethan [01:01:41]: a heuristic: the latest history, the last one second, I keep in full, and the history before that I compress, making the video smaller. It follows an overall pattern where the maximum sequence length is fixed, so the further you are from the current frame, the smaller the image. This is just a heuristic. I think it can be more automatic, with the model aware of which parts of the history to select. This part of the research is being actively worked on by a lot of people, and it's quite interesting. I feel this part of long context is actually a little ahead of the LLM side.
Ethan [01:02:31]: For example, in LLMs, context keeps growing. Say you call a tool and the tool-call history is extremely long: it stays in context and keeps growing. Even if you switch to another topic, the whole context is still there. There are some agentic harnesses that help you prune the tool results, like when you query a file, only showing the top 200 lines or something. Those are very heuristic-driven.
Swyx [01:03:08]: For listeners, we did a write-up on the Claude Code leak, where there are eight different kinds of pruning, including pruning the tool results and all that.
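The fixed-budget idea Ethan attributes to FramePack can be illustrated with a geometric schedule: the newest frame keeps its full token count, each step further back gets half as many tokens, and frames whose budget rounds down to zero are dropped, so the total stays below twice the per-frame budget no matter how long the history grows. This is a simplified sketch under that assumed halving schedule, not the paper's actual kernels or compression rates; the 3,520-token frame size and the helper name are hypothetical.

```python
def tokens_for_history(num_frames: int, full_tokens: int = 3520, ratio: int = 2,
                       min_tokens: int = 0) -> list[int]:
    """Token budget per past frame, newest first.

    Frame k steps back gets full_tokens // ratio**k tokens (integer division);
    once a frame's budget drops to min_tokens or below, it and all older frames
    are dropped.
    """
    budgets = []
    for k in range(num_frames):
        b = full_tokens // ratio ** k
        if b <= min_tokens:
            break
        budgets.append(b)
    return budgets

for n in (1, 4, 16, 1000):
    kept = tokens_for_history(n)
    print(f"history of {n:>4} frames -> {len(kept):>2} kept, {sum(kept):>5} tokens total")
```

The context here is bounded by construction, whereas keeping every past frame at full resolution grows linearly with history length; the price is an increasingly blurry memory of older frames. A learned selector, which Ethan argues for, would replace the fixed halving rule.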
So you can, you can read up on that kind of thing.Ethan [01:03:17]: I think, one breakthrough in continual learning might be like a way to automatically, manage its own context.Swyx [01:03:27]: These are all heuristics, and they will be replaced by machine learning.Ethan [01:03:30]: InterestinglyVibhu [01:03:32]: TheEthan [01:03:32]: the same thing is being researched in both LLMs and video models.Vibhu [01:03:36]: The interesting thing is also like in the paper you showed, it's actually happening at the model level, right? Compared to like language models, sure, we have base attention, but we'll do our own compression, we'll do our own pruning, which is separate from model error.Vibhu [01:03:49]: Eventually, it all just boils in, hopefully.Swyx [01:03:52]: I think this is a form of like attention, but like also know sort of reasoning attention. I feel like that's different than normal attention.Swyx [01:04:03]: Does that, does that make sense?Ethan [01:04:04]: It's, it's different in the sense that attention, not to mention, set sparse attention aside,
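Ethan describes a FramePack-style heuristic: recent history stays at full size, older history is compressed more the further back it is, and the maximum sequence length is capped. That idea can be sketched as a per-frame token-budget schedule. This is a hypothetical illustration, not FramePack's code. The function names are assumptions, and so are the 24-frame full-resolution window (roughly one second), the 1,536 tokens per frame, and the halving factor. Older budgets form a geometric series, so the total context stays bounded however long the video gets.

```python
def frame_token_budgets(num_history_frames: int,
                        full_res_frames: int = 24,
                        tokens_per_frame: int = 1536,
                        decay: int = 2,
                        min_tokens: int = 1) -> list[int]:
    """Token budget per history frame, most recent first (index 0).

    The newest `full_res_frames` frames (hypothetically ~1 s of video) keep the
    full budget; each older frame gets `decay` times fewer tokens than the next
    newer one, and frames whose budget falls below `min_tokens` are dropped.
    """
    if decay < 2:
        raise ValueError("decay must be >= 2 for the total to stay bounded")
    budgets: list[int] = []
    for i in range(num_history_frames):
        if i < full_res_frames:
            tokens = tokens_per_frame
        else:
            tokens = tokens_per_frame // decay ** (i - full_res_frames + 1)
        if tokens < min_tokens:
            break  # budgets only shrink with age, so everything older is dropped too
        budgets.append(tokens)
    return budgets


def max_context_tokens(full_res_frames: int = 24,
                       tokens_per_frame: int = 1536,
                       decay: int = 2) -> float:
    """Upper bound on sum(budgets): full window plus geometric tail T / (decay - 1)."""
    return full_res_frames * tokens_per_frame + tokens_per_frame / (decay - 1)


for n in (10, 100, 10_000):
    budgets = frame_token_budgets(n)
    print(f"{n:>6} history frames -> {len(budgets)} kept, {sum(budgets)} tokens")
print("bound:", max_context_tokens())
```

Going from 100 to 10,000 history frames leaves the kept context unchanged. This is the fixed-length property Ethan mentions. His point is that a learned model could replace this hand-set schedule by choosing which history to keep.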
90s Vs 2000s TONIGHT inside The Vue w/Willie Chin!
Here is Debbie in her own words.
With over 15 years of experience in frontend development, I have worked as a tech lead and consultant for many important clients with various technologies, often with a strong focus on performance. I have led teams both in-house and remotely, and I have given workshops and training. I have many years of experience as a mentor for the online learning platforms Treehouse and OpenClassrooms, I am a teacher at Vue School as well as Jamstack Explorers, and I am a writer for Ultimate Courses. I am a Platform Engineer – Applied AI at Zephyr Cloud, a Google Developer Expert in web technologies, a Nuxt Ambassador, and a former Microsoft Most Valuable Professional in developer technologies, Media Developer Expert and GitHub Star Alumni. I have a special love for JavaScript frameworks, especially Vue.js and Nuxt.js, and I am now focused on testing, especially end-to-end testing with Playwright. I have a Frontend and Full Stack Tech Degree and am Microsoft certified. I am an international speaker and have spoken at many meetups and conferences worldwide, on many continents including Antarctica. I am Irish but live in Mallorca, Spain. When I am not writing code and studying new technologies, you can find me doing all sorts of sports: running, cycling, skiing, Body Combat and, of course, Taekwondo, as I am a 4th-degree black belt.
You can find Debbie on the following sites: Bluesky, Blog, LinkedIn, GitHub, YouTube, X
PLEASE SUBSCRIBE TO THE PODCAST: Spotify, Apple Podcasts, YouTube Music, Amazon Music, RSS Feed
You can check out more episodes of Coffee and Open Source on https://www.coffeeandopensource.com
Coffee and Open Source is hosted by Isaac Levin
duration: 00:08:32 - Les journaux de France Culture - Water use, pesticides, livestock farming and farmers' incomes: all of these issues risk poisoning the debates when the emergency law for agricultural protection and sovereignty is examined. The government calls it "a text of reconciliation". - production: Mathieu Laurent, Annie Brault, Martin Desclozeaux, Caroline Bennetot - guests: Pierre-Marie Aubert, agronomist, sociologist and researcher at IDDRI, Institut du développement durable et des relations internationales.
duration: 00:03:47 - Le Regard culturel - by: Lucile Commeaux - Cannes: on rocks, autofiction, Scarlett Johansson and nostalgia
duration: 00:03:47 - Les Matins de France Culture - by: Lucile Commeaux - Cannes: on rocks, autofiction, Scarlett Johansson and nostalgia
duration: 00:03:47 - Les émissions culturelles de France Culture - by: Lucile Commeaux - Cannes: on rocks, autofiction, Scarlett Johansson and nostalgia
Overview: Current market drivers
Catch up on the key moments and presentations from FOCUS. Recorded on May 4, 2026. At Fidelity, our mission is to build a better future for Canadian investors and help them stay ahead. We offer investors and institutions a range of innovative and trusted investment portfolios to help them reach their financial and life goals. Fidelity mutual funds and ETFs are available by working with a financial advisor or through an online brokerage account. Visit fidelity.ca/howtobuy for more information. For the fifth year in a row, FidelityConnects by Fidelity Investments Canada was ranked the #1 podcast by Canadian financial advisors in the 2025 Environics' Advisor Digital Experience Study.
duration: 00:02:14 - France Inter sur le terrain - Donald Trump is due to arrive in Beijing on Wednesday, the first visit by an American president to China since 2017. With relations between the two countries strained, Chinese businesses hope to use this presidential visit to reach an agreement on tariffs. - production: Sébastien Berriot
ESSENTIEL, the culture show hosted by Sandrine Sebbane. She welcomes Jessica Nelson for her novel "Les amants du Vercors" (Albin Michel) and Claude Tovel Posternak for his first novel "La promesse de Leiba" (Robert Laffont). About the book "Les amants du Vercors", published by Albin Michel: Marc, Marie and Louis shared their childhood in the mountains of the Vercors. The little shepherd who loves the stars, the caving fanatic and the future healer form a luminous trio. They swear loyalty to one another and feel their first emotions... Then each goes their own way. Until the spring of 1943. The Vercors is by then a land of refuge for Jews, for those evading the STO (compulsory labour service) and for rebels of all stripes. Marc's mission is to take them in; Marie cares for them; Louis becomes a maquis leader. United in an unequal struggle against the enemy, they will also have to confront their secrets and their feelings. Jessica L. Nelson's novel tells of the madness and cruelty of men, but also of friendship, desire, courage and sacrifice. A vibrant evocation of one of the great sites of the French Resistance, Les Amants du Vercors is a tribute to those who paid the price of freedom with their lives and their youth. Jessica Nelson was a literary adviser for the program Vol de nuit, editor-in-chief of Au Field de la nuit and a columnist on Au fil des mots. She is now a literary critic at Point de Vue. She is also a co-founder of the Prix de la Closerie des Lilas and of the publisher Éditions des Saints Pères. Brillant comme une larme is her fourth novel. About the book "La promesse de Leiba", published by Robert Laffont: a remarkable first novel about a wonderful love story that binds two children until their last breath.
A deep dive into Ashkenazi Jewish culture, from Russia to the United States and from the end of the 19th century to the middle of the 20th, carried by the promise of a historical figure whose childhood had never been told. Not a day goes by without him searching for her. He is convinced that Providence will put her in his path. How many times has he believed in a miracle? How many blond silhouettes has he followed, only to discover a few steps later a sadly unfamiliar face? Odessa is so vast. And chance stubbornly refuses to give way. The son of a peasant, Leiba is an extraordinarily brilliant child. One snowy morning, in the heart of the Jewish agricultural colony of Gromokleï, a being like no other is waiting for him: Chana. A love is born between the girl who escaped a pogrom and the boy who suffers from not being loved by his mother. Yet the march of History, in which Leiba is about to become a major actor, threatens to separate them. In this novel, the author carries us, between the 19th and 20th centuries, from the streets of Odessa and the wheat fields of Russia to the iron mines of Minnesota. He brings back to life an Ashkenazi culture of dazzling richness, at a time when progress is emerging and revolution is rumbling. Claude Tovel Posternak has lived a thousand lives: market trader, teacher, advertising executive, winegrower, columnist and lyricist. The author of several essays, he publishes La Promesse de Leiba as his first novel.
duration: 00:38:19 - Le 18/20 : un jour dans le monde - by: Fabienne Sintes - Douglas Kennedy now looks at his country with concern. Between Trump's return, extreme polarization and a hardening of public debate, he describes an America gripped by a deep malaise. - production: Philippe Lefébure, Nathalie Poitevin, Thomas Lenglain, Mathias Dubois - guests: Douglas Kennedy, writer
duration: 00:38:19 - InterNational - by: Fabienne Sintes - Douglas Kennedy now looks at his country with concern. Between Trump's return, extreme polarization and a hardening of public debate, he describes an America gripped by a deep malaise. - production: Philippe Lefébure, Nathalie Poitevin, Thomas Lenglain, Mathias Dubois - guests: Douglas Kennedy, writer
Once again a nerve center of the Atlantic Alliance, the Keflavik air base hosts regular rotations of allied aircraft tasked with monitoring Icelandic airspace. In the heart of the North Atlantic, NATO intends to make its presence felt in the face of Russia. On the tarmac of Iceland's Keflavik air base, Lieutenant Colonel Arvidsson points to a fighter jet parked in a hangar. The JAS 39 Gripen, flagship of the Swedish defense industry, is one of six aircraft that Stockholm deployed in February as part of a NATO air policing mission. "It's a multirole aircraft. What you see there is what we call a 'targeting pod', which lets us identify targets: another aircraft or drones. We detect them on radar and, if necessary, lock on to them," the officer explains. "The aircraft is also equipped with a 27-millimeter cannon built into the fuselage. The idea here is to have a self-defense capability."
A base that became strategic again after the Americans left
Built during the Second World War, the Keflavik base hosted American forces for a long time. But in 2006, Washington officially withdrew its troops from Iceland, leaving this small country of 400,000 inhabitants, which has no standing army, entirely dependent on its allies. Since then, NATO member states have taken turns policing Icelandic skies. Sweden, which officially joined the Alliance in 2024, has just carried out an independent mission there for the first time. "Twenty-four hours a day, seven days a week, two aircraft are ready to take off," says Robin Arvidsson. "If we detect anything approaching Iceland, such as drones, we can intercept the threat. So far we haven't had to intervene, but we stay ready." Seen from the outside, Keflavik seems isolated, lost in windswept plains.
Yet the base occupies a key position on the "GIUK" corridor, an imaginary line linking Greenland, Iceland and the United Kingdom. Controlling this corridor means controlling the passage between the Arctic and the North Atlantic.
At the heart of Arctic rivalries
"Iceland has no armed forces, but it has a fully operational base at the heart of the Arctic region," sums up Lieutenant Colonel Johan Legardt, commander of the Swedish detachment. "It is the most central base in the entire Arctic region, accessible to all the allies." The Arctic's strategic importance has grown further in recent months. Donald Trump's designs on Greenland at the start of the year put the region back at the center of geopolitical rivalries. Mineral resources, new shipping routes opened up by melting ice, military competition: the great powers are gradually repositioning themselves there. "Our presence is essential," insists Johan Legardt. "The idea is to show that if Russian bombers cross this area, they will not go unmonitored. Our presence is a reminder that NATO is here." "The Russians periodically deploy in the region," confirms Erlingur Erlingsson, a researcher at the Institute of International Affairs at the University of Iceland. "We have indeed observed bombers, and Russian submarines are certainly operating in the area. But the allied presence is as much about surveillance as about deterrence."
Iceland, the lock on the North Atlantic
With no national army, Iceland remains entirely dependent on its allies for its security. Jonas Allanson, Iceland's chief of staff, explains why the island remains indispensable to the Western defense posture: "Iceland's key position for the security of North America and Europe comes down to one thing: from here, you can monitor all maritime traffic." "We have already seen accidents or, in the Baltic Sea, acts of sabotage on undersea cables that are essential for our communications," the Icelandic official continues. "That's why the Allies work together: this strategic corridor must be monitored and the security of North America and Europe ensured." It was to strengthen this posture that NATO launched the Arctic Sentry mission in February. The aim is to increase the frequency of allied air rotations at Keflavik and to maintain a more continuous presence at this base, which has once again become one of the main strategic locks of the North Atlantic.
duration: 00:37:40 - L'Invité(e) des Matins - by: Guillaume Erner, Yoann Duval - Debt is soaring, growth has been revised downward, and the economy minister admits to navigating "by sight". A former member of the Bayrou government, at Bercy (the Ministry of the Economy) from December 2024 to October 2025, Éric Lombard unpacks a French equation that has become impossible to solve. - production: Félicie Faugère - guests: Éric Lombard, French politician
duration: 00:14:50 - Journal de 8 h - A peace agreement with Iran is "very close", according to Donald Trump, while Tehran threatens to close the Strait of Hormuz again if the American blockade of Iranian ports continues.
duration: 00:20:10 - Les Nuits de France Culture - by: Albane Penaranda, Mathias Le Gargasson, Antoine Dhulster - In 1980, the sociologist Henri Mendras, a specialist in rural society, had just published a work of imaginary sociology entitled "Voyage au pays de l'utopie rustique". Marie-Hélène Baconnet asked him about this tale as a model for the future of French society and its ecological aspirations. - production: Rafik Zénine, Vincent Abouchar, Emily Vallat
duration: 00:20:07 - Les Nuits de France Culture - by: Albane Penaranda, Mathias Le Gargasson, Antoine Dhulster - In 1980, speaking to Marie-Hélène Baconnet, the rural sociologist Henri Mendras explained how and why he had set up the program "Observation continue du changement social et culturel" (continuous observation of social and cultural change). Its aim was to analyze social change through the lens of the local scale. - production: Rafik Zénine, Vincent Abouchar, Emily Vallat - guests: Henri Mendras, French sociologist
duration: 00:20:15 - Les Nuits de France Culture - by: Albane Penaranda, Mathias Le Gargasson, Antoine Dhulster - In 1980, Marie-Hélène Baconnet interviewed the sociologist Henri Mendras about the publication of "La sagesse et le désordre", a collective work analyzing how France's social structures had evolved at the dawn of the 1980s. - production: Rafik Zénine, Vincent Abouchar, Emily Vallat
duration: 00:19:04 - Les Nuits de France Culture - by: Albane Penaranda, Mathias Le Gargasson, Antoine Dhulster - In 1980, the sociologist Henri Mendras talked with Marie-Hélène Baconnet about the evolution of French agriculture since the 1960s. The author of "La Fin des paysans" detailed his methodology and reaffirmed that this thousand-year-old civilization was being replaced by farmer-producers. - production: Rafik Zénine, Vincent Abouchar, Emily Vallat - guests: Henri Mendras, French sociologist
Pulsar wind nebulae are bubbles of relativistic particles powered by the rotational energy that pulsars lose. The LHAASO observatory (Large High Altitude Air Shower Observatory) recently revealed that the Crab Nebula, powered by the most energetic pulsar in the Milky Way, emits gamma rays at energies of the order of a PeV (10^15 eV), confirming its role as an extreme particle accelerator. The astrophysicists of the LHAASO collaboration now present another ultra-high-energy (E > 100 TeV) point-like gamma-ray source. It is very clearly associated with the pulsar wind nebula powered by PSR J1849-0001, a pulsar whose spin-down power is 50 times lower than that of the Crab pulsar. They publish their study in Nature Astronomy.
Source: An extreme particle accelerator powered by pulsar PSR J1849−0001, The LHAASO Collaboration, Nature Astronomy (April 13, 2026), https://doi.org/10.1038/s41550-026-02839-0
Illustrations: PSR J1849 detected by LHAASO in several energy bands (LHAASO collaboration); aerial view of the LHAASO observatory and its hundreds of detectors for particle showers induced by high-energy gamma photons (LHAASO collaboration)
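The "spin-down power" compared above is the rate at which a pulsar loses rotational kinetic energy: Ė = 4π² I Ṗ / P³, where P is the spin period, Ṗ its time derivative and I the neutron star's moment of inertia. The sketch below evaluates it for both pulsars. Several inputs are assumptions made for illustration, not figures from the article: the canonical I = 10^45 g cm², and the timing values (Crab P ≈ 33.4 ms, Ṗ ≈ 4.2×10⁻¹³; PSR J1849-0001 P ≈ 38.5 ms, Ṗ ≈ 1.42×10⁻¹⁴). Check these against the ATNF pulsar catalogue before relying on them.

```python
import math

# Assumed canonical neutron-star moment of inertia, in g cm^2.
MOMENT_OF_INERTIA_G_CM2 = 1e45


def spin_down_power(period_s: float, period_dot: float,
                    inertia: float = MOMENT_OF_INERTIA_G_CM2) -> float:
    """Rotational energy loss rate E_dot = 4 pi^2 I P_dot / P^3, in erg/s."""
    return 4 * math.pi ** 2 * inertia * period_dot / period_s ** 3


# Approximate timing values, assumed for illustration (not taken from the article).
crab = spin_down_power(period_s=0.0334, period_dot=4.2e-13)
j1849 = spin_down_power(period_s=0.0385, period_dot=1.42e-14)

print(f"Crab:            {crab:.2e} erg/s")
print(f"PSR J1849-0001:  {j1849:.2e} erg/s")
print(f"ratio Crab/J1849: {crab / j1849:.0f}")
```

With these inputs the ratio comes out at roughly 45, consistent with the "50 times lower" quoted in the text. The cubic dependence on P means small period differences matter. Most of the gap, however, comes from J1849-0001's roughly 30 times smaller Ṗ.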
duration: 00:20:41 - Les Nuits de France Culture - by: Albane Penaranda, Mathias Le Gargasson, Antoine Dhulster - In 1980, Henri Mendras, a sociologist known for his book "La fin des paysans" published in 1967, retraced the history of the French farming world for Marie-Hélène Baconnet. From the peasantry of the 18th century to the green revolution of the 1950s, he analyzed rural life and its decline. - production: Rafik Zénine, Vincent Abouchar, Emily Vallat - guests: Henri Mendras, French sociologist
duration: 00:14:13 - L'invité d'un jour dans le monde - While Donald Trump speaks of "very productive" talks with Tehran, Iran, which openly stands by its military escalation, immediately disputes his statements. Between contradictory announcements and continued fighting, the situation remains unclear.
On this episode, Andrew is buried in messy authentication work spread across legacy code, Chris recounts a frustrating GitHub Actions debugging session, and David explains the mental drain of working with both Vue 2 and Vue 3 in the same application. They talk about using workflow run triggers, scheduled builds, and GitHub's new Agentic Copilot workflows such as CI Doctor, Automatic Code Simplifier, and issue/PR management, while lamenting low-quality AI-generated PRs and paid AI code review tools. Andrew makes a special announcement about Blastoff Rails. They also compare LazyVim, lazy.nvim, and Kickstart Neovim, and discuss the Ruby 3.4.9 bug-fix release and Marco Roth's Herb improvements for ERB tooling. Hit download now to hear more!
Links
Judoscale - Remote Ruby listener gift
Upload-artifact v7.0.0 (GitHub)
Download-artifact v8.0.0 (GitHub)
GitHub Agentic Workflows
Bringing Code Review to Claude Code
Scott's Pizza Tours
Blastoff Rails - June 11-12, 2026, Albuquerque, New Mexico
Learn Enough Bridgetown to be Dangerous (Andrew's talk)
lazy.nvim
LazyVim
kickstart.nvim
kickstart-modular.nvim
Tree-sitter
Herb
Marco Roth X (Herb)
Honeybadger: Honeybadger is an application health monitoring tool built by developers for developers.
Judoscale: Make your deployments bulletproof with autoscaling that just works.
Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.
Chris Oliver X/Twitter
Andrew Mason X/Twitter
Jason Charnes X/Twitter
A deep dive into skills and how they are used in coding agents, and into benchmarks of MCP tech stacks. We also cover Java 26-27, HttpClient, Node.js, AI-driven nuclear scenarios and methodology: in short, never a dull moment! Recorded on March 15, 2026. Download the episode as LesCastCodeurs-Episode-338.mp3 or watch the video on YouTube.
News
Languages
Bruno Borges created a site that illustrates how Java has evolved over time and become an even more elegant language. It was inspired by a recent site showing how CSS had evolved. https://javaevolved.github.io/
Simplified code: lighter main(), var, text blocks, richer String API.
Pattern matching: switch on types, improved instanceof, record patterns.
Data: records, easy-to-create immutable collections, list methods.
Concurrency: virtual threads, CompletableFuture, StructuredTaskScope, ScopedValue.
Errors & security: precise NPEs, multi-catch, improved Optional, deserialization filters.
I/O & networking: modern HttpClient, simplified file/console I/O, transferTo.
Dates & times: modernized API, precise, immutable and thread-safe.
Language: sealed interfaces and private interface methods, module imports, Math.clamp.
Streams: new operations (takeWhile, mapMulti, Gatherers, teeing).
Tools & perf: jshell, simplified execution, jwebserver, AOT, JFR, memory optimization.
10+ reasons not to use the JDK HttpClient, with a very detailed article by Brice Dutheil https://blog.arkey.fr/2026/02/08/ten-reasons-to-not-use-jdk-httpclient/
JDK HttpClient: built in, cannot be upgraded. OkHttp: heavier (Kotlin dependency).
TLS/SSL: JDK: limited SSLContext, global hostname verification, manual pinning, rigid SSLParameters. OkHttp: fine-grained control (SSLSocketFactory/TrustManager), dedicated hostname verification and pinning, structured ConnectionSpec.
Connections: JDK: no fallback, no custom socket factory (so no direct Unix domain sockets or named pipes), limited pool (system properties only, poor control before JDK 20/21). OkHttp: automatic fallback, custom socket factory, granular pool.
Networking: JDK: default DNS resolver, single Authenticator. OkHttp: custom DNS resolver, separate authenticators (proxy/server).
Request lifecycle: JDK: no built-in interceptors or event API. OkHttp: addInterceptor, EventListener for granular events.
Resources: JDK: no clean shutdown before JDK 21. OkHttp: granular shutdown (pool, executor, cache).
Timeouts: JDK: the timeout stops applying once the headers arrive, so the body transfer can run past the initial timeout.
JDK 26 and JDK 27: what's coming https://www.infoq.com/news/2026/02/java-26-so-far/
JDK 26 is a non-LTS release scheduled for March 17, 2026, with 10 new features in 5 categories.
HTTP/3 support finally arrives in Java's standard HTTP Client API (JEP 517).
Structured Concurrency (Project Loom) reaches its 6th preview, adding an onTimeout() method to StructuredTaskScope.Joiner.
Lazy Constants move to a 2nd preview: constants initialized on demand, useful for optimizing startup.
G1 GC gains performance by reducing synchronization between application threads and GC threads (JEP 522).
The AOT object cache (JEP 516) is extended to work with any GC, including ZGC.
The Applet API is permanently removed (JEP 504), closing a historic chapter of Java.
PEM encoding of cryptographic objects continues in preview, with support for encrypting and decrypting KeyPairs.
For JDK 27 (September 2026), hybrid post-quantum key exchange for TLS 1.3 is already targeted (JEP 527).
Project Valhalla progresses with a preview of value classes: objects without identity, with final fields only.
Libraries
A performance study shows that Java is a great choice for building MCP servers
https://www.tmdevlab.com/mcp-server-performance-benchmark.html
Performance comparison of MCP (Model Context Protocol) servers written in Java, Go, Node.js and Python.
Methodology: 3.9 million requests, Docker environment (1 CPU core, 1 GB RAM per server).
Reliability: 0% errors for all implementations.
Performance tiers:
1 (High): Go & Java (latency < 1 ms, ~1,600 requests/s).
▪︎ Go: exceptional memory efficiency (18 MB vs 220 MB for Java).
▪︎ Java: marginally better latency, but 12x more memory.
2 (Medium): Node.js (latency ~10.7 ms, ~560 requests/s). Per-request instantiation overhead.
3 (Low): Python (latency ~26.5 ms, ~290 requests/s). Limited by the GIL.
Production recommendations:
Go: best for high load, cloud native, cost optimization.
Java: when very low latency is critical, or with existing Java infrastructure.
Node.js & Python: suited to moderate or low loads, development and testing.
Node.js and Python can be optimized to improve their performance in production.
On top of that, the Java benchmark did not use GraalVM native compilation, which would have produced memory figures competitive with Go.
Which performs better for building MCP servers, Quarkus or Spring? https://medium.com/@egekaraosmanoglu/spring-boot-vs-quarkus-which-java-runtime-wins-the-ai-mcp-tools-performance-battle-4da9d6a248d5
Quarkus JVM: highest throughput and best latency (up to 16,381 req/s, 65% faster than Spring Boot); outperforms Spring Boot even with Apache Camel.
Quarkus Native: lowest memory consumption (118 MB), instant startup, predictable performance.
Spring Boot MVC: good performance and a mature ecosystem, but needs a significant warm-up (up to a 44% gain).
Spring Boot WebFlux: slightly better throughput and latency than MVC (~5%), but more memory and reactive complexity.
Architectural cost:
MapStruct: negligible impact (< ±5%).
Apache Camel: 8-21% lower throughput, but significant added value; Quarkus JVM + Camel still beats the Spring Boot baseline.
MCP protocol: on Quarkus JVM (with Camel), it outperforms gRPC.
Recommendations: maximum throughput: Quarkus JVM. Cost/serverless: Quarkus Native. Enterprise integration: Quarkus JVM + Camel + MapStruct. Best Spring option: Spring Boot WebFlux + MapStruct.
Benchmark of stacks implementing MCP https://www.tmdevlab.com/mcp-server-performance-benchmark-v2.html
MCP (Model Context Protocol) is Anthropic's protocol for connecting LLMs to external tools and data sources; this benchmark compares 15 server implementations.
39.9 million requests processed with zero errors, on realistic I/O workloads (Redis + HTTP API) rather than synthetic CPU tasks.
Rust reaches 4,845 RPS with only 10.9 MB of RAM; Quarkus gets 4,739 RPS with the best latency (4.04 ms on average, 8.13 ms at P95).
Go (3,616 RPS) and Spring MVC (3,540 RPS) form a solid second group.
Node.js tops out at 423 RPS; Bun is about twice as fast on identical code (876 RPS); Python reaches 259 RPS with 4 workers and uvloop.
Notable finding: a bug in the Rust SDK rmcp v0.16 added ~40 ms of latency to every HTTP response, capping throughput at 1,283 RPS; it was fixed in v0.17 via PR #683.
GraalVM native images cut memory by 27 to 81% but reduce throughput by 20 to 36%; Quarkus-native is the exception, with 36 MB of RAM and 3,449 RPS.
Spring MVC (blocking) outperforms WebFlux (reactive) at 50 concurrent users, a reminder that the reactive model does not always win.
Recommendations: Rust or Quarkus for high-load production, Go for cloud native, Bun rather than Node.js for JavaScript.
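Why would a fixed ~40 ms delay cap throughput at around 1,300 RPS? Little's law gives a back-of-the-envelope answer. In a closed-loop load test, requests in flight = throughput × time per request, so throughput ≤ concurrency / latency. This sketch assumes about 50 clients, each waiting for a response before sending the next request. That figure is an assumption: the summary only mentions 50 concurrent users in the Spring MVC vs WebFlux comparison. The function names are hypothetical.

```python
def max_throughput_rps(concurrency: int, latency_s: float) -> float:
    """Little's law for a closed loop: throughput = requests in flight / time per request."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_s


def implied_concurrency(throughput_rps: float, latency_s: float) -> float:
    """The reverse: how many requests must be in flight to sustain a given throughput."""
    return throughput_rps * latency_s


# Hypothetical closed-loop load: 50 clients, each request taking ~40 ms end to end.
ceiling = max_throughput_rps(concurrency=50, latency_s=0.040)
print(f"ceiling with a 40 ms floor: {ceiling:.0f} RPS")
print(f"in flight at 1,283 RPS and 40 ms: {implied_concurrency(1283, 0.040):.1f}")
```

The ceiling lands in the same range as the reported 1,283 RPS. This treats the 40 ms as the whole per-request time, so it only shows the order of magnitude, not the benchmark's exact mechanics.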
Jakarta EE 12 Milestone 2: data, consistency and configuration https://www.infoq.com/articles/jakartaee-12-milestone-2/
Jakarta EE is the open-source enterprise Java platform, the foundation of frameworks such as Quarkus and Spring; it standardizes APIs for persistence, transactions, security, etc.
Jakarta EE 12 adopts Java 21 as its baseline (with Java 25 support) and permanently removes the deprecated SecurityManager.
The new Jakarta Query spec unifies JPQL (SQL/relational) and JDQL (NoSQL) into a single language with two profiles: Core Language (portable) and Persistence Language (relational).
Jakarta Data 1.1 introduces dynamic queries through a fluent API with Restriction, plus the @Is annotation for more expressive conditions.
Jakarta Data now supports stateful repositories, enabling entity lifecycle management (persist, merge, detach, refresh) as in classic JPA.
Jakarta NoSQL 1.1 integrates Jakarta Query through a new Query interface and supports projections with Java records.
Jakarta Persistence 4.0 supports SequencedCollection (Java 21) as a collection type in entities.
A new Jakarta Agentic AI spec is in progress. It aims at vendor-neutral APIs for building AI agents on Jakarta EE runtimes, with planned integration of LangChain4j and Spring AI.
This release is still a milestone (not for production); broad adoption will depend on tooling maturity (IDEs, query validation, diagnostics).
New Quarkus vs Spring Boot benchmarks: complete and transparent performance https://quarkus.io/blog/new-benchmarks/
Quarkus is a container-optimized Java framework known for its low memory usage and fast startup, and Spring Boot's main competitor.
The old performance charts on quarkus.io were outdated, undated and unsourced, and did not show throughput.
The lack of throughput data led people to believe, wrongly, that Quarkus performed poorly on that front.
A new open-source benchmark was created; it is transparent, reproducible and available on GitHub.
Results: Quarkus handles 2.7x more transactions per second than Spring Boot and starts 2.3x faster, using half the memory.
External Spring Boot experts helped make the comparison fairer, notably on connection pool configuration.
Virtual threads add about 6,000 tps of throughput for every framework tested.
Spring Boot 4 has better throughput than Spring Boot 3, but at the cost of slower startup and a larger memory footprint.
In native mode (GraalVM), startup is ultra-fast but throughput is halved, for Quarkus as well as for Spring Boot.
Native mode is only recommended for apps that are started and stopped very frequently or that run under low load.
Quarkus 3.32: foundations for the next LTS https://quarkus.io/blog/quarkus-3-32-released/
Quarkus is a cloud-native Java framework optimized for GraalVM and HotSpot, designed for microservices and containerized environments.
This version marks the feature freeze for the next LTS release, 3.33.
Project Leyden integration (JVM AOT): startup of a minimal REST application drops from 370 ms to 80 ms.
Leyden training can be triggered at build time or through integration tests.
Improved HTTP graceful shutdown, with contributions from the Keycloak team.
Automatic registration in Consul through the Stork extension for service discovery.
New security features: custom DPoP nonce providers and rich authorization support for OIDC.
The order of authentication mechanisms can now be customized, and OIDCAuthenticationCompletionAction has been added.
The Google Cloud Functions framework moves to version 2.0, and Camel Quarkus and Quarkus CXF are updated as well. Users on LTS 3.27 are encouraged to test the migration to 3.33 and report feedback.
NodeJS changes its release cadence https://nodejs.org/en/blog/announcements/evolving-the-nodejs-release-schedule Node.js is the most widely used server-side JavaScript runtime. It is managed by the OpenJS Foundation and has kept an active release cycle since the merger with io.js ten years ago. Starting with Node.js 27 (October 2026), the project moves from one major release every six months to one per year. Every release will become LTS, which removes the distinction between even (LTS) and odd (non-LTS) versions. A new Alpha channel is introduced, allowing semver-major changes during the early testing phase. The phases become: Alpha (6 months, October to March), Current (6 months, April to October), LTS (30 months), then EOL. Total support remains 36 months, the same as in the current model. Version numbers align with the calendar year of the Current release (e.g. 27.0.0 in 2027). Alpha releases are signed, tagged and tested via CITGM, but are not intended for production. The main motivations: odd versions saw little adoption, the even/odd distinction confused newcomers, and fewer parallel release lines lighten the load on volunteers. Library authors are encouraged to add Alpha releases to their CI as early as possible to catch regressions upstream.
Web
jQuery v4 is out https://www.infoq.com/news/2026/02/jquery-4-release/?utm_source=twitter&utm_medium=link&utm_campaign=calendar jQuery is a long-standing JavaScript library that simplifies DOM manipulation, event handling and AJAX requests, and it is still present in many codebases.
This major version ships for the library's 20th anniversary, after almost a decade without a major release. Support for Internet Explorer 10 and earlier, Edge Legacy, and older iOS/Android versions is dropped. IE11 is still supported in jQuery 4, but its removal is planned for jQuery 5. The source code moves from AMD to ES modules for better compatibility with modern build tools, and the bundler switches from RequireJS to Rollup. Deprecated functions such as jQuery.isArray, jQuery.parseJSON and jQuery.trim are removed, since native JavaScript equivalents now exist. The gzipped file is more than 3,000 bytes smaller, and the slim build drops to about 19.5 KB. Trusted Types support is added to ease compatibility with strict Content Security Policies. jQuery remains relevant for maintaining existing codebases and for projects that want to depend as little as possible on frameworks.
Reactivity in the frontend: concepts and approaches https://www.sfeir.dev/front/quest-ce-que-la-reactivite-en-frontend/ An article summarizing how reactivity is implemented in web frontends. Frontend reactivity is the mechanism that automatically updates the UI when data changes, without direct DOM manipulation. Without it, developers must update every element of the interface by hand, which is tedious and error-prone. One-way data binding (React) separates the data flow from user-interaction callbacks. Two-way data binding (Angular) automatically keeps data and UI in sync in both directions. The Virtual DOM (React, Vue) compares a new in-memory representation of the UI with the previous one, then applies only the necessary changes to the real DOM. Observables via RxJS (Angular) handle asynchronous data streams and complex events.
Signals (SolidJS, recent Angular, Svelte) offer fine-grained updates and better performance than the earlier approaches. They also provide a simpler API than observables while remaining highly performant. Reactivity abstracts away DOM manipulation so that developers can focus on application state.
Data and Artificial Intelligence
Gunnar Morling announced Hardwood, a new Java parser for Apache Parquet files, built on lessons learned from the 1BRC (One Billion Row Challenge) https://www.morling.dev/blog/hardwood-new-parser-for-apache-parquet/ Hardwood: a new open-source Apache Parquet parser (Java 21+). Goal: improve on parquet-java (heavy dependencies, single-threaded reader). Key points: minimal dependencies and a multi-threaded decoding pipeline. APIs: RowReader (row-oriented) and ColumnReader (column-oriented, high performance). Optimizations: page-level parallelism, adaptive prefetching, fewer allocations. Development: AI-assisted (Claude Code), with human review. Future work: predicate push-down, parquet-java compatibility, writing support, a CLI, Iceberg integration.
Apicurio Registry goes AI-native https://www.apicur.io/blog/2026/02/05/apicurio-registry-ai-natural-evolution Apicurio Registry is an open-source schema registry (OpenAPI, AsyncAPI, Avro, Protobuf…) that handles API versioning, validation and governance. The project is extending its capabilities to become an AI-native platform, applying the same governance principles to AI agents. It supports the A2A (Agent-to-Agent) protocol: agents register via "Agent Cards" and discover each other through standardized endpoints. A built-in MCP server lets LLMs interact directly with the registry (schema discovery, validation, creation). Integration with Claude Desktop is already documented, so artifacts can be managed in natural language.
There are two new artifact types: PROMPT_TEMPLATE (versioned prompt templates with variables) and MODEL_SCHEMA (validation of agent inputs and outputs). Java SDKs (LangChain4j, Quarkus) and Python SDKs (LangChain, LlamaIndex) are available. A multi-agent demo illustrates "context chaining": each agent receives the outputs of the previous agents in the pipeline. The roadmap includes agent lifecycle management, semantic search, and integration into deployment pipelines.
The History of Deep Learning: when machines started to learn https://blog.ippon.fr/2026/02/20/lhistoire-du-deep-learning-quand-les-machines-ont-commence-a-apprendre/ An article tracing the key advances of machine learning. Deep learning is a subfield of ML based on neural networks stacked in layers, and it is now ubiquitous in vision, language and recommendation. The Perceptron (1957) is the first formal model of supervised learning, but it fails on non-linear problems such as XOR: a structural limit, not an algorithmic one. Backpropagation (1980s) makes it possible to train multi-layer networks, but it suffers from the "vanishing gradient" problem, which stalls learning in deep networks. The rise of deep learning in the 2000s was as much a hardware revolution as an algorithmic one: GPUs, designed for video games, turned out to be ideally suited to matrix computations. AlexNet (2012) was an industrial breakthrough, showing that a deep CNN trained on GPUs vastly outperforms classical methods at image recognition. LSTMs (1997) address the long-term memory problems of RNNs, but their sequential nature severely limits parallelization. Transformers ("Attention Is All You Need", 2017) revolutionized the field by replacing recurrence with a parallelizable attention mechanism well suited to GPUs and TPUs.
Generative AI brings a conceptual shift: models learn the distribution of the data in order to produce new examples, rather than just classifying. LLMs provide a reusable general-purpose foundation for many tasks, whereas predictive AI required a dedicated model for each problem. Whether AGI is achievable remains an open and highly uncertain question, but AI is already becoming a "software actor" that can reason and act autonomously through agents.
That's it: the Agent to Agent Protocol (A2A) has reached version 1.0 https://a2a-protocol.org/latest/announcing-1.0/ Production-ready. Multi-version and multi-protocol support (gRPC, HTTP+JSON…). Multi-tenancy: a single endpoint can support and expose several distinct agents. Signed, cryptographically verifiable Agent Cards to confirm agent identity. Flexibility: clients can choose to consume results by polling, streaming, or webhooks.
Tooling
The complete guide to building skills for your agents, by Anthropic https://resources.anthropic.com/hubfs/The-Complete-Guide-to-Building-Skill-for-Claude.pdf Definition and structure: skills are folders containing instructions (a mandatory SKILL.md file) and scripts that teach agents how to perform specific tasks or use MCP tools reliably. How it works: the system relies on "progressive disclosure" through a critical YAML header, so that Claude loads a skill's context only when the user's request calls for it. Lifecycle: the guide covers every stage of development, from defining use cases (automation, document creation) to testing and distribution protocols.
It also briefly covers testing, along with common patterns.
Learning to use skills to structure your AI code https://philippart-s.github.io/blog/2026-02-18-anthropic-skills/ Claude Skills are packages of instructions in a folder that teach Claude how to handle specific tasks consistently. At a minimum, a skill consists of a SKILL.md file with YAML frontmatter and Markdown instructions. The YAML frontmatter has two mandatory fields: name (in kebab-case) and description (at most 1024 characters, explaining what the skill does and when to trigger it). Skills work identically on Claude.ai, Claude Code and the API without modification. There are three main categories: document/asset creation, multi-step workflow automation, and enhancement of MCP integrations. Skills rely on progressive disclosure: the frontmatter is always loaded, the SKILL.md body only when relevant, and linked files on demand. Five common patterns: sequential orchestration, multi-MCP coordination, iterative refinement, context-aware tool selection, and embedded domain intelligence. Tests should cover triggering (on 90% of relevant queries), functional behavior, and a comparison with a no-skill baseline. For distribution, host the skill on GitHub with a README kept outside the skill folder (no README.md inside the folder itself). An official skill-creator can generate a first SKILL.md in 15-30 minutes from a natural-language description.
Agent skills are a way to automate repetitive tasks https://glaforge.dev/posts/2026/02/21/easily-build-a-local-mcp-server-in-java-with-a-skill-in-gemini-cli/ Easily building local Java MCP servers for Gemini CLI and other agents. The answer to repetitive Java code: JBang + LangChain4j + a "skill" used by Gemini CLI.
Key idea: a Gemini CLI "skill" automates generating and installing the servers. The skill generates a Java file, compiles it and registers it in Gemini CLI's settings. Benefits: no boilerplate, automatic registration, fast development. Conclusion: agent skills automate repetitive tasks and make experimentation systematic.
A SKILL.md by Julien Dubois that lets AI agents create Spring projects following JHipster-style best practices https://github.com/jdubois/dr-jskill/blob/main/SKILL.md Dr JSkill is an "Agent Skill" designed to help AI tools (GitHub Copilot CLI, Claude Code) generate Spring Boot 4.x applications following Julien Dubois's best practices. It creates modern full-stack projects using Java 25, PostgreSQL and Docker, with a choice of front-end frameworks (Vue.js by default, React, Angular or Vanilla JS). It includes cross-platform Node.js scripts that automate project generation via start.spring.io without external npm dependencies. It prescribes strict technology choices: Maven only, no Lombok, and Hibernate ddl-auto for schema management (no Flyway/Liquibase). It natively supports GraalVM compilation (native images) for ultra-fast startup (
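Returning to the Node.js release-cadence item above, the phase arithmetic can be checked with a small Python sketch. It assumes month-level granularity (the first of the month stands in for real release dates, which the announcement does not pin to a day) and extrapolates the "27.0.0 in 2027" example to `2000 + major` for later lines. `ReleaseLine`, `schedule`, `add_months` and `months_between` are hypothetical helpers, not part of any Node.js tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseLine:
    major: int
    alpha_start: date    # semver-major changes still allowed, not for production
    current_start: date
    lts_start: date
    eol: date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, pinned to the first of the month."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, 1)


def months_between(a: date, b: date) -> int:
    return (b.year - a.year) * 12 + (b.month - a.month)


def schedule(major: int) -> ReleaseLine:
    """Phase dates for a release line under the annual model (Node.js 27+)."""
    if major < 27:
        raise ValueError("the annual schedule starts with Node.js 27")
    # Assumption: version number == calendar year of the Current release.
    current = date(2000 + major, 4, 1)   # Current: April to October
    alpha = add_months(current, -6)      # Alpha: previous October to March
    lts = add_months(current, 6)         # LTS begins in October
    eol = add_months(lts, 30)            # 30 months of LTS, then EOL
    return ReleaseLine(major, alpha, current, lts, eol)


if __name__ == "__main__":
    line = schedule(27)
    print(line)
    print(months_between(line.current_start, line.eol), "months of support")
```

Under those assumptions, `schedule(27)` yields Alpha from October 2026, Current from April 2027, LTS from October 2027 and EOL in April 2030, so Current plus LTS gives 6 + 30 = 36 months, matching the unchanged support window.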
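To make the signals idea from the reactivity article concrete, here is a minimal push-based sketch in Python. It is loosely modeled on the fine-grained style of SolidJS-like signals but does not reproduce any framework's API: reading a signal inside an effect subscribes that effect, and writing the signal re-runs only its subscribers. `Signal` and `Effect` are hypothetical names chosen for illustration.

```python
from typing import Callable, Generic, Optional, Set, TypeVar

T = TypeVar("T")

# The effect currently executing; signals read while it is set subscribe it.
_active_effect: Optional["Effect"] = None


class Signal(Generic[T]):
    def __init__(self, value: T) -> None:
        self._value = value
        self._subscribers: Set["Effect"] = set()

    def get(self) -> T:
        # Dependency tracking: remember which effect read this signal.
        if _active_effect is not None:
            self._subscribers.add(_active_effect)
        return self._value

    def set(self, value: T) -> None:
        if value == self._value:
            return  # unchanged value: nothing to re-run
        self._value = value
        # Fine-grained update: only effects that read this signal re-run.
        for effect in list(self._subscribers):
            effect.run()


class Effect:
    def __init__(self, fn: Callable[[], None]) -> None:
        self._fn = fn
        self.run()  # run once immediately to collect dependencies

    def run(self) -> None:
        global _active_effect
        previous = _active_effect
        _active_effect = self
        try:
            self._fn()
        finally:
            _active_effect = previous


if __name__ == "__main__":
    name, count = Signal("Ada"), Signal(0)
    Effect(lambda: print("render name:", name.get()))
    Effect(lambda: print("render count:", count.get()))
    count.set(1)  # re-runs only the count effect
```

Real implementations also drop stale dependencies, batch updates and support derived (computed) values; this sketch omits all of that to keep the dependency-tracking mechanism visible.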
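The "context chaining" described for the Apicurio multi-agent demo can be sketched as a plain pipeline in which each step receives the task plus every earlier output. The agents here are ordinary Python functions and `run_chain` is a hypothetical helper; the actual demo's agents, protocols and registry calls are not reproduced.

```python
from typing import Callable, Dict, List, Tuple

# An agent takes the task and a snapshot of the earlier agents' outputs.
Agent = Callable[[str, Dict[str, str]], str]


def run_chain(task: str, agents: List[Tuple[str, Agent]]) -> Dict[str, str]:
    """Run agents in order; each one sees the outputs of all previous agents."""
    context: Dict[str, str] = {}
    for name, agent in agents:
        # Pass a copy so an agent cannot mutate its predecessors' outputs.
        context[name] = agent(task, dict(context))
    return context


if __name__ == "__main__":
    pipeline = [
        ("researcher", lambda task, ctx: f"facts about {task}"),
        ("writer", lambda task, ctx: "draft using " + ctx["researcher"]),
        ("reviewer", lambda task, ctx: "approved" if "draft" in ctx["writer"] else "rejected"),
    ]
    print(run_chain("A2A", pipeline))
```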
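The deep-learning article's point that the Perceptron fails on XOR because of a structural limit can be demonstrated with the classic Rosenblatt perceptron learning rule. This is a sketch assuming a step activation, a learning rate of 1 and zero-initialized weights. On AND, which is linearly separable, training reaches an error-free epoch; on XOR, no line separates the two classes, so no choice of weights can classify all four points and training never converges.

```python
from typing import List, Tuple

Sample = Tuple[Tuple[int, int], int]

AND: List[Sample] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR: List[Sample] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def predict(weights: Tuple[float, float, float], x1: int, x2: int) -> int:
    w1, w2, b = weights
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0  # step activation


def train_perceptron(samples: List[Sample], max_epochs: int = 100, lr: float = 1.0):
    """Perceptron learning rule; returns (weights, converged)."""
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), target in samples:
            err = target - predict((w1, w2, b), x1, x2)
            if err:
                errors += 1
                w1 += lr * err * x1
                w2 += lr * err * x2
                b += lr * err
        if errors == 0:
            return (w1, w2, b), True   # a full epoch without mistakes
    return (w1, w2, b), False


if __name__ == "__main__":
    for name, data in (("AND", AND), ("XOR", XOR)):
        weights, converged = train_perceptron(data)
        print(name, "converged:", converged)
```

Raising `max_epochs` does not help on XOR: an error-free epoch would mean a single line separates the classes, which is impossible for that data. Multi-layer networks trained with backpropagation remove this limit.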
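The two frontmatter rules mentioned in the skills article (a kebab-case `name` and a `description` of at most 1024 characters) can be checked mechanically. The sketch below is a hypothetical validator, not Anthropic's tooling. It splits the `---`-delimited frontmatter from the Markdown body, mirroring the fact that the frontmatter is loaded separately from the body, and it parses only flat `key: value` lines rather than full YAML. The exact kebab-case pattern is an assumption.

```python
import re
from typing import Dict, List, Tuple

# Assumption: lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
MAX_DESCRIPTION = 1024


def split_frontmatter(text: str) -> Tuple[List[str], str]:
    """Separate the '---'-delimited frontmatter from the Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    raise ValueError("unterminated frontmatter block")


def parse_flat_fields(lines: List[str]) -> Dict[str, str]:
    """Minimal 'key: value' reader; NOT a full YAML parser."""
    fields: Dict[str, str] = {}
    for line in lines:
        if ":" in line and not line.startswith((" ", "#")):
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def validate_skill(text: str) -> List[str]:
    """Return a list of problems; an empty list means both rules pass."""
    header, _body = split_frontmatter(text)
    fields = parse_flat_fields(header)
    problems: List[str] = []
    if not KEBAB_CASE.match(fields.get("name", "")):
        problems.append("name must be kebab-case")
    description = fields.get("description", "")
    if not description:
        problems.append("description is required")
    elif len(description) > MAX_DESCRIPTION:
        problems.append("description exceeds 1024 characters")
    return problems
```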
Duration: 02:29:37 - Les Matins - by: Guillaume Erner, Yoann Duval - This morning on France Culture, at 7:40 and 8:20, Guillaume Erner welcomes the great writer and journalist Ahmet Altan to discuss his latest novel, "Boléro" (Actes Sud), and to share his view of Erdoğan's Turkey. At 7:17, lawyer Marie Dosé will discuss Daesh's war crimes. - Production: Félicie Faugère
Duration: 00:39:42 - L'Invité(e) des Matins - by: Guillaume Erner, Yoann Duval - Ahmet Altan is a journalist, but he is also one of the greatest contemporary Turkish writers. After five years in a high-security prison, he has just left Turkey for the first time in 10 years and published two novels, including "Boléro", translated into French and published by Actes Sud. - Production: Félicie Faugère - Guests: Ahmet Altan, Turkish writer and journalist
Duration: 00:03:37 - Le Pourquoi du comment : philo - by: Frédéric Worms - Leibniz stressed that each being perceives the world in its own way. Conciliation goes beyond mere arbitration: it is essential to democratic life and to living together. It seeks to reconcile opposites rather than deny them. - Production: Luc-Jean Reynaud
Duration: 00:19:56 - Journal de 12h30 - Our special correspondents in Lebanon, Martin Troadec and Etienne Monin, traveled from the neighborhoods of southern Beirut toward the Israeli border and the town of Marjayoun, along a road lined with destruction caused by Israeli bombing. - Guests: Delphine Dutard, lecturer in political science at Université Grenoble Alpes and researcher at the CESICE laboratory (Centre d'Etudes sur la Sécurité Internationale et les Coopérations Européennes)
HTML All The Things - Web Development, Web Design, Small Business
The web development industry has felt pretty turbulent lately - AI disruption, layoffs, hiring freezes, and endless doom-scrolling. So in this episode, we're flipping the script. There's actually some genuinely good news happening in web development right now. From developer job numbers quietly ticking back up, to Nvidia's internal AI experiment showing productivity gains without eliminating roles, to Interop 2026 launching with all major browser vendors aligned on compatibility - the industry may be stabilizing more than it seems. We also talk about how AI is making our jobs easier (yes, really), why frameworks like React, Vue, and Svelte have matured into stable foundations, and why the “AI bias” toward certain tools is starting to disappear. In this episode Matt and Mike cut through the noise and highlight what's actually going right in web development - and why this might be one of the best times to adapt rather than panic. Show Notes: https://www.htmlallthethings.com/podcast/some-good-news-for-web-developers Use our Scrimba affiliate link (https://scrimba.com/?via=htmlallthethings) for a 20% discount!! Full details in show notes.
Everyone knows his name: Emile Louis. A man with two faces: a kind bus driver by day, a criminal the rest of the time. He is accused of making seven young disabled women disappear, in terrible circumstances, between 1975 and 1979. It would take twenty years for the scandal to break and expose the horror of the man nicknamed the "butcher of the Yonne". In the final episode, Caroline Nogueras will be joined by a key figure in the case, Maître Didier Seban, lawyer for the missing women of the Yonne.
Seven missing young women
September 1995. For a few months now, Stéphane Munka has been working for the show Perdu de Vue, launched by Jacques Pradel in the early 1990s. He is a young journalist, and his mission is simple: investigate cold cases in order to bring the people involved together on set. That day, Jacques Pradel walks into his office with a new case. A man, Pierre Monnoir, has been bombarding the switchboard with calls so that his story will be heard.
A Bababam Originals podcast. Writing: Capucine Lebot. Voice: Caroline Nogueras. Learn more about your ad choices. Visit megaphone.fm/adchoices
Duration: 00:52:26 - Le Cours de l'histoire - by: Xavier Mauduit, Maïwenn Guiziou - What historical arguments does Russia's political leadership put forward about the ties between Ukraine and Russia? Together with historians, we decode the speeches of Russian leaders to understand how they read Ukrainian history through the lens of their political project. - Production: Laurence Millet - Guests: Anne de Tinguy, French historian and political scientist; Alexandra Goujon, lecturer at the Université de Bourgogne; François-Xavier Nérard, lecturer at Université Paris 1.
Duration: 00:06:26 - Caroline au pays des 27 - by: Caroline Gillet - Today marks four years of the war in Ukraine. The country wants to join the European Union, but when, and how? The person to ask is Benoît Mesnard, who works on these files. This is the second episode with him.
You're at a restaurant. The sommelier sets a glass of white Burgundy down in front of you. You raise the glass to your nose and, instead of the fresh butter and hazelnut you were expecting, you smell something odd. Not downright disgusting, but not exactly pleasant either: a smell somewhere between wet cardboard, a forgotten mop, and a dog fished out of the water. What can you do without coming across as rude? In this new episode of Parlons Vin, journalist Alicia Dorey explains how to tell quickly whether a wine is corked. And remember: parlons peu mais Parlons Vin (talk less, but talk wine)!
You can listen to this episode on Figaro Radio, on the Le Figaro website, and on all listening platforms.
Column and writing: Alicia Dorey. Editing: Antoine Lion-Ranty. Sound recording: Louis Chabain. Executive production: Aude Sérès, editor-in-chief, Le Figaro audio unit. Production coordination: Salomé Boulet, Le Figaro audio unit. Communication: Le Figaro social media. Visuals and sound branding: Le Figaro design studio.
Hosted by Ausha. Visit ausha.co/politique-de-confidentialite for more information.
Thinking about planning a Disney wedding but wondering if the Swan Reserve at Walt Disney World might be the better option? In this episode of the Disney Wedding Podcast, Carrie Hayward talks with Amy Gifford about her real wedding at Disney's Swan Reserve — including her ceremony in the Oasis Ballroom, reception in the Vue Ballroom, vendor choices, budget priorities, weather backup plan, and what it was really like having Minnie & Mickey at the reception. You'll hear firsthand insight into: ✨ Why they chose The Vue at The Swan Reserve instead of Disney Fairy Tale Weddings ✨ Wedding costs, value, and what's included ✨ Ceremony and reception timelines ✨ Food, florals, entertainment, and vendor selection ✨ Viewing Epcot fireworks during the reception ✨ Tips for couples worried a Swan & Dolphin wedding won't feel "Disney enough" Whether you're just starting your Disney wedding planning or comparing venues, this episode is packed with real-world advice and honest experiences from a recent couple.
Duration: 00:02:31 - Regarde le monde - You may have seen these new propaganda images last December: a military parade in Pyongyang, and North Korean soldiers returning home. They had fought alongside Russia on the Ukrainian front.
Show Description
We're passing another milestone episode and answering your Q's with our A's while we do it: Dave goes 3D printing, should CSS be inside a web component, Chris is trying to build a web component for popovers, why isn't Vue used or talked about more, finding bugs in blocks in the new CodePen, and we're grateful for 700 episodes.
Listen on Website
Watch on YouTube
Links
Vue.js - The Progressive JavaScript Framework | Vue.js
vuejs/petite-vue
Syntax: Hacking Pizza Ordering For Fun And Profit - YouTube
Theo - Twitch
Sponsors
Axe-Con
Axe-con, the world's largest digital accessibility conference, is from the makers of Axe-core and the Axe DevTools browser extension. It takes place online on February 24-25. Registration is free and also gets you access to the on-demand recordings. Axe-con has a dedicated Development Track for dev content; top speakers include Ire Aderinokun (front-end developer and Google Developer Expert), Jesse Beach (Software Engineering Manager at Meta), and other prominent folks from orgs like Coinbase, Zendesk, Red Hat, Atlassian, and more.
Your website might rank #1 on Google but be completely invisible to ChatGPT, Claude, and Perplexity. In this episode, let's break down why a huge chunk of the web is fundamentally broken for AI systems - not because of bad content, but because of technical decisions that made sense for humans but make sites invisible to the AI systems rapidly becoming the front door to the internet.
Chapter Timestamps
00:00:00 - Introduction: The new game your website is losing
00:01:43 - The Scale of the Problem: AI crawler traffic explosion
00:05:19 - The JavaScript Problem: Why AI crawlers can't see your content
00:10:28 - The Bot Protection Paradox: Accidentally blocking AI
00:14:40 - The Speed Requirement: Why 200ms matters
00:17:46 - AI Agents Are Struggling Too: Browser agents and their limitations
00:20:46 - How to Fix It: 6 things you need to do
00:25:33 - Closing: The web is adapting again
Key Statistics
569 million GPTBot requests on Vercel's network in a single month
370 million ClaudeBot requests in the same period
305% growth in GPTBot traffic (May 2024 to May 2025)
157,000% increase in PerplexityBot requests year-over-year
33% of organic search activity now comes from AI agents
~40% failure rate for the best AI browser agents on complex tasks
The 6 Things to Fix
Implement Server-Side Rendering (SSR) - If your site uses a JavaScript framework (React, Vue, Angular) with client-side rendering, switch to SSR or static site generation immediately. Use Next.js, Nuxt, or a pre-rendering service.
Add Structured Data with JSON-LD - Expose key information in machine-readable format using schema.org markup. Microsoft confirmed Bing uses this to help Copilot understand content.
Optimize for Speed - Target server response time under 200ms. First Contentful Paint under 1 second. Largest Contentful Paint under 2.5 seconds.
Check Your Bot Protection Settings - Review Cloudflare, AWS WAF, or your CDN's bot management. Make a deliberate decision about GPTBot, ClaudeBot, and PerplexityBot access.
Kill Infinite Scroll and Lazy Loading for Content - Use paginated URLs with standard HTML links. Ensure high-value content is in the initial HTML response.
Keep Sitemaps Current - Maintain proper redirects, consistent URL patterns, and fix broken links.
Tools Mentioned
Glimpse - Free tool to test how AI sees your website: glimpse.webperformancetools.com
Show Links
Sources Referenced in This Episode
AI Crawler Statistics:
Vercel Blog - The Rise of the AI Crawler
Cloudflare 2025 Year in Review
Cloudflare - From Googlebot to GPTBot
Search Engine Land - AI Optimization Guide
JavaScript Rendering:
Prerender.io - Understanding Web Crawlers
Search Engine Journal - Enterprise SEO Trends 2026
No Hacks is a podcast about web performance, technical SEO, and the agentic web. Hosted by Slobodan "Sani" Manic.
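One way to act on the "make a deliberate decision" advice is a robots.txt policy per AI crawler, and Python's standard `urllib.robotparser` can check how such a policy treats each user agent. The policy below is a hypothetical example, not a recommendation from the episode, and robots.txt is only advisory: it does not replace the CDN or WAF bot-management settings mentioned above. `build_parser` and `access_report` are illustrative helper names.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# Hypothetical policy: GPTBot and ClaudeBot may crawl everything except
# /private/, PerplexityBot is blocked entirely, other bots are allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /private/

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network fetch
    return parser


def access_report(parser: RobotFileParser, agents: List[str], url: str) -> Dict[str, bool]:
    """Map each crawler token to whether the policy lets it fetch `url`."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    bots = ["GPTBot", "ClaudeBot", "PerplexityBot", "SomeOtherBot"]
    print(access_report(parser, bots, "https://example.com/blog/post"))
```

With this policy, GPTBot and ClaudeBot may fetch `/blog/post` but not anything under `/private/`, PerplexityBot is refused everywhere, and unlisted bots fall through to the `*` group.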
In this episode of PodRocket, Daniel Thompson-Yvetot joins us to break down what's new in Tauri 2.0 and how developers are using the Tauri framework to build desktop and mobile apps with Rust and JavaScript. We discuss how Tauri lets developers use frameworks like React, Vue, and Angular for the UI while handling heavy logic in Rust, resulting in smaller app binaries and better performance than Electron-based alternatives. The conversation covers Create Tauri App for faster onboarding, the new plugin system for controlling file system and OS access, and how Tauri improves app security by reducing attack surfaces. They also dive into mobile app development, differences between system WebViews, experiments with the Chromium Embedded Framework, and why cross-platform apps still need platform-specific thinking. Daniel also shares what's coming next for Tauri, including flexibility in webviews, accessibility tooling, compliance requirements in Europe, and the roadmap toward Tauri 3.0.
Links
Tauri: https://v2.tauri.app
LinkedIn: https://www.linkedin.com/in/denjell
We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Fill out our listener survey: https://t.co/oKVAEXipxu
Let us know by sending an email to our producer, Elizabeth, at elizabeth.becz@logrocket.com, or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod).
Check out our newsletter: https://blog.logrocket.com/the-replay-newsletter/
Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers!
What does LogRocket do? LogRocket provides AI-first session replay and analytics that surfaces the UX and technical issues impacting user experiences. Start understanding where your users are struggling by trying it for free at LogRocket.com. Try LogRocket for free today (https://logrocket.com/signup/?pdr).
Special Guest: Daniel Thompson-Yvetot.
Duration: 00:14:49 - L'invité du 13/14 - Stéphane Lenormand, member of parliament for Saint-Pierre-et-Miquelon, and Bernard Briand, President of the Territorial Council of Saint-Pierre-et-Miquelon.
Dr. Shaurice Mullins, known as Dr. M, is a distinguished business strategist, entrepreneur, and thought leader with over 25 years of experience driving transformational change for individuals, organizations, and communities worldwide. As President and CEO of multiple successful enterprises including The Elite Group, Inc., Elite Disaster Consulting International, and Shaurice Mullins International, she has established herself as a premier voice in economic innovation, leadership development, and sustainable business growth.
Through these ventures, she delivers actionable strategies that help clients unlock new levels of access, leadership, and financial freedom. Dr. M's hallmark is her unique ability to transform challenges into opportunities. By fusing cultural insights with cutting-edge solutions, she elevates individuals and communities to their fullest potential, cultivating a legacy of wealth, empowerment, and enduring success. Her vision resonates deeply with creatives, professionals, entrepreneurs, and industry leaders seeking strategic guidance to elevate performance, lead with purpose, and achieve generational impact.
Her expertise has earned prestigious recognition including the Presidential Lifetime Achievement Award for volunteer service, JPMorgan Chase "Woman to Know in America" honor, "North Carolina Woman to Know" award, and induction into the exclusive BOW Collective. Dr. Mullins' insights have been featured across major media platforms including The VUE, Thrive, Millennium, Forbes, CBS, FOX, NBC, ABC, and prominently displayed in New York City's Times Square.
What sets Dr. Mullins apart as a speaker is her ability to seamlessly integrate cultural intelligence with cutting-edge business strategy, delivering practical frameworks that audiences can immediately implement.
As a board-certified Holistic Health Practitioner, she brings a unique whole-person approach to leadership and performance optimization that resonates with diverse audiences from Fortune 500 executives to emerging entrepreneurs.
Dr. Mullins' presentations combine deep strategic insight with inspirational storytelling, leaving audiences equipped with actionable strategies for breakthrough performance. Her clients and audience consistently report measurable improvements in leadership effectiveness, team performance, and organizational growth following her engagements. Through her unwavering leadership, Dr. Mullins inspires established and emerging leaders to make a global impact while cultivating financial independence.
Contact Details:
Email: hello@shauricemullins.com
Company: Shaurice Mullins International
Websites: www.shauricemullins.com
Social Media:
LinkedIn - dr-shaurice-e-mullins-dr-m-49709b5a
Facebook - @shaurice-mullins
Instagram - @shauricemullins
TikTok - @shauricemullins
X - @shauricemullins
Remember to SUBSCRIBE so you don't miss "Information That You Can Use." Share Just Minding My Business with your family, friends, and colleagues. Engage with us by leaving a review or comment on my Google Business Page. https://g.page/r/CVKSq-IsFaY9EBM/review Your support keeps this podcast going and growing.
Visit Just Minding My Business Media™ LLC at https://jmmbmediallc.com/ to learn how we can help you get more visibility on your products and services.
Duration: 00:58:30 - Cultures Monde - by: Julie Gacon, Mélanie Chalandon - In a first novel that reads like a fairy tale, "Le Ministère des rêves" (Les Argonautes), Momtchil Milanov revisits, through a child's eyes, the political events that shaped Eastern Europe in the late 1980s. - Production: Vivian Lecuivre - Guests: Momtchil Milanov, Bulgarian writer; Nadège Ragaru, political scientist and historian, research director at CERI Sciences-Po