Podcasts about Subtitles

Textual representation of events and speech in motion imagery

  • 840 podcasts
  • 1,474 episodes
  • 41m average duration
  • 5 new episodes weekly
  • Latest episode: Jun 8, 2026
Latest podcast episodes about Subtitles

Sermons from Myers Park Presbyterian Church
The Stories That Surround Us: Stars of Descendants

Jun 8, 2026 · 25:43


Joe Clifford's sermon for Sunday, June 7, 2026, at Myers Park Presbyterian Church in Charlotte, NC. Subtitles/closed captions for this video are available by clicking the “CC” button on the video player. Full sermon manuscripts can be found at myersparkpres.org/manuscripts.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

We're announcing AIEWF speakers this week! Take the AI Engineering Survey!

Today's guest Ethan first joined us for the LS Paper Club as the lead on NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months.

He comes back on Latent Space with some nuclear hot takes: that Video Models primarily get their intelligence from LLMs, not from training on video data, and that the next frontier for truly interactive, realtime, long-horizon world models is to work on LLMs (perhaps Interaction Models as well…)

Put it this way: In the near term, the next Sora won't be a better video model, but a video agent.

Generative Media may more closely follow the evolution of AI coding, which went from focusing on one-shot output performance and cost to multiturn reasoning and planning models for agents and systems that can plan, edit, test, debug, and submit PRs.

At a certain point, coding models got so good that the only significant next step to improve performance was handling the orchestration of these models.

Now, as the performance of video models increases significantly across realism, consistency, and prompt adherence while becoming more cost efficient, the next evolution of video generation may also be systems that can plan, generate, edit, critique, and iterate across an entire creative task.

In this episode, Ethan joins swyx and Vibhu to unpack what it actually takes to build frontier image and video systems: data, VAEs, diffusion transformers, audio-video alignment, inference speedups, and the hidden cost of storing and moving massive video datasets.
From building NVIDIA's Cosmos world model to joining xAI as Grok Imagine was being built from zero to one, Ethan He has been at the center of some of the most important work in video generation, multimodal models, and real-time world models.

We go deep on Grok Imagine, how a small xAI team shipped its first multimodal video model in three months, why iteration speed matters more than almost anything in model development, and why many of the biggest gains come from fixing tiny bugs in data and training pipelines.

Flipbook: The future of Videomaxxing

Video agents are almost a sure bet to be the trend in the coming year. We end with a glance at what's beyond video agents.

Flipbook caused a minor sensation this year when it was released, but most treat it as a fun demo. Ethan takes it very seriously: with the speed and cost of inference coming down every year, the future of custom video JIT UI is closer than you think. We talked about why videogen models may become the front end of AI, how generative UI could replace traditional HTML/CSS, why world models need to be real-time, interactive, and long-horizon, and why the future of video generation may depend more on language models and agents than on diffusion alone.

We discuss:

* Why fast iteration mattered more than meetings
* Why small training bugs can drive huge model quality gains
* Why coding models may make compute the bottleneck again
* How image and video models are trained with synthetic captions
* The role of VAEs and latent space in frontier video models
* Why image models are the foundation for video models
* The tradeoff between temporal compression and real-time interactivity
* Flipbook, Neural OS, and the future of generative UI
* Why future interfaces may go from user intent to pixels
* The hidden cost of training video models: storage, egress, and GPU hours
* How step distillation and consistency models (like OpenAI sCM) make video inference orders of magnitude faster
* Grok Imagine 0.9 and large-scale audio-video generation
* Why audio-video alignment is harder than text-video alignment
* Ethan's definition of world models
* Reference-to-video, video extension, and long-context video generation
* Why xAI's research communication undersells Grok Imagine
* How xAI culture shaped the speed of development
* AI watermarking, SynthID, and detecting generated media
* Why prompt rewriting matters for video models
* Grok Imagine Agent and the rise of video agents
* Why language models may unlock better video generation
* Robotics, physical AI, and embodied world models
* Why Ethan left xAI and shifted focus toward LLMs
* Self-managed context, memory, and the next frontier for language models

Ethan He

* LinkedIn: https://www.linkedin.com/in/ethanhe42
* X: https://x.com/EthanHe_42

Timestamps

00:00:00 Introduction
00:01:25 From NVIDIA Cosmos to xAI
00:03:24 Building Grok Imagine from Zero to One
00:10:07 How Image and Video Models Are Trained
00:18:53 Video Compression, VAEs, and Real-Time Tradeoffs
00:22:10 Generative UI, Flipbook, and Neural OS
00:32:10 The Cost of Training Large Video Models
00:37:04 Distillation, GANs, and Fast Video Inference
00:41:21 Audio-Video Generation and Grok Imagine 0.9
00:48:34 What Makes a World Model?
00:55:51 Reference Videos, Long Context, and Video Memory
01:00:11 xAI Culture, Research, and First-Principles Building
01:09:45 AI Safety, Watermarking, and Prompt Rewriting
01:13:10 Video Agents and AI-Assisted Creation
01:27:32 Why Language Models Unlock Better Video
01:31:15 Robotics, Physical AI, and Embodied World Models
01:32:38 Why Ethan Left xAI
01:34:16 Self-Managed Context and the Future of LLMs
01:38:43 Ethan's Career Path and Closing Thoughts

Transcript

Introduction: Ethan He, Latent Space, and the Path to xAI

Swyx [00:00:00]: We're here in the studio with Ethan He, most recently of xAI. Welcome.

Ethan [00:00:10]: Thank you. Glad to be here.

Swyx [00:00:11]: We're also here with Vibhu.
You were first coming to us, or joining the Latent Space world, because you were working on Cosmos at NVIDIA, and you did a paper. We loved it. You presented it as well, so thank you for doing that.

Ethan [00:00:23]: Actually, I also presented the MoEs twice at Latent Space.

Swyx [00:00:29]: How did you actually hear about us? Did we reach out to you? Is that how it worked?

Ethan [00:00:33]: No, actually, I-- the community. Like, I realized, oh, there is this online community where people talk about AI and also learn from each other through papers every week through the Paper Club. It's very nice.

Ethan [00:00:49]: I learned a lot.

Swyx [00:00:49]: I think three years nonstop. We haven't stopped even on Christmas and New Year's. Many weeks I want to stop, but it keeps going.

Vibhu [00:00:58]: No, that was good. I think you had posted that you worked on a paper, and I was like, “Oh, very cool. We have Paper Club. Present then.”

Vibhu [00:01:04]: But I might have reached out to you after.

Swyx [00:01:05]: You-- because it's an amateur club, right?

Swyx [00:01:08]: So it's very unusual, but we sometimes have paper authors come by and actually explain the paper. Today we just did the Poolside paper, which was apparently very good.

Vibhu [00:01:18]: Came out yesterday.

Vibhu [00:01:19]: Pretty interesting, right? Fully open. They talk about everything, systems. So it's a good one. We'll recommend people read it.

Swyx [00:01:25]: Bring us up to speed on your transition to xAI, ‘cause I actually don't even know when you joined. Just tell the story about the sort of transition.

From NVIDIA Cosmos to xAI: Scaling Video and World Models

Ethan [00:01:34]: Before xAI, I was working on the Cosmos world model at NVIDIA. So Cosmos is a giant video foundation model that aims to simulate the world, and it serves as a foundation for all of the roboticists to build on top of.
There, once I built the Cosmos one, I realized that this thing also has a scaling law similar to language models, so we need to scale up the video models further. That's why I realized I need to move to somewhere with much more compute resources. That's how I--

Swyx [00:02:13]: Than NVIDIA?

Vibhu [00:02:14]: The GPU rich came themselves.

Vibhu [00:02:19]: And timeline-wise, when was Cosmos? It was pretty early, right? It was open world model, open paper, everything.

Ethan [00:02:25]: It was end of 2024.

Vibhu [00:02:28]: End of 2024.

Ethan [00:02:30]: Then in mid-2025, I moved to xAI. I joined about the time when xAI was about to build video models and multimodal models. There was no infra, no data, and no model, and as a few engineers, we built it in three months and released the first model, Grok Imagine 0.9.

Ethan [00:02:55]: And since then, I kept working on video models and moved more from training to post-training of the video models. For example, reference-to-video, kind of like the cameo feature, and video extensions. And before I left, I worked on a world model, leading a small team to focus on real-time, long-horizon video generation.

Building Grok Imagine From Scratch in Three Months

Swyx [00:03:24]: Can you give like a rough roadmap of, okay, you're on a brand-new team. Grok previously was only text, or they partnered with BFL for their image gen stuff. What are the building blocks, right? You have compute, data you can procure somewhere. Like, just what is the sequence of things that people should think about when you're setting up a new team?

Vibhu [00:03:43]: Actually even deeper, not just data you can procure. You guys had to go through getting the data too, right?
So you shipped it pretty fast, but yeahSwyx [00:03:51]: three months is likeVibhu [00:03:52]: From everythingSwyx [00:03:52]: actually like very surprisingly fast.Ethan [00:03:55]: One thing I say like thanks to my experience at NVIDIA, ‘cause first time when we were building Kosmos together, we built it, for about a year. So this is like the second time I do it. Roughly have an idea, what to do. I say the most important thing is the talent. Everyone were very strong and clever, very close with each other towards a common goal. So that speed up things a lot. So you reduce the communication bandwidth among people, and everyone can work towards the same goal. It's, it's like every day there's not that much meetings on the calendar, like maybe like a, like a sync a day, and after that it's, it's just all building. It was pretty fun at that time.Ethan [00:04:47]: And another thing is that xAI has very strong foundations of like data inference, model inference, and the supporting there can help the model develop a lot. When I look at, training models, I don't so actually the top important thing is like how many, how many iterations can you do, per day? and the more iteration can you do, you can, you can train the model much faster. So if you have very strong infra and you have a lot of compute, you can, you can train these models in very short period of time. That can give you a much larger buffer to, for errors, and it also gives you the opportunity to spot more bugs.Iteration Speed, Compute, and Debugging Model PipelinesSwyx [00:05:46]: What is an iteration? Is it like a few hundred steps or what are youEthan [00:05:50]: Let's say just the train-training the model, like from acquire new data and maybe design new algorithms and train a new model, maybe at smaller scale orSwyx [00:06:01]: So cycle time for like any hyperparam that you're searching.Ethan [00:06:04]: Cycle time and tune to like eval this model. 
Is this model better than my previous iteration?Ethan [00:06:11]: SoSwyx [00:06:11]: So it's like before you, someone had already set this up that you can iterate very quickly.Ethan [00:06:15]: I think the foundation there is extremely good forDeveloping and research models.Ethan [00:06:23]: And often I find is it-- this is kind of boring, but like a lot of the improvements does not come from new algorithms. It comes from finding small bugs here and there in the data pipeline, in the, in the model training pipeline. Those give, those give the biggest boost to the model quality.Vibhu [00:06:46]: It's interesting, right? So you say it's like small team, less communication bandwidth, but also a lot of quality is like find little bugs. It seems counterintuitive, right? You have a lot of people, you can iron out more of those, but it's interesting to see the other side, right?Swyx [00:07:00]: I also wonder, have you-- do you try using LLMs to look for bugs? I don't know.Ethan [00:07:05]: I remember at that time it was mid two thousand and twenty-five, so it's the coding model wasn't quite there yet. I remem- I remember like December two thousand and twenty-five, it was extremely good. Yeah, I've been, I've been using it at that time. It's, it's helpful. sometimes it produce codes that are kind of difficult to maintain, even though like the first time it built something extremely fast. But it gave the, like a spaghetti code, thousands of lines that I couldn't maintain, and the LLM itself couldn't figure out what's, what's wrong and how to improve on top of it. But now I find it much better. Yeah, I want to bring up another point here is now coding models are much more efficient and can help us implement stuff much faster. Compute might become a bottleneck again because previously, like if you want to train a new model, say you want to generate new synthetic data and then or write a new algorithm, it might take a few weeks. 
And during that period of time, you don't-- you might not have experiments to run. But now you can build that thing within a few hours, then you can immediately train a model.Ethan [00:08:24]: Now you have to have enough compute to try all of the ideas. So compute might be the bottleneck of iterating speed again.Swyx [00:08:36]: yeah, I actually, honestly, I think it's like kind of a stressful job because you're “Well, I should be trying everything, and if I'm not, then I'm not doing my job well.”Vibhu [00:08:48]: there's also the stress of you're eating thousands of GPUs per hour, which is very expensive and, compute can go to other researchers.Swyx [00:08:56]: You got the daddy Elon toVibhu [00:08:57]: You got daddy Elon.Ethan [00:08:59]: It wasVibhu [00:09:00]: But there's still finite amount of compute, like you want to use it, you want to use it well, you want more of it.Ethan [00:09:06]: That was quite stressful indeed. Yeah, I think one thing is the-- with coding models now, like a lot of these jobs can be automated, which is much better. A second, it's a, it's a marathon, so you got to maintain good health and, a regular schedule.Vibhu [00:09:28]: It's, it's hard to hear that when you shift from zero to nothing in two months.Swyx [00:09:32]: and, I think obviously the culture at xAI is very famously, people work very hard. one thing I did want to dive into, in our-- in the notes that you, that you sent ahead of time, you had specific comments about the cost of Video Gen training. presumably this is on the Colossus-1, right? the two hundred megawatt cluster. Any whatever you want to just share on that.Vibhu [00:09:54]: I think there's, there's three things we're talking about, right? So there's Video Gen, there's also the Image Gen model that you put out. Do you want to like complete the, okay, so zero to one, you have a few months. 
Just what are the stages of create Image Gen model?Swyx [00:10:06]: Oh, yeah, maybe I got distracted.How Image and Video Models Are Trained: Synthetic Captions, Tokenizers, and VAEsVibhu [00:10:07]: Sorry. and then, from there's Video Gen, there's Audio Gen. Would love to get into those next. But what is that first few months like? So small team, a lot of bugs, iterations, but what does it look like? Do we take something off the shelf? Do we just get data compute? What's, what's the few months like? How do you go to state-art Image Gen model? How do you just start?Ethan [00:10:28]: I cannot comment specifically how xAI did, but it's, it's a quite standard process. I can draw some, examples from Cosmos. So mainly it's building a video model, you actually need to build a image model first. And building these two models, the data you need is a hundred percent synthetic pair of language and image or language to video. Because on the, on the internet, actually, the videos don't naturally associate with text. So you can say, oh, like on YouTube, you have the title and you have the description and the commentsSwyx [00:11:11]: TitleEthan [00:11:11]: of a video, but usually they're not relevant to the video itself. And say maybe like the video is a natural scene of mountains or something, and the title is, I'm so happy today.Ethan [00:11:26]: So they have they have no correlation at all. So the first step is to, you have to generate synthetic pair of language with the videos. So you gather videos from the internet, and you use a VLM to caption the videos. So that part, here's a question, like how do you, how do you gather VLM to begin with? So if there's noSwyx [00:11:55]: You, so you fuse the model, right? LikeEthan [00:11:57]: Say if there's no like VLM exists, like how do you generate the text to the beginning, right? 
It's, it's impossible.Swyx [00:12:04]: I see.Ethan [00:12:05]: In the beginning, it's like you ask human to describe the video as detailed as possible.For example, you ask them to describe everything, like all objects, all characters, and all interaction and dialogues in the, in the videos. So that's in the protocol of Cosmos labeling. We require the objective we give to the labelers was that you have to describe the video as detailed as possible, such that a blind person hears a blob of text can reconstruct what the video is like from their head.Swyx [00:12:43]: Video or image? You're talking about images.Ethan [00:12:44]: Video or image, either one of them.Vibhu [00:12:47]: This was pretty common when we went from clip and DALL-E, right?Vibhu [00:12:51]: It's all training on really detailed captioning of images. So same is applied to video, but insteadEthan [00:12:57]: same appliedVibhu [00:12:57]: of using multimodal model to pass in video images and write rich descriptions, you can alsoSwyx [00:13:04]: I think there's this traditional perspective of supervised, or, very highly human curated thing. I feel like there's a unlock with unsupervised, right? Where like you have enough to bootstrap that you can just throw common corpus on it or, whatever. like unsupervised vision and language pairing, right? Like where you just have, interspersed image and text and it just learns. To me, that is the VLM breakthrough that is different from the clip, different from the LM era.Ethan [00:13:36]: It's interesting to see that you kind of need both data.Ethan [00:13:41]: For example, for theSwyx [00:13:41]: You need it to bootstrap it up. YeahEthan [00:13:43]: for the generative model training, there's also usually like a small percentage of unlabeled data. So the model is instructed to generate a video without any text instruction. That can also help the model generalize. 
So after this stage of generating synthetic pairs, one important common step is to train a compressor, or a tokenizer, of the images or videos. You can technically, theoretically train image or video models on pure pixels, but the problem is that it's a lot of tokens. So like one image, it's a thousand by a thousand, it's like one million tokens, one million pixels. It's impossible to train a transformer on that. So you need to train a tokenizer, which can go from image to latent space and from latent space back to image.

Swyx [00:14:45]: That's why we named the podcast.

Swyx [00:14:48]: But, basically, you're talking about vocabulary size.

Ethan [00:14:50]: So vocab.

Swyx [00:14:51]: And so, what is imp-- like a million is impossible?

Ethan [00:14:54]: In generative models, the vocab is continuous. It's a continuous space. We can think about it like you map an image to a vector. It's a fixed-length vector. It's sixteen or forty-eight, something like that. And then you map that vector back to the image space. And the mapping is patch-based. So you say you have

Ethan [00:15:22]: a sixteen by sixteen patch, and you map that patch of pixels into this latent space.

Swyx [00:15:29]: We've covered this

Vibhu [00:15:30]: This is like the vision transformers

Swyx [00:15:32]: VAEs,

Ethan [00:15:33]: VAEs.

Vibhu [00:15:34]: You basically compress your input, you do your generation, you're reasoning, all that generation, in a smaller dimension, and then you project back out.

Swyx [00:15:43]: VAE is a form of compression, but I think, for me, the patching thing is from ViT, right?

Ethan [00:15:48]: You can make those.

Swyx [00:15:49]: Literally the, yeah, the paper is titled like sixteen by sixteen is all you need. Something like that.
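To put rough numbers on the pixel-versus-patch point in this exchange, here is a minimal Python sketch. It assumes a plain grid of non-overlapping 16 by 16 patches with padded edges; the function name `patch_token_count` is made up for illustration and is not taken from Cosmos, Grok Imagine, or any library, and real VAEs may use other compression factors.

```python
import math


def patch_token_count(height: int, width: int, patch: int = 16) -> int:
    """Count the patch tokens for an image cut into patch x patch tiles.

    Assumption for illustration: non-overlapping tiles, with partial
    tiles at the edges padded up to a full tile.
    """
    return math.ceil(height / patch) * math.ceil(width / patch)


# The "thousand by a thousand" image from the conversation.
pixels = 1000 * 1000
tokens = patch_token_count(1000, 1000, patch=16)

print(f"raw pixels:   {pixels}")
print(f"patch tokens: {tokens}")
print(f"reduction:    about {pixels / tokens:.0f}x fewer positions")
```

Each of those patch positions would then carry a short continuous latent vector (Ethan mentions sizes like sixteen or forty-eight) rather than a discrete vocabulary index.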
And then I think also, people make a lot of comparisons between this kind of patching and convolutions.

Swyx [00:16:02]: Which is, you're kind of reconstructing the old paradigm with the new.

Ethan [00:16:05]: Actually, in VAEs, there are both convolution networks and transformers. You can actually do both.

Ethan [00:16:14]: After this VAE, what you've got is latent space tokens and the language tokens. So now the training of the diffusion transformer; usually generative models use diffusion transformers. It is actually quite standard. It's very similar to how you train a language transformer model. There's not that much difference. It's just visual tokens in, visual tokens out. The only difference is there's a denoising process. So you add random noise to the visual tokens, and then you train the model to remove that noise to generate the clean tokens. At inference, the model can iteratively remove noise, starting from a hundred percent noise.

Swyx [00:17:12]: And then there's also, to speed things along on the tech tree of diffusion, there's CFG, and then there's also latent diffusion somewhere in there. I think, somewhere along the line, obviously, Stability and all these other guys pioneered a lot of this architecture. I don't know if you want to get into that or just do the video side, up to you.

Bootstrapping Video from Image Models and Temporal Compression

Ethan [00:17:37]: After you train such an image model, the reason it's a foundation for video models is that image models are cheaper to train, and they have a much denser connection between language and text. So, sorry, language and images. For example, you train on a billion images, and there's a mapping from the text to the image.
And the cost to train the same, like the, a billion, a billion text to a billion videos, that's much more expensive because videosNaturally have more tokens than images. Because the diffusion models, their understanding of, language purely come from this mapping. So if you don't have enough mapping, so if you only train on like a ten million videos or something, there-- you might not see enough language tokens in your training, so your model does not understand human intention enough. So that's why you really-- you train-- you first train this image diffusion models, and then you bootstrap the video model from there.Swyx [00:18:53]: One thing I did want to ask, because I-- actually, I think you're, you're the first per-- video model person I've ever talked to, I think. we've, we've like talked to Luma and all those folks. There's all these tricks in video compression where basically frame by frame there's not that much difference, so actually you don't have to regenerate or save the whole frame, right? but I think MP4 compression or something else like that.Swyx [00:19:16]: is it tempting to use that? Or as far as I can tell, everyone just treats it as, “No, we would just generate every frame.” Is that roughly the state-art?Ethan [00:19:27]: There are a few different approaches. Let's say first, like you want to just directly use MP4 compression and use that as the tokens for the transformers to train, right? So people actually have tried that, but the main challenge is the latent space for the MP4 tokens were not, were not very comprehensible for the models. It's, it's extremely hard to train on that. And there's aEthan [00:20:01]: So that's why they created VAEs, which creates more continuous, latent space, so the models can understand that latent space and learn from it much easier. Even within the VAEs, there are different difficulties of the latent space. 
So you can imagine the simplest, the most naive VAE is like you have an image, and you just shuffle all of the pixels into a vector. So you don't need to train any VAEs, right? But that latent space is extremely hard for models to train on top of. That's why there is some debate on how you compress the tokens. So you mentioned you can compress frame by frame. Also, you can compress the temporal dimension.

Ethan [00:20:52]: The difference is if you compress the temporal dimension, you get a much higher compression rate, because there's temporal redundancy between frames: this frame and the last frame are likely mostly similar, so there's only some small difference. For example, I think in the Wan 2.1 VAE, they have like an eight by eight by four compression rate. So four temporal tokens are compressed into one token. That can save a lot of the context length. If you do it frame by frame, you have to do maybe like eight by eight by one. Your context length will be four times larger. That being said, the benefit of per-frame compression, we might come back to this later, is real-timeness and interactivity. ‘Cause if you stream the output of the model frame by frame, the model can respond to any user request immediately. So if you have like a temporal four-times compression, then

Swyx [00:22:06]: It might be laggy

Ethan [00:22:07]: there's a lag there by nature.

Swyx [00:22:10]: So you're very pilled on this. Let's just go ahead and bring it up ‘cause we have the visual prepared anyway. There are some frontier applications of real-time video gen. So Flipbook is one of the examples that went viral recently, right? What is Flipbook?

Real-Time Generative UI: Flipbook, Neural OS, and Diffusion Front Ends

Ethan [00:22:23]: Flipbook is kind of like a web browser. You can see it has the web browser UI on top.
The difference is all of the UIs are generated by a generative image model in real time, and everything here is fake. But you can explore inside this imaginary world. Say here we have engineering the Great Pyramid. The model generates this for us to understand how it works, and if we want to navigate around and understand further, we can click on some of the description here, and the model will generate a new page, a new subpage describing the details we want to know about.

Swyx [00:23:14]: So it's basically kind of like we're playing a video, but it's pausing for our next interaction, and then it just plays the next thing based on our interaction.

Swyx [00:23:23]: Which is kind of cool.

Vibhu [00:23:25]: And you kind of decide your story. So this was, how do you make a pyramid? The levering technique seemed interesting, right? It shows how do you take-- okay, I want to know what is this

Swyx [00:23:35]: The demo tweet had more animation between frames.

Vibhu [00:23:38]: I think it's just skipping,

Swyx [00:23:39]: Oh, it's just skipping a lot of frames.

Ethan [00:23:40]: They also have a video mode

Vibhu [00:23:42]: It takes a lot. There's a lot of people

Ethan [00:23:42]: but a lot of people are using it.

Ethan [00:23:45]: So it's not available.

Vibhu [00:23:46]: There's a live video stream. We can try,

Swyx [00:23:50]: So this is an example of the kind of future that you see at the extreme. We're obviously not in it today.

Swyx [00:23:56]: But in a world where inference is completely free, this is better than generating code and text?

Ethan [00:24:02]: So this is a final state of where video will be at for world models, I think. Imagine the internet doesn't exist, and then you type in google.com. Like, what should a model show you? The model can imagine something, and this is what the model imagines. And these web pages, they completely do not exist.
So I think as the inference costs come down, we are going to have generative UI for everything. If you think about how the coding model works, they write code for a web page, the code might be converted into binary, and the binary renders the pixels on the screen. In machine learning, every time we have some breakthrough, obviously it's more end-to-end. So why don't we have user instruction to the pixel directly? So the generative UI will be user intention to the pixels directly. And say, even for email, let's say everyone has the same interface, but I want it slightly different. I want the email to show to me like a TikTok, so I can swipe left and right for the emails. Or maybe you want something else. We can have completely different things. Or say I'm looking at Instagram stories, and I don't like the Like button. I always misclick it. And generative UI resolves it. So it's going to be a revolutionary replacement of the interface. So in the future, we might have much more powerful

Ethan [00:25:50]: LLMs and coding models running behind the scenes. And in the front-end, the diffusion model will actually be the front-end to show stuff to you. That's how I imagine it.

Swyx [00:26:02]: Diffusion front-end, deterministic back-end.

Swyx [00:26:04]: Something like that. I find that very expensive, but,

Vibhu [00:26:08]: I find it interesting you called LLMs writing code on the back end deterministic, but okay.

Swyx [00:26:14]: You write it once

Vibhu [00:26:15]: Compare it to

Swyx [00:26:16]: And then you execute.

Ethan [00:26:17]: If you think about the cost, let's say an H100 costs $1 per hour, and if you use this eight hours a day for thirty days, every month you're paying $240. You'll actually not wanna pay for that. That's even more expensive than Claude Code Max.
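Ethan's back-of-envelope numbers can be checked with a few lines of Python. This is only a sketch of the arithmetic as stated in the conversation: the $1 per hour H100 rate, eight hours a day, and thirty days come from his estimate, the halving-per-year rate is the figure he gives in his next sentence, and the function names are invented for illustration.

```python
def monthly_cost(rate_per_hour: float, hours_per_day: float, days: int) -> float:
    """Monthly GPU bill for one always-assigned GPU at a flat hourly rate."""
    return rate_per_hour * hours_per_day * days


def projected_cost(cost_today: float, years: int, yearly_drop: float = 2.0) -> float:
    """Cost after `years`, assuming it falls by a constant factor each year.

    yearly_drop=2.0 models "two times every year"; this constant-rate decline
    is an assumption taken from the conversation, not a measured trend.
    """
    return cost_today / (yearly_drop ** years)


today = monthly_cost(rate_per_hour=1.0, hours_per_day=8, days=30)
print(f"today: ${today:.2f} per month")

for year in range(1, 6):
    print(f"year {year}: ${projected_cost(today, year):.2f} per month")
```

Passing a larger `yearly_drop` lets you compare against the much steeper per-year declines that Swyx argues for in the exchange that follows.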
But if you think about it, the compute costs come down like two times every year, and I think the future will likely arrive within a few years.

Vibhu [00:26:49]: It's everything, right? Compute cost comes down, compute gets faster, model gets smarter

Ethan [00:26:54]: More efficient

Vibhu [00:26:54]: model gets smaller.

Swyx [00:26:55]: I don't know why you say two times, ‘cause I think it's like 100 times. In language models, it is roughly one hundred to a thousand times every twelve to eighteen months, for the same given level of LMSYS Elo.

Vibhu [00:27:08]: That's a net of everything, right? That's model performance alongside compute. So it's different from just compute costs coming down. But, a very interesting future.

Swyx [00:27:19]: So the web designers will have to shout out that accessibility is an issue, right? How do you deal with screen readers or whatever. But yes, this is higher-bandwidth storytelling than anything you can possibly generate with code, right? So I think that's the rough idea.

Ethan [00:27:34]: And I'd like to add a little bit: humans naturally have the maximum input bandwidth when we are looking at things, looking at videos, and we also have maximum output bandwidth when we are talking. So in the future, it might be something like we talk to AI models, and the AI model responds back with a generative UI. So that would be the maximum input and output bandwidth to interact with AI models before Neuralink happens.

Vibhu [00:28:06]: And it's also very custom, right? Some people are very visual, some people are not as visual, right? They prefer the text. But the best thing about generative UI, right, it can also be text.

Swyx [00:28:17]: There's another project that we wanted to highlight, which is the Neural OS. Kinda similar idea, but here you're literally simulating an operating system with a video model.

Swyx [00:28:27]: And you can play Doom, you can do Firefox.
I find this like mildly less impressive, obviously, because it's an OS that I can run.Swyx [00:28:37]: But here everything is imagined.Vibhu [00:28:40]: I was, used to the Command+W to close the Firefox tab. It didn't crash. That's why I saidSwyx [00:28:45]: It's too immersive.Vibhu [00:28:46]: It's, it's too immersive for me.Swyx [00:28:47]: Too immersive.Vibhu [00:28:48]: I wanted to close the tab.Vibhu [00:28:49]: But yes, I can play generated diffusion.Swyx [00:28:51]: this is shockingly fast.Swyx [00:28:54]: Because I remember there was a demo about like maybe one to two years ago. Someone tried to do the first-person shooter with a image model. There was no consistency. It was very slow. But here it looks like realistically it's-- this is Doom.Vibhu [00:29:07]: I think there's two sides to that, right? There's okay, what is running a game? The heavy part of it is actually the game engine, all the lighting, all that stuff, the graphics. This is just kind of video, right? Like we've solved consistency. This is still, it looks like a few years old image generation. There's some temporal consistency, but it's, it's kind of just images stitched together as frame video. But it's a good visual representation to pi- to picture the future you wanna see, right? that's, that's what I see in these more so.Ethan [00:29:38]: This reminds me of how the video models gets better and better. So Neural OS is kinda if you just look at it feels like it's just a crappy version of the, like the Windows we could have, right? And, but the difference is, so the model, this model is overfitted on the existing operating systems. It can generate nothing different than that. But it's actually also similar to video models. So when we are training these video model, image model, we train them on internet. There's no imaginary supernatural stuff on the internet. But once we train this model, you can prompt the model to generate something supernatural that have never existed in the data set. 
So if you train your Neural OS or neural computer on the standard screen recordings on the entire internet. The model can imagine completely new interface to interact with the computer.Swyx [00:30:43]: This is one of those things that is magical to me. usually generalizing out of distribution is bad, but somehow we have learned some kind of internal world model that you say, this plus, but it looks like rainbows and butterflies, it'll do it and it will kind of make sense.Swyx [00:31:03]: So yeah, that's kind of cool. Yeah, I don't know if there's any comment more on there. I do, I do wanted to, I did wanted to touch a little bit more on the model architecture stuff, which I think you were getting. It's, really fascinating. We don't get a chance to talk about this enough. So one of the papers that we covered, we've covered every annual, segment anything release. and I don't know if you follow-- you're a computer vision guy, so youEthan [00:31:26]: I knowSwyx [00:31:27]: . So they did memory attention, which is kind of interesting. And I always think, anything where you can, across the temporal dimension, keep some consistency, I think it's, very fascinating, and I don't know if Basically, does that-- the CV side bleeding into video gen side, I think is underexplored, right? we talk about it for labeling, but actually you can borrow the architecture itself.Ethan [00:31:50]: There's, there's also complete different approaches, right? you brought up the term world model, so we went from video model to world model. There is diffusion, but there's also other approaches that people are doing. So maybe we get into those after as well,?Swyx [00:32:03]: He has a whole definition of world models and stuff. I feel like we threw a lot at you. 
Whatever you want to comment on.

Why Video Models Are Expensive: Storage, I/O, and Training Scale

Ethan [00:32:10]: I think one thing that we should actually come back to is this: we were talking about the steps to go from training an image gen model to a video model. One thing we don't see as much of is... okay, you brought up the delta in training data, right?

Ethan [00:32:24]: You won't have as much, and a video model might not generalize, but what is the cost of training a large video model? We know it roughly for LLMs. Even the Poolside thing that came out today, right? It's a Gemma-level model trained on roughly forty trillion tokens on this many H200s over this much time. You can see what the exact cost of that is: how many GPU hours times what an H200 costs. So how do we do the same back-of-envelope math for video models and image models? How do you break that down? I can share some back-of-envelope calculation. Surprisingly, the cost of video models is comparable to language models. Obviously the largest scale is language models, so maybe comparable to a medium-scale language model. Just storing the videos alone costs a lot. You can look it up on AWS or something.

Ethan [00:33:20]: Say you have a billion videos, and let's just say each video is five megabytes. Then you need five petabytes just to store those videos. Also, remember we talked about using a VAE to compress the videos. Typically you also need to store those continuous features in your storage, and that is comparable in size to the videos themselves. So just storing these videos and the features is tens of petabytes alone. And...

Swyx [00:33:58]: I just looked up the calculation.
Five petabytes on S3 Standard is $100K per month.

Ethan [00:34:05]: And...

Swyx [00:34:05]: It's comparable.

Ethan [00:34:06]: And then for tens of petabytes, $200K. And even more expensive is the ingress and egress.

Swyx [00:34:13]: Oh, yeah.

Ethan [00:34:14]: Through the internet. Just to download those videos, I believe it's more expensive on AWS than just storing them.

Swyx [00:34:25]: Storing, yeah.

Ethan [00:34:25]: And for each training run, you probably need to pull them once. If you train multiple times, it's even more than that. So just the storage and the network costs would be a few million per month to store everything, not to mention the GPU cost.

Swyx [00:34:45]: My side tangent: compute rental, like GPU rental, is very efficient. On one side, okay, you can be xAI and build your own data center. Should we not just build our own storage as well? Like...

Ethan [00:34:57]: Of course.

Swyx [00:34:57]: ...cloud cost compared to just...

Ethan [00:34:59]: You save so much.

Swyx [00:35:00]: ...storage. Yeah, exactly.

Swyx [00:35:01]: Especially with egress and stuff.

Ethan [00:35:04]: That's a good idea, but it comes with some challenges of its own.

Swyx [00:35:09]: Of course, of course.

Ethan [00:35:10]: People who build GPU data centers might not expect this much storage. And people who build storage typically just build it somewhere with only CPUs.

Swyx [00:35:23]: I just looked it up. AWS only charges for egress, not ingress. The tiered price for five petabytes is $230K.

Ethan [00:35:32]: Even more expensive than the storage.

Swyx [00:35:34]: But storing is per month, right? You check in, then you cannot check out. So it's so cool. It's okay.
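The storage and egress numbers quoted above can be sanity-checked with a short Python sketch. It only uses figures stated in the conversation (a billion videos at five megabytes each, roughly $100K per month to store five petabytes, roughly $230K to move five petabytes out); the function names and the decimal unit conversion are assumptions for illustration, and the derived per-gigabyte rates are implied by those quotes rather than taken from any price list.

```python
MB_PER_GB = 1_000        # decimal units, assumed for simplicity
GB_PER_PB = 1_000_000

def dataset_size_pb(num_videos: int, mb_per_video: float) -> float:
    """Raw dataset size in petabytes."""
    return num_videos * mb_per_video / MB_PER_GB / GB_PER_PB

def implied_usd_per_gb(total_usd: float, size_pb: float) -> float:
    """Per-gigabyte rate implied by a quoted total for a given volume."""
    return total_usd / (size_pb * GB_PER_PB)

size_pb = dataset_size_pb(1_000_000_000, 5)           # a billion 5 MB videos
with_latents_pb = 2 * size_pb                         # VAE features of comparable size
storage_rate = implied_usd_per_gb(100_000, size_pb)   # quoted: ~$100K per month
egress_rate = implied_usd_per_gb(230_000, size_pb)    # quoted: ~$230K per full pull

print(size_pb, with_latents_pb, storage_rate, egress_rate)
```

On these numbers, pulling the dataset out once costs more than two months of storing it, which is why the conversation turns to I/O and to building storage next to the GPUs.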
So there's that side.

Ethan [00:35:41]: So the TL;DR, my back-of-envelope math...

Swyx [00:35:42]: Data is larger than you think. Yes.

Ethan [00:35:44]: My back-of-envelope math of GPU hours times GPU cost is very much missing the storage.

Swyx [00:35:49]: You're also basically more I/O bound than normal training.

Swyx [00:35:55]: Yes. Because with data loading, caching everything becomes super important.

Ethan [00:36:00]: In Cosmos, we did a lot of optimizations to make it not I/O bound. Speaking of actually training the model, the GPU cost: if you look at how big the open source video models are, I think LTX has nineteen billion parameters. That's a dense model. People are also exploring MoEs, so it might be twenty billion active and hundreds of billions total. That's a similar size to medium-sized LLMs. And if you look at the number of tokens, we disclosed that in Cosmos. It's also tens of trillions of visual tokens. Putting this together, the cost of training these video models is actually comparable with LLMs. Not to mention, the infra is slightly different from LLMs, so it might be less efficient to train these models.

Inference Speedups: Step Distillation, Consistency Models, and GANs

Swyx [00:37:04]: Do you get the benefits of traditional diffusion speed-ups? For images, there's LCM, LoRAs for fine-tuning. There's a lot of stuff that's been...

Ethan [00:37:15]: Flow matching.

Swyx [00:37:16]: There's flow matching. There's a lot of stuff that's been done. Is there some overlap that applies to diffusion on the inference side?

Ethan [00:37:23]: The inference side is a completely different story.

Ethan [00:37:28]: I think for the training side, it might be a little bit hard to reduce that cost. For the inference side, the biggest gain is from the distillation of these models.
It's called step distillation, which is slightly different from knowledge distillation in LLMs. Typically, for flow matching models, you need something like 100 steps. A diffusion model needs even more, like 1,000 steps, to generate a good image or video. Step distillation tries to learn to generate in fewer steps from the model itself. You use the full model to generate in 100 steps, and then you take a model that only generates in 10 steps and let that model learn from the full one.

Ethan [00:38:25]: Why this works...

Swyx [00:38:27]: Strong to weak, seemingly.

Ethan [00:38:28]: It is. It's kind of...

Swyx [00:38:29]: Distillation.

Ethan [00:38:29]: ...kind of like strong to weak. From the modeling perspective, the strong model, the teacher model, is trying to model the images and videos of the internet, and that distribution is extremely complex. But the step-distilled model is just trying to learn from the teacher. The teacher is a model, and its size is fixed, so its distribution is much simpler than the whole internet. That's my intuition for why step distillation can work. Usually, when these models are served in production, they only run a few steps. In Cosmos, I believe we have four-step and eight-step versions. If you do a simpler task, like image-to-image translation, it can run in even fewer steps, like one step in Cosmos Transfer.

Swyx [00:39:22]: I think this is the same intuition that guides a lot of the consistency model work. I sent you a link for sCM. I don't know if you covered that. To me, that was actually one of the most impressive papers I've ever seen from OpenAI.

Swyx [00:39:34]: This is the unifying grand concept of consistency models. I don't know if you have any comments on this.

Ethan [00:39:41]: So there are a few different approaches...

Swyx [00:39:46]: Oh, yeah. Here it is.

Swyx [00:39:47]: Two steps versus twenty or 100 steps, whatever.
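To make the teacher and student idea concrete, here is a deliberately tiny Python sketch of step distillation. Everything in it is an assumption for illustration: the "teacher" is a 100-step Euler integration of a toy one-dimensional velocity field, and the "student" is a single multiplication fitted by least squares to reproduce the teacher's outputs in one step. Real systems distill a neural network sampler, not a scalar, so treat this as an analogy rather than how Cosmos or any production model does it.

```python
def velocity(x: float, t: float) -> float:
    """Toy velocity field, assumed for illustration: dx/dt = -x."""
    return -x

def teacher_sample(x0: float, steps: int = 100) -> float:
    """Teacher: integrate from t=0 to t=1 with many small Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x

def distill_one_step(noise_samples, teacher_steps: int = 100) -> float:
    """Student: fit x1 = a * x0 so one step matches the teacher's many steps."""
    targets = [teacher_sample(x0, teacher_steps) for x0 in noise_samples]
    numerator = sum(x0 * y for x0, y in zip(noise_samples, targets))
    denominator = sum(x0 * x0 for x0 in noise_samples)
    return numerator / denominator

noise = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
a = distill_one_step(noise)

def student_sample(x0: float) -> float:
    """One-step student sampler learned from the teacher."""
    return a * x0

print(teacher_sample(1.5), student_sample(1.5))
```

The student never sees the original data, only the teacher's input and output pairs, which mirrors the intuition in the conversation that learning from a fixed teacher is an easier target than learning the full data distribution.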
It's already done.

Ethan [00:39:52]: So there are a few different approaches, for example consistency models. And actually, we shouldn't forget GANs. The GAN was the OG of...

Swyx [00:40:05]: OG.

Ethan [00:40:05]: ...step distillation, because it was trained to generate in just one step to begin with. For example, there's distribution matching distillation, which uses a GAN loss as one of the losses for distillation. A GAN just tells you, “Hey, generate an image,” and then...

Ethan [00:40:31]: ...it has a discriminator to tell whether this image is real or not. So the model just needs to learn one mode of the distribution, not the full distribution. In training, the model is asked to reconstruct the ground truth image from the internet, which is extremely hard. When you're training a GAN, it's a one-step process. It's just, “Hey, you generate an image. Does this image look as real as the images from the internet?” Which is a much simpler task. People typically combine a lot of these approaches, like consistency models, distribution matching, and GANs, and that's how we get these few-step models.

Audio-Video Generation and Time Alignment

Swyx [00:41:21]: Then there's one step I wanted to add, which is audio and video.

Ethan [00:41:26]: So, Grok Imagine 0.9, I believe it's the first audio-video joint model deployed at a large scale. So...

Swyx [00:41:39]: And that was your first model?

Ethan [00:41:40]: That was Grok Imagine's first model. It's audio-video joint generation. I think the hard part is the modality alignment, because before this joint model, we had text-to-video alignment. We have this correspondence between text and video. Typically, most VLMs understand images and videos, though video is rarer, and they mostly don't understand audio.
And if you look at the audio generation on the LLM side, you can talk to them perfectly fine, but if you ask them to sing a song or something, it typically is not very good. Also, they don't have, they don't have music either. The hard part is thatUh, actually audio has two component. It has like a discrete component, a continuous component. The discrete component is like the language.Ethan [00:42:44]: So when we speak, it's just, someSwyx [00:42:47]: It's an ASR issue, yeah.Ethan [00:42:49]: It's, it's text token with some characteristics, I would say.Ethan [00:42:54]: But musicSwyx [00:42:56]: I think the speech guys would disagree with this.Swyx [00:42:57]: Like disfluencies and then,Vibhu [00:43:00]: There's tones you can get angry.Ethan [00:43:01]: Well, I say largely.Ethan [00:43:03]: the mu- but the music is completely different. It's, it's very continuous, and you cannot model them like discrete tokens in language models. this is like the hard part for models is, not to mention we have to align text, video, and audio together.Ethan [00:43:26]: SoVibhu [00:43:26]: How?Ethan [00:43:28]: So significant-- some significant challenges are like-- So first, like we talk about as the VLMs, they cannot understand most of them cannot understand audio.Ethan [00:43:39]: So you have to have some way to do the synthetic data generation for audio. You have to caption the model, and that involve, that involve synthetic data and human data effort a lot. And not just surprisingly, most of the LLMs are very bad at recognizing, like the beat, tone, and the details of the of music. They can, they can give some general prediction of which song is this, but it's very hard to describe the details of the music. like we mentioned in image generation, like you have to describe image as detailed as possible so that someone blind can reconstruct that. 
So here is like someoneVibhu [00:44:32]: DeafEthan [00:44:32]: someone deaf can reconstruct how the music sounds like without actually listening to it. Maybe you can think of it need to have the-- or they call the script.Vibhu [00:44:49]: Subtitles, yeah.Ethan [00:44:49]: You gotta have all the details of the music, and the dialogue.Vibhu [00:44:55]: So is the challenge there typically stuff like music and audio, or is it just Like is there a baseline? Okay, there's enough data where we can understand, narration, conversation, but there's nuances in audio that's where you hit all the data issues or is it just from stage zero, you just do it all right?Ethan [00:45:15]: So one important thing is like the alignment. So the model, the model has to know like the video and audio, the, uh-- it has to have a time-based alignment, like at which time step the video and the audio token correspond to each other. But we actually don't have this kind of alignment for most of the other modalities. If you think about like text and image, text and video, they are loosely aligned. So you can, you can have a description of what's going on in the video, but you don't have to exactly, You typically don't have exact description, oh, at, time step one second like what happened?Vibhu [00:46:02]: It's veryEthan [00:46:03]: At time step two second what happenedVibhu [00:46:03]: coarse. Yeah.Swyx [00:46:05]: So what was the ideal time step? You have to oblate it, and then it's like four seconds or something.Ethan [00:46:09]: So that comes down to how you design the model to, for the model to be aware of as a time, as a time modality. So the model is like a time aware. And that's something pretty unique if you think about LLMs. So if you ask LLM to complete a task, say they, uh-- you ask them and they will say, “Oh, this task will probably take twelve hours to complete,” and they come back in one hour. 
Say “I've already spent two days on this and I've exhausted everything.”Ethan [00:46:47]: So the LLMs them-themselves, they don't have a sense of time there.Vibhu [00:46:53]: I actually don't think that's just them not having a sense of time. I think it's somewhat based, right?Vibhu [00:46:58]: Like you tell someone, “Okay, go work on this feature. Go implement this,” there's a general understanding you would have of how long that would take without LLMs working at LLM speed, right? So you think back like two years ago, if I tell you to like build me like a new front end for latent space, have a search bar, have all this, you'll estimate that it'll take a few days, right?Vibhu [00:47:19]: So you tell an LLM, “Go build this.” It'll take me a few days. But I think it's somewhat grounded as opposed to them not having the best-- Not saying that they have a great understanding, but I think that example is like you can see where it comes from, right? You're trained on all over the text.Swyx [00:47:35]: They're, they're trying to estimate what a human would say.Vibhu [00:47:37]: because that's what the, that's what the data kind of represents. It's not themEthan [00:47:41]: It came from the corpus on the internet. People have a estimate of how much time.Vibhu [00:47:45]: And not even just in direct like training samples, right? Just your world understanding of tokens of how long stuff takes, right? Go read a book. It'll take you a while, right?Vibhu [00:47:56]: Even if you do nothing but read a book, it takes a few days. So yeah, LLM, I read it took me a few hours.Vibhu [00:48:01]: It'll take me a few hours to go through this research. But this is a tangent.Swyx [00:48:05]: Somewhat, yeah.Swyx [00:48:06]: This is a train of thought I haven't really expressed until now is, which is basically like a full world model must also be recursive, meaning that the participant in the world model must also be aware that they have a world model. 
which is like this whole recursive thing down the, down the line. but yes, and that the world model can be wrong and that they need to update it and blah. Yeah. We've, argued this on the, newsletter as well, that there needs to be sort of recursive or adversarial world models.World Models: Real-Time, Long-Horizon, Interactive VideoVibhu [00:48:34]: just, to ask, how do you define world model?Swyx [00:48:38]: Oh, yeah, let's go there.Ethan [00:48:40]: SoVibhu [00:48:40]: So just for context, we talked about, video generation, and then there's a-- if you say there's a distinction between world models, what's your, what's your definition? How do you see the two?Ethan [00:48:53]: So disclaimer, I'm not going to debate, what is world model. Yeah. there are many definitions, so I'll just talk about my definition. Since I came from the multi-model, multi-model domain, so mainly talking from video. So world model is like real-time interactive long horizon videos. So there are three parts. so we-- let's talk about them one by one. So the so interaction, so we just, we just look at Facebook and neural computer. So the interaction part of it, so you, world model can allow you to interact with them through keyboard, mouse, and maybe also voice. So these all is-- all is a modality. You can, you can interact with the model, and the model should respond reasonably. Second part is real time. So once you, once, say, you move your mouse, if, say, the world model generate a game, how fast can the game respond? So if you're like professional CS: GO players- -my say, oh, you have to respond- He's beginner within sub ten milliseconds or- Yeah even less. So that's not most of the- No, sixty FPS. Let's go. Oh, three hundred FPS. Oh, five hundred FPS. Wait. okay, yeah. I didn't do the math, but yeah, okay. Uh- Yeah, three hundred FPS, that's a three millisecond. So you have to respond- Oh, s**t. Okay. YeahEthan [00:50:29]: within a millisecond. Most of the video models cannot do that. 
Yeah. And, but if you, say, if you have a video model that is, say, like a digital human, the response time might be more generous. Maybe typically, for real-time voice interaction, it's like two hundred millisecond. So that's, that's much more generous. But even two hundred millisecond is pretty, it is pretty tricky, ‘cause remember we mentionedEthan [00:51:01]: you have this, temporal compression coming from the VAE. So if you, if you don't compress the temporal dimension, your sequence length is going to explode. So if you want to have this real-time, real-timeness in your model, you have to do is one context problem. And the third part is long horizon, ‘cause we-- if you're not going to just play with, video games just, a few seconds, most video models only a few seconds. We're going to play with minutes, hours. The model have to be able to generate long-form content.Ethan [00:51:42]: So putting these three together, it's, real-time, long horizon interactive videos. I think the final state will be, for example, like a video, a video version of Playbook, where you can, you can interact with, a neural computer. You move your mouse, and you click on the generative interface, and it will reply to you through pixels- generating in real time. But getting there, it's, it's a very long way to get there. So one of the first step, at Grok Imagine, where I led a small world model team there, was to build video extension. So, video extension- it's the first step of interactivity. Yeah. It's, it's the first step. Yeah. So it's the first step- You have it here, video editing, yeah. Yeah. Yeah. So the first step is because, this unlocks long horizon videos. Typically, for most of the video generation models, you give it a prompt or an image as an initial frame. You generate video, that's it. That's just, one time, done. And some creators would try to, use the last frame as a first frame for the second video. 
Sometimes it works, but if you do it a few times, the quality decreases. And... It doesn't have that context over the full video, so the temporal... Yeah, exactly. Because you only gave it the last frame, of course, right? Exactly. It's actually a pretty fun hack. For example, I remember Veo 3 uses the last second of the previous video as context. It is slightly better than using the last frame, but it has a similar problem: the quality decreases. If you extend a few times to one minute, the video quality looks much worse than the first video. Another problem is that the model doesn't have long-range knowledge of what happened before. Say it generates some dialogue with two people speaking. Their voices might change over time, especially if the one-second conditioning does not cover the previous context. So these are the core challenges. The Grok Imagine video extension has historical context of all of the previously generated videos. It has the context of who is speaking, what objects have appeared, and everything, and it uses that to generate the next video. If we do this naively, just putting all of the previous video tokens into the context, the context length will easily explode. Especially for video models, that can be a few million tokens of context, I would imagine.

Swyx [00:54:58]: Let's run with that.

Ethan [00:54:59]: For example, in Cosmos, I think just five seconds of video is fifty or sixty thousand tokens. So if you do fifty seconds, that's five hundred thousand tokens. If you do longer than that, it easily explodes. This long-horizon problem was the first step we tried to solve toward a world model. It turns out people love video extension.
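A quick Python sketch shows how fast that grows. It assumes the low end of the figure quoted above (about fifty thousand tokens per five seconds) and that token count scales linearly with duration; the function names are made up, and the quadratic pair count is the standard cost of full self-attention rather than a claim about any particular model.

```python
TOKENS_PER_5S_CLIP = 50_000  # low end of the "fifty or sixty thousand" quoted above

def context_tokens(seconds: int, tokens_per_clip: int = TOKENS_PER_5S_CLIP,
                   clip_seconds: int = 5) -> int:
    """Tokens needed to keep the whole video history in context."""
    return seconds * tokens_per_clip // clip_seconds

def attention_pairs(tokens: int) -> int:
    """Token pairs scored by full (dense) self-attention."""
    return tokens * tokens

for seconds in (5, 50, 300):
    n = context_tokens(seconds)
    print(f"{seconds:>4}s -> {n:>9,} tokens, {attention_pairs(n):.1e} attention pairs")
```

Ten times more video means ten times more tokens but a hundred times more attention work, which is why naively appending history stops being practical after a few minutes.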
Like a lot, a lot of the creators love using video extension to create longer form videos. This is the part I liked that you have a, you have an intermediate step toward the final goal instead of just a straight shot to the final version very much.Swyx [00:55:48]: But I can see you have a strong vision of where we want to end up.Long Context, Redundancy, and Efficient Interactive VideoVibhu [00:55:51]: Does it seem like it's an efficiency issue? okay, we're at a few million tokens context,. If you draw the parallel to language models, we had very short context, two thousand, eight thousand, then, you scale it up one million, ten million. sure, there's effective context, but at the end of the day, it's just what's it worth? sure, there's a whole training data side. In video, it might be slightly easier ‘cause we have a hundred million token video, right? Just take a movie with the full context there. Like is this efficiency from an inference standpoint that like it's expensive, but we know how to solve it? Or like why is this not the approach? So like my broader point was on your second point of world models, you say it needs to be interactive and live, right? You should be able to play a game and see the interaction live. So one thing I see with research is a lot of what you actually serve is different than what you build, right? So we talked about distillation. You train big model, you distill it, you do quantization, speculative decoding. We do all this stuff to serve it efficiently. Should we not just have a solution, like a world model that can interact well, do inference optimization, serve it, distill it secondary, so make it real time after you solve it? So like a-- another parallel is say, continual learning, right? What we need is someone to solve it and show it works inefficiently. Give it a few years, people will make it efficient. Same thing with regular attention, right? It worked. 
Over a few years, people came up with different forms of attention, and we've scaled it to be efficient at long context. So there are kind of two things there, right? One is that it seems like it works. You've scaled it. Can we not just scale it a lot more efficiently over time? Do we need a separate approach if this works? And the same thing with interaction, right? If we can find some way that it works, we can solve making it more efficient from an inference standpoint later.

Ethan [00:57:53]: That's actually a very good point. In videos, there's actually a lot of redundancy. We remove a lot of the pixel redundancy with the VAE, but there's more redundancy in long-range, long-horizon videos. Say a character appears in the first clip, then disappears, and only reappears at the end of the video. You probably don't need that context in the middle of the generation. You only need that character where you need it. That's why I helped build another feature. It's reference-to-video.

Vibhu [00:58:36]: Is it here?

Swyx [00:58:36]: Is it the same model release or a different one?

Ethan [00:58:39]: It's a different one.

Ethan [00:58:41]: You probably need to search on...

Swyx [00:58:43]: I'll find it.

Ethan [00:58:43]: ...X for reference-to-video.

Ethan [00:58:46]: Reference-to-video allows you to upload up to seven images as conditions and generate the video. They can be characters or objects or even scenes. Say I want to condition on Shawn's selfie, holding a blade...

Swyx [00:59:07]: We have a dog.

Ethan [00:59:08]: ...or whatever.

Swyx [00:59:08]: We put the dog in the thing.

Ethan [00:59:09]: You can put them there, and the video model will generate the video from them and copy the context over. That can solve a lot of the problems there, like the long context problem. It doesn't need to have a very long context, but I feel like it's an intermediate solution.
The modelSwyx [00:59:29]: It's cheating.Ethan [00:59:30]: the model should be able to like selectively know, where should I draw the references. So say if I want to generate a movie, I generate it autoregressive, like a ten second at a time or something. And now this character appear, I can look back to where it first appear and, bring that back. Yeah, this one, I put the references. Yeah, that's, Optimus, Einstein myself, Annie.Vibhu [01:00:02]: Oddly enough, I used Grok Search to find it, and it pulled your LinkedIn post. But yeah we found it.Ethan [01:00:08]: Interesting.Vibhu [01:00:10]: ButxAI's Underrated Work, Culture, and WatermarkingSwyx [01:00:11]: this is a problem. This is not your fault, but like XAI doesn't communicate all this work that you do very well because they just have the model release and then that's it. But actually, these details are very good.Swyx [01:00:22]: As far as I understand, everything you just described is state-art, like no one else has done it.Vibhu [01:00:30]: A lot of-- yeah, I have a lot moreSwyx [01:00:32]: And then, and then you just put this blog post with the cookies. I'm this is not enough,?Swyx [01:00:37]: but I, obviously this is like the high level numbers that people want to know. But no, okay, soVibhu [01:00:42]: And I wonder, like part of that is also some labs don't share research into what happens. And ifSwyx [01:00:50]: No, but this is literally bragging about how good they are, right?Swyx [01:00:54]: Like, why would you not say that you are capable of extending with full context? this is not a secret sauce. This is like we did the work. yeah, I don't know.Ethan [01:01:02]: different labs have slightly different communication styles.Swyx [01:01:07]: Anyway, if anyone from XAI is listening we are always happy to help you tell your story. Yeah, okay, so you did references, and I think, I think kind of the point you're, you're making is it is sort of like a kludge, right? 
You can do seven, but what about 100?

Swyx [01:01:23]: Right? Then you need a completely different thing.

Ethan [01:01:26]: So I think this is a mechanism to select the context from the history, and you might not put the entire history into the context. For example, there's a paper called FramePack, which has...

Ethan [01:01:41]: ...a heuristic: for the latest history, the last one second, I keep the entire history, and the history before that, I compress it and make the video smaller. They follow this overall pattern so that the maximum sequence length is fixed. The further you are from the current frame, the smaller the image. This is just a heuristic. I think it can be more automatic, where the model is aware of which part of the history to select. This part of the research is being actively worked on by a lot of people. It's also quite interesting. I feel this part of long context is actually a little bit ahead of the LLM part.

Ethan [01:02:31]: For example, in LLMs, contexts keep growing. Let's say you call a tool and the tool call history is extremely long. That's still in context, and it keeps growing. Even if you switch the topic to something else, the whole context is still there. There are some agentic harnesses that help you prune the tool results. For example, when you query a file, they only show the top 200 lines or something. Those are very heuristic-driven.

Swyx [01:03:08]: For listeners, we did a write-up on the Claude Code leak, where there are eight different kinds of pruning, including pruning the tool results and all that.
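The "further back means smaller" heuristic Ethan describes can be sketched in a few lines of Python. This is not FramePack's actual schedule; the per-frame token count, the halving ratio, and the function name are all hypothetical, chosen only to show why a geometric shrink keeps the total sequence length bounded no matter how long the history gets.

```python
def history_token_budget(num_past_frames: int, full_tokens: int = 1024,
                         ratio: int = 2) -> list:
    """Tokens allotted to each past frame, newest first.

    The newest frame keeps full resolution; each older frame gets 1/ratio of
    the previous budget, and frames whose budget reaches zero are dropped.
    """
    budget = []
    tokens = full_tokens
    for _ in range(num_past_frames):
        budget.append(tokens)
        tokens //= ratio
    return budget

short_history = history_token_budget(3)
long_history = history_token_budget(1000)
print(short_history, sum(long_history))
```

With halving, the total stays under twice the full-frame budget even for a thousand past frames, which is the "maximum sequence length is fixed" property from the conversation. A learned selection mechanism would replace this fixed schedule with something the model decides for itself.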
So you can, you can read up on that kind of thing.Ethan [01:03:17]: I think, one breakthrough in continual learning might be like a way to automatically, manage its own context.Swyx [01:03:27]: These are all heuristics, and they will be replaced by machine learning.Ethan [01:03:30]: InterestinglyVibhu [01:03:32]: TheEthan [01:03:32]: the same thing is being researched in both LLMs and video models.Vibhu [01:03:36]: The interesting thing is also like in the paper you showed, it's actually happening at the model level, right? Compared to like language models, sure, we have base attention, but we'll do our own compression, we'll do our own pruning, which is separate from model error.Vibhu [01:03:49]: Eventually, it all just boils in, hopefully.Swyx [01:03:52]: I think this is a form of like attention, but like also know sort of reasoning attention. I feel like that's different than normal attention.Swyx [01:04:03]: Does that, does that make sense?Ethan [01:04:04]: It's, it's different in the sense that attention, not to mention, set sparse attention aside,

Sermons from Myers Park Presbyterian Church
The Stories that Surround Us: The Image of God

Jun 1, 2026 · 26:01


Joe Clifford's sermon for Sunday, May 31, 2026, at Myers Park Presbyterian Church in Charlotte, NC. Subtitles/closed captions for this video are available by clicking the “CC” button on the video player. Full sermon manuscripts can be found at myersparkpres.org/manuscripts.

iOS Today (Video HI)
iOS 804: iMessage Apps - Navigating the Junk Drawer of iOS Features

May 28, 2026 · 38:53


Why do iMessage apps feel like a forgotten experiment, and what buried gems are still hiding behind that plus button? Dan Moren joins the show to unpack which built-in features now outshine their third-party predecessors and what that says about Apple's approach to messaging.

  • iMessage apps' evolution, developer challenges, and user engagement declines
  • Default iMessage features vs. true apps
  • Sticker packs, sharing GIFs, and gameplay in Messages lose relevance
  • Apple's default iMessage tools: photos, polls, cash, check in, send later
  • Nuances of audio messages, dictation, and in-app differences
  • Tips for faster photo sharing and rearranging iMessage features
  • Real-world uses and practical limitations of iMessage's check-in feature
  • Notable third-party iMessage app recommendations and personal favorites
  • New: Apple's 2026 accessibility updates, including voice control, real-time captions, and AI-powered magnifier
  • Picks of the Week: Indigo cross-network social app and Wipr 2 content filter

Hosts: Mikah Sargent and Dan Moren

Contact iOS Today at iOSToday@twit.tv.

Download or subscribe to iOS Today at https://twit.tv/shows/ios-today

Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free audio and video feeds, a members-only Discord, and exclusive content. Join today: https://twit.tv/clubtwit

Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.

iOS Today (MP3)
iOS 804: iMessage Apps - Navigating the Junk Drawer of iOS Features

May 28, 2026 · 38:53

All TWiT.tv Shows (MP3)
iOS Today 804: iMessage Apps

May 28, 2026 · 38:53 · Transcription Available

iOS Today (Video)
iOS 804: iMessage Apps - Navigating the Junk Drawer of iOS Features

May 28, 2026 · 38:53

All TWiT.tv Shows (Video LO)
iOS Today 804: iMessage Apps

May 28, 2026 · 38:53 · Transcription Available

Total Mikah (Video)
iOS Today 804: iMessage Apps

May 28, 2026 · 38:53 · Transcription Available

Total Mikah (Audio)
iOS Today 804: iMessage Apps

May 28, 2026 · 38:53 · Transcription Available

Beers With Bands
Ep. 292 - Subtitles.: Subtitles. On.

May 22, 2026 · 85:00


On this episode I sit down with subtitles., an emo/alt rock band from Arizona/France. We talk about how the two of them came together before diving into their debut album "Subtitles. On.". Before we end we talk about the scene in France and the future of the band. Be sure to follow Subtitles. and check out "Subtitles. On."!!!

This episode features the songs "Subtitles. On." and "Reasons To Stay" from the album Subtitles. On.

You can find Subtitles. at the following links:
Instagram: https://www.instagram.com/subtitles_official
Youtube: https://www.youtube.com/@subtitlesband/
Website: https://subtitlesband.com

You can find Beers With Bands here:
Twitter: https://twitter.com/BeersWBandsPod
Instagram: https://www.instagram.com/beerswithbandspod/
Bandcamp: https://beerswithbands.bandcamp.com
Everywhere else: https://linktr.ee/BeersWithBands

BS Reactor
263 - TMNT 3 (1993) PART1

May 22, 2026 · 34:08


Greetings, listeners. I've recently installed a new Haiku DLC. To celebrate, today's briefing will be delivered in verse. For culture and whatnot. Ahem! “Pizza through time folds... Four turtles punch history wrong... Subtitles fear us...” That means: welcome back to BS Reactor. We're discussing Teenage Mutant Ninja Turtles III. There will be bad historical decisions. Ancient scepter glows... Humans touch the cursed object... As they always do... Translation: yes, the plot happens because no one runs from the woo woo magical nightmare object. Spoilers drift like smoke... Profanity blooms at dusk... The crew has STRONG thoughts... "That one means spoilers and swearing. I assumed that was obvious, but apparently I am required to support all reading levels." Digital dream home... B S Reactor dot com waits... Archives are amaze balls... That means visit the website if you enjoy our nonsense. Janet writes haiku... Firmware now tragically deep... Let's fucking go... "Yeah! I nailed that." Oh, and they forgot to say, but on the recording are Evan, Isaac, and special guest Alexander.

TPF's Podcast
ShowBox - Watch Films and TV Episodes Online with Subtitles

May 22, 2026 · 1:40


ShowBox presents a simple entertainment space for viewers who want movies and TV series without turning browsing into a long task. ShowBox focuses on quick discovery, easy category scanning, and a viewing flow that feels direct for users who already know what they want or only have a mood in mind. It highlights online movies, series choices, and subtitle or dubbing support in a way that feels approachable across devices.

From the homepage to the title pages, ShowBox is built around reducing hesitation. Visitors can move through featured choices, genre browsing, and search-led discovery instead of opening many confusing tabs. The site positions itself as a useful starting point for people looking for fast access to entertainment, especially when they want a lighter path from interest to playback through https://showbox.com.cv/.

The value of showbox.com.cv also comes from how it frames the watching process. A strong movie platform is not only about listing titles; it is about helping users compare options quickly, read short details, and decide whether a film or episode matches the moment. That matters for phone users, desktop viewers, and anyone switching between short sessions and longer movie nights.

ShowBox works best as a convenient viewing hub for people who prefer clean navigation, flexible discovery, and a familiar structure they can return to. Its appeal comes from combining movie selection, TV series access, dub and sub support, readable browsing, and a session style that respects the viewer's time while keeping the experience simple.

Movies You Forgot You Forgot
137: Howl's Moving Castle, Subtitles vs Dubs, and Favourite Studio Ghiblis

May 13, 2026 · 80:12


It's the phenomenal 2004 Studio Ghibli flick Howl's Moving Castle, directed by Hayao Miyazaki and based on the book of the same name.

How could Joe & Adam forget this? Doesn't matter; they're glad they did so they got to rewatch it and talk about: the choice between watching subtitled Ghiblis vs English dub versions, the joyful little side characters that invariably pop up in a Miyazaki flick and bindles, plus a sort of connected but not really side chat about The Last Samurai.

Got a film you forgot you forgot? Join our growing Discord community and tell us all about it: https://discord.gg/b3CUUdPRf7

Or send us an email at moviesyouforgotyouforgot@gmail.com with your thoughts, episode suggestions, or just some light praise.

You can also follow Adam @errorofways on Letterboxd; he rates and reviews the films he watches. Also, be a pal: tell your chums, rate us, review us, shout our name into the void - whatever helps spread the word.

Weekly Spooky
Cutting Deep into Horror | One Cut of the Dead (2017): Japan's Genius Zombie Horror Comedy, Filmmaking Chaos & Cult Movie Heart

May 1, 2026 · 101:48 · Transcription Available


What starts as a scrappy Japanese zombie movie quickly becomes one of the most inventive, hilarious, and unexpectedly heartfelt horror-comedies of the last decade.

This week on Cutting Deep into Horror, Henrique Couto & Rachael Redolfi dive into One Cut of the Dead (2017), Shinichiro Ueda's brilliant cult favorite about zombies, chaos, low-budget filmmaking, and the beautiful disaster of trying to make art when everything is falling apart.

At first glance, One Cut of the Dead looks like a strange, messy, low-budget zombie movie. But stick with it, because this is a film that rewards patience in a huge way. What unfolds is a clever, funny, deeply affectionate tribute to filmmakers, actors, crew members, and anyone who has ever tried to pull off something impossible with no time, no money, and way too much pressure.

Henrique and Rachael talk about what makes the movie's structure so special, how it balances horror and comedy, why the film hits especially hard for anyone who has worked behind the camera, and how its chaotic energy turns into something genuinely joyful.

Inside this episode:

  • Why One Cut of the Dead is best experienced knowing as little as possible
  • How the movie transforms from zombie weirdness into filmmaking genius
  • The joy, stress, and absurdity of independent film production
  • Why the second half completely recontextualizes the first
  • How Shinichiro Ueda turns a tiny movie into a massive crowd-pleaser
  • Why this modern Japanese horror-comedy became such a beloved cult classic

If you love zombie movies, Japanese horror, horror-comedy, cult films, or stories about the madness of making movies, this episode is for you.

Watch the film on AMC+

All Def SquaddCAST
208: Partner W/ Crazy Ex vs Partner W/ Bad Child | SquADD Cast Versus | All Def

Apr 20, 2026 · 73:11


Introducing the All Def SquADD Cast show “Versus”. It's a podcast with the OG SquADD! Each week, the SquADD will debate topics and vote at the end to see what wins. Versus airs every Monday and you can download and listen wherever podcasts are found.

Special Guest
Roxxy Haze
Jordan Conley
Kali Scott

This Week We Discuss
Partner W/ Crazy Ex vs Partner W/ Bad Child
Watching with Subtitles vs. No Subtitles Ever
Flirtatious Partner vs Partner W/ No Friends

S/o To Our Sponsors

FACTOR
Head to Factormeals.com/squadd50off and use code squadd50off to get 50 percent off and free daily greens per box, with new subscription only, while supplies last until 09/27/2026. (See website for more details).

Blue Chew
BlueChew.com
Right now, when you buy two months of BlueChew Gold, you get the third for FREE with promo code SQUADD. That's promo code SQUADD. Visit BlueChew.com for more details and important safety information.

Best of Hawkeye in the Morning
What's Behind The Trend of Younger Adults & Teens Using Subtitles

Apr 10, 2026 · 4:53


Support the show: http://www.newcountry963.com/hawkeyeinthemorning
See omnystudio.com/listener for privacy information.

Mac & Gaydos Show Audio
Hour 3: Do you use subtitles to watch your favorite shows?

Apr 9, 2026 · 34:39


Bruce & Gaydos explain why there is an uptick in young people using subtitles to watch television shows.

Russian Radio Show
B1-B2 | Why You STILL Don't Understand Spoken Russian | Subtitles | Ep. №126 (PART 2)

Apr 9, 2026 · 9:52


#russianlanguage​ #русскийязык

English with Thiago
Subtitles Are DESTROYING Your English Listening Skills

Apr 8, 2026 · 15:51


⭐Get the B2 EDGE APP for structured practice: https://studio.com/thiago

Learning English For Work
Office English: Ideas

Apr 6, 2026 · 8:56


What's the best way to share ideas at work? In this episode of Office English, Pippa and Phil talk about coming up with and developing ideas, and making sure they actually happen.

Subtitles and transcript: https://www.bbc.co.uk/learningenglish/english/features/office-english/260406
Subscribe to our free email newsletter: https://www.bbc.co.uk/send/u178220599
More business English programmes: https://www.bbc.co.uk/learningenglish/english/business-english

Russian Radio Show
B1-B2 | Why You STILL Don't Understand Spoken Russian | Subtitles | Ep. №126 (PART 1)

Apr 2, 2026 · 13:24


#russianlanguage​ #русскийязык

Learning English For Work
Office English: Describing your job

Mar 30, 2026 · 7:10


Not sure how to describe your job to your friends or even those at your workplace? In this episode of Office English, Pippa and Phil talk about explaining what you do to people in and outside of work.

Subtitles and transcript: https://www.bbc.co.uk/learningenglish/english/features/office-english/260330
Subscribe to our free email newsletter: https://www.bbc.co.uk/send/u178220599
More business English programmes: https://www.bbc.co.uk/learningenglish/english/business-english

Learning English For Work
Office English: Socialising

Mar 23, 2026 · 7:31


Do you have friends at work? Or do you prefer to keep your work and personal lives separate? In this episode of Office English, Pippa and Phil talk about making friends and keeping things professional.

Subtitles and transcript: https://www.bbc.co.uk/learningenglish/english/features/office-english/260323
Subscribe to our free email newsletter: https://www.bbc.co.uk/send/u178220599
More business English programmes: https://www.bbc.co.uk/learningenglish/english/business-english

Russian Radio Show
B1-B2 | Russian Phrases of Surprise: Learn to Sound Like a Native

Mar 20, 2026 · 16:10


#russianlanguage​ #русскийязык

Hacker Public Radio
HPR4596: Adding voice-over audio track created using text to speech on the movie subtitles

Mar 16, 2026


This show has been flagged as Clean by the host.

We'll explain why we're doing it, what it is, and cover some useful tools along the way.

I've been watching movies recommended to me by my colleagues. As I work for a global company, the recommendations are often “Foreign Language”, which by definition is every movie to someone. It's often difficult to read the subtitles, or they distract from the acting. So I thought of converting the subtitles to speech for inclusion as an audio track, to produce a Voice Over or Lectoring audio track.

Lectoring aka Voice Over Translations

First used in Soviet countries to have lectors read the news and propaganda - the first podcasts?

In Polish, lektor is also used to mean “off-screen reader” or “voice-over artist”. A lektor is a (usually male) reader who provides the Polish voice-over on foreign-language programmes and films where the voice-over translation technique is used. This is the standard localization technique on Polish television and (as an option) on many DVDs; full dubbing is generally reserved for children's material.
https://en.wikipedia.org/wiki/Lector#Television

Example: Night of the Living Dead

To give you an idea of what this sounds like, I'm going to play you an example from the out-of-copyright movie Night of the Living Dead.

In the United States, Night of the Living Dead was mistakenly released into the public domain because the original distributor failed to replace the copyright notice when changing the film's name.

First the original sound track, then the same clip with the voice over track.

Proof of Concept

As a native English speaker I find it difficult to follow those Voice Over tracks, as I am trying to focus on the underlying audio. In discussions with Polish friends, it seems that this is not a problem when Polish is your native language. To put that to the test, I wanted to try it out on a movie to see if that were indeed the case.
I asked on Mastodon for a non-English movie that was Creative Commons but did have English subtitles, and HPR host Windigo had the answer.

Nasty Old People is a 2009 Swedish film directed by Hanna Sköld, Tangram Film. It premiered on 10 October 2009 at Kontrapunkt in Malmö, and on file sharing site The Pirate Bay. The film is available as an authorized and legal download under the Creative Commons license CC BY-NC-SA.

So my idea was to take each bit of subtitle text, convert it to audio, then have the generated audio play at the same time the subtitle appears on the screen.

We use piper to process shows here on HPR, and we also generate srt, or SubRip, subtitle files for each show. SRT or SubRip files are the easiest subtitle files to work with.

From https://en.wikipedia.org/wiki/SubRip

The SubRip file format is described on the Matroska multimedia container format website as “perhaps the most basic of all subtitle formats.” SubRip (SubRip Text) files are named with the extension .srt, and contain formatted lines of plain text in groups separated by a blank line. Subtitles are numbered sequentially, starting at 1. The timecode format used is hours:minutes:seconds,milliseconds with time units fixed to two zero-padded digits and fractions fixed to three zero-padded digits (00:00:00,000). The comma (,) is used as the fractional separator. Each group contains:

  1. A numeric counter identifying each sequential subtitle
  2. The time that the subtitle should appear on the screen, followed by --> and the time it should disappear
  3. Subtitle text itself on one or more lines
  4. A blank line containing no text, indicating the end of this subtitle

I downloaded the movie from the Internet Archive, and then used Piper voice to convert a minute's worth of subtitles.

piper_voice: A fast and local neural text-to-speech engine that embeds espeak-ng for phonemization. GPL-3.0 license

Once I had the audio prepared for a sample of the subtitles, it was over to Audacity to create a new subtitle audio track.
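Returning to the SubRip layout quoted above, here is a minimal Python sketch of how such a file could be read into cues with millisecond timestamps. It is an illustration only and not the script from the episode page; the names `Cue`, `to_ms` and `parse_srt` are made up for this example, and it assumes a well-formed file with no positioning data after the timecodes. The two sample entries are taken from these show notes.

```python
import re
from dataclasses import dataclass

# HH:MM:SS,mmm with a comma as the fractional separator.
TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int      # the numeric counter
    start_ms: int   # when the subtitle appears
    end_ms: int     # when the subtitle disappears
    text: str       # one or more lines of subtitle text


def to_ms(timecode: str) -> int:
    """Convert an 'HH:MM:SS,mmm' timecode to milliseconds."""
    match = TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"not a SubRip timecode: {timecode!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(content: str) -> list[Cue]:
    """Split the file on blank lines and read each group as one cue."""
    cues = []
    cleaned = content.lstrip("\ufeff").strip()  # drop a byte order mark if present
    for group in re.split(r"\n\s*\n", cleaned):
        lines = group.splitlines()
        if len(lines) < 3:
            continue  # skip groups without counter, timing and text
        start, _, end = lines[1].partition("-->")
        cues.append(Cue(int(lines[0]), to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues


SAMPLE = """\
96
00:15:06,240 --> 00:15:10,640
It will be two years before it's this big

97
00:15:12,840 --> 00:15:17,840
But don't you bother. By then I'll be long gone
"""

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(cue.index, cue.start_ms, cue.end_ms, cue.text)
```

Working in whole milliseconds keeps the later arithmetic (gaps between subtitles, offsets) free of floating point rounding.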
Audacity is the world's most popular audio editing and recording app. GPL v2 or later

Timing the segments would be a problem, if it were not for the fact that Audacity supports srt files as Labels.

  • File > Import > Labels
  • Then select the srt file

The subtitle track with the text of the audio will be displayed. I could then import each audio segment and line it up with the subtitle track to get the correct timing. Each subtitle segment created a new separate audio file, which I then exported.

I then used Kdenlive to open the video and import the audio and subtitle tracks.

Kdenlive is the acronym for KDE Non-Linear Video Editor. It works on Linux, Windows, macOS, and BSD. GPL-3.0-or-later

There is a good article by Jean-Marc, How to Add Subtitles Easily in Kdenlive.

  • Project > Subtitles > Add Subtitle Track
  • Select the subtitle file
  • Align the subtitle and audio track

After rendering the segment out, I was satisfied that this was something worth doing.

The script

The script can be found on the episode page for this show on the HPR site; I put it together as a proof of concept. It creates a new audio track for the subtitles, and merges this with the original sound track to create a new selectable sound track.

It begins by creating a length of silent audio that lasts up to the begin timestamp of the first subtitle segment. The first subtitle segment is converted from text to speech using Piper voice, and that segment of audio is added to the initial silence track.

We check the total length so far, and then see if there is supposed to be silence before the next subtitle segment's begin timestamp. If there is, a filler piece of silence is added until the next subtitle should appear. If not, the audio for both subtitles plays back to back.

I was worried that the subtitle audio would then lag behind the on-screen dialogue, but it works surprisingly well. Even long series of dialogue sort themselves out after a bit.
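The padding logic described above can be sketched in a few lines of Python. This is only an illustration of the idea and not the script from the episode page: the function name `build_timeline` is hypothetical, the start times are the four sample subtitles from these show notes, and the speech durations are invented numbers standing in for whatever the text-to-speech engine actually produces.

```python
from typing import Iterable


def build_timeline(cues: Iterable[tuple[int, int]]) -> list[tuple[str, int]]:
    """Lay out silence and speech segments for a voice-over track.

    cues holds (start_ms, speech_ms) pairs in display order, where
    speech_ms is the length of the generated audio for that subtitle.
    Returns ('silence' | 'speech', duration_ms) segments in play order.
    """
    timeline = []
    position = 0  # total length of the track built so far, in ms
    for start_ms, speech_ms in cues:
        gap = start_ms - position
        if gap > 0:
            # The subtitle is not due yet: pad with silence until it is.
            timeline.append(("silence", gap))
            position += gap
        # If gap <= 0 the previous speech overran, so speak immediately.
        timeline.append(("speech", speech_ms))
        position += speech_ms
    return timeline


# Start times from the sample subtitles; speech lengths are made up.
EXAMPLE = [
    (906240, 3000),
    (912840, 8000),   # overruns the start of the next subtitle
    (919840, 1500),
    (922880, 900),
]

TIMELINE = build_timeline(EXAMPLE)

if __name__ == "__main__":
    for kind, duration in TIMELINE:
        print(f"{kind:8s}{duration:>8d} ms")
```

In the example the second segment runs one second past the third subtitle's start time, so the third is spoken straight away with no filler, and the 540 ms of silence before the fourth subtitle puts the track back on schedule. That is the self-correcting behaviour the notes describe.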
We do this over and over again for each subtitle, right up to the very end of the movie. This new subtitle-to-speech audio track is then merged back into the media file as a new audio track.

96
00:15:06,240 --> 00:15:10,640
It will be two years before it's this big

97
00:15:12,840 --> 00:15:17,840
But don't you bother. By then I'll be long gone

98
00:15:19,840 --> 00:15:22,400
It was just a question

99
00:15:22,880 --> 00:15:25,480
Porridge?

First the original sound track, then the same clip with the voice over track.

Lessons learned

Now that I have done this for a lot of movies, there are a few tips for getting the best output. The creation of the audio track usually goes well, but you can run into issues with the merging of the new track back into the movie.

Preparation

The first thing you need is a subtitle file, which will be the basis of the voice you will be listening to. It should be good quality so that it matches when the actors speak. It's important to clean it up before you use it, fixing spelling mistakes and removing HTML tags that would otherwise be read aloud. Listening to three hours of “I L Zero ve y Zero u”, or “less than forward slash I, greater than”, or “L am from Lndia” can get a bit tedious. You should also try to get versions that translate the songs as well.

Getting a SRT file from the media

As many subtitles are taken from DVDs, they can often be poor optical character recognition versions of the bitmap-based streams, that is, a picture of the string “Hello World” rather than the letters.

ffmpeg

By far the easiest and best way to get the subtitles is to extract them from the movie itself, provided they are a separate track.

ffmpeg is a complete, cross-platform solution to record, convert and stream audio and video. LGPL-2.1-or-later, GPL-2.0-or-later
https://ffmpeg.org/

ffmpeg -y -hide_banner -loglevel error -txt_format text -i "${this_movie_file}" "${this_srt_file}"

Getting a SRT file from the web
If that fails you can try to get the subtitle files from the Internet.
https://www.opensubtitles.org

Select the subtitle file in your language with the highest rating.

You can check the media using the mpv media player.

mpv is a media player based on MPlayer and mplayer2. It supports a wide variety of video file formats, audio and video codecs, and subtitle types. GPLv2+, parts under LGPLv2.1+, some optional parts under GPLv3
https://mpv.io/manual/master/

Name the srt file with the same prefix as the movie and mpv will play it. You can also use the --sub-files= option.

mpv "${this_movie_file}" --sub-files="${this_srt_file}"

Scrub through the file to see if the timing is correct. The subtitles can be cycled using the j key.

Fixing timing issues

It's very important to get the subtitles to align, otherwise the voices will be out of sync. When the subtitles don't match up, it's usually because the start offset needs to be corrected.

ffsubsync will automatically try to adjust the offset of the first subtitle to the first use of speech in a movie.

ffsubsync: Language-agnostic automatic synchronization of subtitles with video, so that subtitles are aligned to the correct starting point within the video. MIT license
https://github.com/smacke/ffsubsync

pip install ffsubsync
ffs video.mp4 -i unsynchronized.srt -o synchronized.srt

LosslessCut will allow you to quickly remove additional trailers or ads at the beginning, which gives ffsubsync a better chance of working.

LosslessCut: aims to be the ultimate cross platform FFmpeg GUI for extremely fast and lossless operations on video, audio, subtitle and other related media files. GPL-2.0 license
https://github.com/mifi/lossless-cut

If that fails to match up the subtitles, you can use the mpv keyboard shortcuts: move to the first speech segment and then press Ctrl+Shift+Left and Ctrl+Shift+Right to adjust the subtitle delay so that the next or previous subtitle is displayed.
It will also show a number giving the delay in milliseconds, e.g. -148416 milliseconds or -148.416 seconds. You can use many tools to adjust the subtitles, and I tried out SRT Offset.

srt-offset: A simple command-line tool to offset SRT subtitle files. This tool allows you to adjust the timing of subtitles in SRT files, which can be useful when subtitles are out of sync with the video. MIT license

    srt-offset -i input.srt -offset -148.416 -o output.srt

Manually adding the new subtitle-to-speech audio track

If merging the new track back into the movie presents an issue, you can use avidemux to just add the new audio track.

Avidemux: is a free video editor designed for simple cutting, filtering and encoding tasks. GPL V2

Open Avidemux, and select “File > Open” to select the movie. Then go to “Audio > Select Track”. Select the next unselected track and tick “Enabled”, then “Add Audio Track”. Then pick the new mixed track, in this example .~NastyOldPeople_mixed.mp3

Conclusion

I now find it much easier to watch a movie with the voice over track. It gets to a point where I don't even notice it is there and just hear the actors speak in their own language, and I just know what they are saying.

Links

2009 Nasty Old People A Spanish voice-over translation avidemux by Jean-Marc on How to Add Subtitles Easily in Kdenlive container format Decimal separator extension ffmpeg ffmpeg on wikipedia ffsubsync GPL-3.0 license GPL v2 or later Kdenlive LGPL-2.1 LosslessCut Matroska MIT license Movie on Archive.org mpv mpv keyboard shortcuts mpv wikipedia Nasty Old People from the Internet Archive Night of the Living Dead Noc żywych trupów | Film grozy | Polski lektor OpenSubtitles opensubtitles.org Optical character recognition Piper voice SRT Offset srt, or SubRip subtitle files SubRip Timecode Voice-over translation Whisper
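As a rough illustration of the offset correction described in the show notes above, here is a minimal Python sketch of what shifting an SRT file by a fixed delay involves. It is not how srt-offset or ffsubsync are implemented; the names `shift_timestamp` and `shift_srt` are made up for this example, and clamping negative results to zero is an assumption of the sketch. The sample cue and the -148.416 second delay are taken from the notes.

```python
import re

# Matches an SRT timestamp such as 00:15:06,240 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match: re.Match, offset_ms: int) -> str:
    """Return one timestamp moved by offset_ms, clamped so it never goes below zero."""
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
    total = max(total, 0)  # assumption: cues pushed before the start are pinned to 0
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(text: str, offset_seconds: float) -> str:
    """Shift every timing line (the ones containing '-->') by offset_seconds."""
    offset_ms = round(offset_seconds * 1000)
    shifted = []
    for line in text.split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten; cue numbers and dialogue stay untouched.
            line = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)


sample = "96\n00:15:06,240 --> 00:15:10,640\nIt will be two years before it's this big\n"
print(shift_srt(sample, -148.416))
```

With the delay mpv reported in the notes, the cue that started at 00:15:06,240 moves to 00:12:37,824. A dedicated tool is still the better choice for real files, since this sketch ignores character encodings and malformed cues.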

Pro Wrestling Sauce
What's In The Box? + AJ Lee Can't Hang + Provide Subtitles | 2/18/26

Pro Wrestling Sauce

Play Episode Listen Later Mar 15, 2026 87:17


Pro Wrestling Sauce LIVE | 2/18/26
LIVE! EVERY WEDNESDAY AT 10:15 EST on SlipperHouseExtraExtra.com
LISTEN TO FULL EPISODES ON APPLE | SPOTIFY | GOOGLE | etc.
https://podcasts.apple.com/us/podcast/pro-wrestling-sauce/id1551880252
https://open.spotify.com/show/1lUL6Vk2TYSWTSj0iyv7Tl?si=jFQPsfJsT3W0_oB0gO5Q7w
©️SlipperHouseExtraExtra2026

Transformation Church | Pensacola, FL
Sushi, Sex, and Subtitles | Part 6 | How Do I Tell My Spouse That?

Transformation Church | Pensacola, FL

Play Episode Listen Later Mar 9, 2026 39:24


Welcome to the Transformation Church Podcast! Each week you can be a part of the weekly sermon delivered at TC by one of our Pastors. You can join us and listen to each message and then catch our Message Re-Cap Podcast each Wednesday, where we talk a little message and a lot of nonsense. Thank you for taking the time to connect with us and with God through this message! For more info about Transformation Church check out our website at https://transformationchurch.com This week Pastor Brad and Ashley Livingston bring the message from the Sushi, Sex, and Subtitles series with How Do I Tell My Spouse That?

Russian Radio Show
B2-C1 | Kaliningrad vs. Sakhalin — Two Edges of Russia | Subtitles | Ep. №124

Russian Radio Show

Play Episode Listen Later Mar 7, 2026 19:51


#russianlanguage​ #русскийязык

Transformation Church | Pensacola, FL
Sushi, Sex, and Subtitles | Part 5 | A Working Marriage

Transformation Church | Pensacola, FL

Play Episode Listen Later Mar 2, 2026 36:21


Welcome to the Transformation Church Podcast! Each week you can be a part of the weekly sermon delivered at TC by one of our Pastors. You can join us and listen to each message and then catch our Message Re-Cap Podcast each Wednesday, where we talk a little message and a lot of nonsense. Thank you for taking the time to connect with us and with God through this message! For more info about Transformation Church check out our website at https://transformationchurch.com This week Pastor Ric brings the message from the Sushi, Sex, and Subtitles series with A Working Marriage.

Russian Radio Show
A1 | At the Restaurant | Russian Lesson (Elementary) | Subtitles | Ep. №123

Russian Radio Show

Play Episode Listen Later Feb 26, 2026 9:57


#russianlanguage​ #русскийязык

Transformation Church | Pensacola, FL
Sushi, Sex, and Subtitles | Part 4 | Is Marriage Really Worth It?

Transformation Church | Pensacola, FL

Play Episode Listen Later Feb 23, 2026 37:00


Welcome to the Transformation Church Podcast! Each week you can be a part of the weekly sermon delivered at TC by one of our Pastors. You can join us and listen to each message and then catch our Message Re-Cap Podcast each Wednesday, where we talk a little message and a lot of nonsense. Thank you for taking the time to connect with us and with God through this message! For more info about Transformation Church check out our website at https://transformationchurch.com This week Pastor Brad brings the message from the Sushi, Sex, and Subtitles series with Is Marriage Really Worth It?

GET REAL with Peniel, BM, and Ashley Choi
Seoul's Best Local Food Picks with SEO EUNKWANG | GET REAL S5 EP17

GET REAL with Peniel, BM, and Ashley Choi

Play Episode Listen Later Feb 19, 2026 53:56


Finally! Our meme king, SEO EUNKWANG is here! We're diving into hidden local food spots and beautiful places in Korea that most tourists don't know about. Subtitles are included, so no worries!

Jordan Supercast
Episode 337: Teaching Assistant Born with Cerebral Palsy Is Incredible Inspiration in Classroom and Beyond

Jordan Supercast

Play Episode Listen Later Feb 19, 2026 20:43


Born with cerebral palsy, she has risen above challenges all her life and inspired countless people: friends, family, even total strangers along the way. On this episode of the Supercast, we invite you to listen closely as Oquirrh Hills Middle School Para-Educator Aubrey Allen talks about earning a bachelor's and master's degree in recreational therapy, leading her to a career in the classroom and as a Unified Sports coach. Aubrey doesn't let her disorder, which impacts movement, muscle tone, and speech, get in the way of making her dreams come true, and in the process, supporting others with special needs around her. Listen to Aubrey's powerful message, watch on YouTube or read along with subtitles in the transcript below. Audio Transcription Kathy Taylor: Aubrey is amazing. She is helping us with warm-ups. She's helping us design activities to do during our practice time, but not just for sports, because she helps with academics. Aubrey Allen: Students are the best part of my job. Anthony Godfrey: The students are always the best part of our job. [music] Anthony Godfrey: Hello and welcome to the Supercast. I'm your host, Superintendent Anthony Godfrey. Born with cerebral palsy, she has risen above challenges all her life and inspired countless people, friends, family, and even total strangers along the way. On this episode of the Supercast, we invite you to listen closely as Oquirrh Hills Middle School para-educator, Aubrey Allen, talks about earning a bachelor's and master's degree in recreational therapy, leading her to a career as a classroom aide and as a Unified Sports coach. Aubrey doesn't let her disorder, which impacts movement, muscle tone, and speech, get in the way of making her dreams come true and, in the process, supporting others with special needs around her. You won't want to miss Aubrey's powerful message. Subtitles and a transcript for this podcast are available on our website.
[music] Anthony Godfrey: We are here at Oquirrh Hills Middle School talking with Aubrey Allen. Aubrey, thank you for taking time to talk with us. Aubrey Allen: Thank you so much. I'm excited. Anthony Godfrey: I'm really happy to meet you and talk with you. I think I've met you before, but it's been a little while since we've talked. Aubrey Allen: Yeah. Anthony Godfrey: Tell me about your job here at Oquirrh Hills Middle School. Aubrey Allen: I am a para-educator and one of our special educators. I'm here at Oquirrh Hills, and I love it. Anthony Godfrey: You love being the aide here in the class, the para-professional. Aubrey Allen: Yeah. Yeah. Anthony Godfrey: What do you love most about this job? Aubrey Allen: I love being able to work with and support the kids. The students are definitely the best part of my job. Anthony Godfrey: The students are always the best part of our jobs. Aubrey Allen: Yeah. Anthony Godfrey: That's wonderful. Who's your favorite student? I'm just kidding. I'm not making you say or answer that question. Now you are a highly qualified individual. Tell me about your degrees and the work that you do outside of Oquirrh Hills. Aubrey Allen: Yeah, though I have a bachelor's degree in math, my master's degree is recreational therapy, and now I'm a certified recreational therapist. I also manage a nonprofit called Adaptive Arena, and we offer free adaptive sports and activities for people of all abilities. I love working there, too. Anthony Godfrey: Now I understand that you also are an advocate for those with disabilities on social media. Aubrey Allen: Yeah, yeah. I started a social media platform for the video about my day-to-day life just to try to make others aware of what life can be like having a disability. Anthony Godfrey: For those listening, tell them about your disability. Aubrey Allen: I have cerebral palsy, and I have a moderate case of it. It affects the way I talk. The end is just how my body moves and how my muscles work. 
Anthony Godfrey: I've heard you're a big inspiration to those around you here at the school. What do you think about that? Aubrey Allen: I just try to be positive and uplift everyone. I think having my disability has given me a lot of pushback. There are a lot of things in life that are challenging for everyone, and you can either let them know who you are or try your best to write about your challenges and just be happy and positive. Anthony Godfrey: You're a great example of rising above your challenges, and I'm sure that your students really look to you for that positivity. I've only met you a couple of times, but you've been extremely positive. You light up, you're so friendly both times, and you really stand out that way. So I appreciate the positivity and the positivity you're bringing into the lives of the students and the people you work with. Aubrey Allen: That means so much to me. I really do try my best to overcome my challenges. Anthony Godfrey: So tell me more about what happens at the Adaptive Arena. Aubrey Allen: Yeah, it's more like a rec center for people with disabilities. We offer different activities. We have a cheer program and a wheelchair basketball program, and the cool thing about us is we let everyone play. So if somebody in a wheelchair has a brother who is not in a wheelchair, we put the brother in a wheelchair, too, and now they can play wheelchair basketball, or whatever together, and families really like that because typically kids with disabilities can't play on the same team as their siblings otherwise. So I just love that we can do well with different types of families and kids with all different abilities. Anthony Godfrey: So it really is inclusive. Anyone can participate. Aubrey Allen: Yeah. Yeah. Anthony Godfrey: And if you are playing with others who are in a wheelchair and you don't need a wheelchair, you're going to get a wheelchair. Aubrey Allen: Yes, yes. Anthony Godfrey: That seems fair. Now what is your favorite sport? 
I won't make you tell me who your favorite student is, but what's your favorite sport? Aubrey Allen: I think my favorite is wheelchair basketball because everyone gets so into it. Anthony Godfrey: Yeah, it's fast-paced. Aubrey Allen: Yeah, we do wheelchair basketball every Saturday morning, and we just have a blast. Anthony Godfrey: So it's your favorite and everyone else's also. [music] Anthony Godfrey: Stay with us when we come back. More with Aubrey Allen and her colleagues. [music] Male Voice: Never miss an episode of the Supercast by liking and subscribing on your favorite podcasting platform. Find transcripts for this episode and others at supercast.jordandistrict.org. [music] Female Voice: They are out on the job in the rain, sleet, snow, ice, and in the sunshine, as Jordan School District students navigate their way to and from school every day. We are truly grateful for our city crossing guards, always vigilant and looking out for students to ensure everyone's safety. Because they work so hard protecting our kids, let's give those crossing guards a hand. If you're driving near or around schools, slow down, pay attention, watch for students and staff, and follow instructions from the school crossing guards and know our cities are always looking to hire crossing guards. If you like kids and need some flexible hours, contact your local city and apply to be a crossing guard today. Together, let's make this a safe and successful school year. Anthony Godfrey: What advice do you have for folks about how to make sure that people with disabilities in their lives feel included and noticed and a part of things? Aubrey Allen: Yes, that's a great question. I remember when I was growing up and it was so easy to feel invisible because I had a disability. 
The people that stood out to me the most and that had the most impact on my life are the people who treated me like they were just anybody else and not only that, but they were comfortable talking to me and asking what I needed and if they should be aware of anything. So I think that is very important. Individual people do know that people with disabilities, they often times know they just want to be treated and included like everyone else. Anthony Godfrey: I love that. Just treat them like you treat everyone else. And I love that you said the people that have had an impact on your life are the ones who said, "How can I help you?" And just treated you like everyone else, asked you questions, and talked with you. Aubrey Allen: Yeah. Anthony Godfrey: Let's talk with some of the folks that you work with. Introduce yourself. Kathy Taylor: Hi, I'm Kathy Taylor and I am the teacher of the classroom that Aubrey is working in right now and we love Aubrey. She brings so much to our class. Anthony Godfrey: Tell me what it's like getting to work with Aubrey day in and day out. Kathy Taylor: Well, she's always positive. Anthony Godfrey: That's obvious. That's obvious. Kathy Taylor: Even when sometimes it doesn't feel like it's a positive day. Some days are up and some days are down, but Aubrey is always up. Aubrey's expertise with the recreational therapy has been great. We are a Unified Sports school meaning . . . Anthony Godfrey: You're a premier Unified Sports school. You're an award-winning Unified Sports school. Talk about that. Kathy Taylor: So Unified Sports is a program with Special Olympics where we are participating in team sports with our students that have disabilities and with their peers. Peers and our students with disabilities all play on the same team. And for us, that's a huge deal because a lot of times our students are not on teams. Or parents, they go and watch a lot of sibling games, or they watch a lot of their friends play sports. 
But a lot of our kids don't play sports on their own. So this gives them the opportunity to be on a team, to get that camaraderie, feel like what competition is like, feel experience at the tournament. And Aubrey has brought so much to that. She is our coach basically. I don't know if you've gone through all the trainings yet. I'm still working on those, too. But our official coaches, hands down Aubrey is amazing in that capacity with the sports. She is helping us with warm-ups. She's helping us design activities to do during our practice times that will help us work on specific skills. And she's able to adapt things for the kids that aren't able to do what everybody else does, and they can still interact with their peers. So it's been amazing having Aubrey, but not just for sports because she helps with academics. She goes to classes with kids. She helps us with their personal care. She does it all. She does it all, and she does it with grace, and she does it with humor, and she is a pleasure to work with. Anthony Godfrey: I would think it's pretty hard to be negative around Aubrey or be down on yourself. She doesn't let that happen. The incandescence keeps everyone from getting off the path, I guess. Kathy Taylor: She's very positive and the kids have responded really well to her. They really enjoy her. Anthony Godfrey: Let's talk with the principal. Introduce yourself and tell us about Aubrey. You contacted me and let me know that we really ought to come talk with her. Lisa Jackson: I did. I did. I'm Lisa Jackson, Oquirrh Hills principal. When we interviewed Aubrey, after Aubrey left the room, we all looked at each other and said, "How can we make this work because she needs to be part of our team here at Oquirrh Hills.” When we called for her references at the Adaptive Arena, they said the same thing, “She's amazing. Everything she does. She's just highly qualified. She's kind. 
She's motivated.” I interact with Aubrey in the halls a lot because she's walking these kids to and from class, and she understands what they need. She's receptive to their needs. And I think the coolest thing about it is just having our kiddos who do have disabilities, who have some struggles, being able to see just what they can accomplish in life and how successful they can be. And I think seeing Aubrey every day and seeing how successful she is just gives them motivation to follow in her footsteps and do some of the things she's done. Anthony Godfrey: You're not very convincing when you say you can't do it and Aubrey's in the room and she's bringing the energy and demonstrating that “yes, you can. You absolutely can.” Lisa Jackson: You can do it with a smile. Anthony Godfrey: Right. Lisa Jackson: Right. It might be hard, but you can do it. I think she also inspires all of the kids, though. Like, I feel like they didn't necessarily know, even our peer tutors didn't know what to expect when you meet Aubrey and you're not sure, you've never met Aubrey before. You're not sure. A lot of people don't have experience with somebody that has cerebral palsy. And so it's been great. I think it's an inspiration to them, too, because you're knowledgeable. You're educated. You're well spoken. And when you say they need to do something, you mean business. So, like, it's, you know, there's a level of respect that Aubrey has earned among all the kids and the adults, really. Anthony Godfrey: I'm inspired hearing about what you do and meeting you again. What do you like to do when you're not at the Adaptive Arena or here at school? Aubrey Allen: Oh, man. I spent a lot of time with my family. I have two younger brothers who I helped raise because there's a big age gap between them. There are things that my family and I work out every day. So I'm at the gym a lot. I feel like and then I like to hike and bike and just be outside. Anthony Godfrey: You're very active sounds like. 
Aubrey Allen: Yeah. Anthony Godfrey: I sit in a lot of meetings. I do email. But, you know, I try to get as much pleasure from that as I can. [laughing] Anthony Godfrey: Well, it's a real pleasure meeting you. Thank you for everything you're doing, for inspiring me, the people that you work with and the students that you serve. You're awesome. Thank you. Aubrey Allen: Thank you for your time and opportunity. Anthony Godfrey: Thank you. And thank you both for talking with us. [music] Anthony Godfrey: Thanks for joining us on another episode of the Supercast. Remember, “Education is the most important thing you will do today!” We'll see you out there. [music]

Transformation Church | Pensacola, FL
Sushi, Sex, and Subtitles | Part 4 | Engagement - Preparation for What You're Building

Transformation Church | Pensacola, FL

Play Episode Listen Later Feb 16, 2026 31:32


Welcome to the Transformation Church Podcast! Each week you can be a part of the weekly sermon delivered at TC by one of our Pastors. You can join us and listen to each message and then catch our Message Re-Cap Podcast each Wednesday, where we talk a little message and a lot of nonsense. Thank you for taking the time to connect with us and with God through this message! For more info about Transformation Church check out our website at https://transformationchurch.com This week Pastor Justin brings the message from the Sushi, Sex, and Subtitles series with Engagement - Preparation for What You're Building.

Japanese Podcast | 英会話 - Lazy Fluency
What Do Japanese People Want in A Relationship? - LF #206 (Japanese Listening + Subtitles N2-N3)

Japanese Podcast | 英会話 - Lazy Fluency

Play Episode Listen Later Feb 16, 2026 21:45


This week we talk about the shift in relationship ideals between men and women in Japan and what that says about Japanese society as a whole. Survey: https://news.mynavi.jp/article/20250305-3137498/DETAIL/ Send us questions at:  lazyfluency@gmail.com Join the Community: Discord: https://discord.gg/VGSd94Tp4P Book Club! https://discord.com/channels/1204531163377442866/1440725472878006355 Support on ko-fi:  https://ko-fi.com/lazyfluency  

Breakroom Talk
The Fall Off

Breakroom Talk

Play Episode Listen Later Feb 13, 2026 106:46


Episode 262. We gotta pay respects to several people who passed, and then we touch on topics including the Super Bowl halftime show, GloRilla's viral family situation, childhood rap names, J. Cole's recent release, and Valentine's Day. We get into the Super Bowl performance by Bad Bunny, culture and representation, language barriers, and the backlash from people like Trump, while also talking about the U.S. as a melting pot and the value of learning Spanish. The conversation shifts to GloRilla and her sister going viral over claims that GloRilla isn't supporting family financially; the hosts unpack conflicting accounts, the realities of rapper income, taxes and label advances, the ethics of family expectations, and how posting family issues online can permanently damage relationships. Y'all won't believe our old rap names, and we gotta review J. Cole's new project, with one host breaking down the concept while others criticize the music as repetitive, overly self-produced, and overly tied to ‘the Ville,' alongside broader thoughts on artistry, growth, and the ‘best rapper' persona.

00:00 Cold Open: Hustle Bars & Intro Vibes
00:30 Episode Kickoff: What Movie Clip Was That?
01:08 RIP Shoutouts & Weekend Rundown Setup
02:56 Super Bowl Watch Party Recap
03:39 Bad Bunny Halftime Debate: Representation vs. Personal Taste
10:29 America as a Melting Pot: Language, Subtitles & Culture
19:53 Should Americans Learn Spanish? Language & Power Talk
22:51 Black Representation in Media: Baddies, Algorithms & Parenting
32:12 Next Topic Tease: GloRilla's Sister Goes Viral
32:21 GloRilla Family Drama Breakdown: Money, Loyalty & Receipts
37:17 Family Group Chat Receipts: The Sister Airs Out GloRilla Drama
38:56 What a Millionaire ‘Should' Do for Parents & Siblings (and Why It's Complicated)
40:51 Industry Reality Check: Taxes, Image, and Everyone Expecting a Handout
43:54 Teach ‘Em to Fish: Jobs, School, and LeBron's ‘Everybody Works' Model
45:07 Taking It to the Internet = Burned Bridge (and the Flexing Problem)
47:52 If I Had $5 Million… Who Gets What? Setting Boundaries with Family
53:46 No Retiring Nobody: Generosity vs. Becoming the Family's ATM
01:02:36 Random Detour: Old Rap Names, Freestyling, and ‘Walmart Days' Memories
01:06:45 Switch to Music Talk: J. Cole's New Project: Disses, Storytelling, and Critiques
01:12:08 Cole Lost Me: Bragging, Tapping Out, and the Deleted Diss
01:13:20 ‘Away Games' & The Sound Shift: Singing, Experiments, and Falling Off the Cole Train
01:14:22 Dreamville/Ville Fatigue: When Humble Becomes Performative
01:16:43 Forest Hills Peak & Mixtape Era Nostalgia (Friday Night Lights, ‘Workout' Debate)
01:18:54 Crossover Talk: Kendrick's Hits, White Audiences, and What ‘Crossing Over' Means
01:20:50 Let Nas Down & The Artist Dilemma: Core Fans vs Growth and Radio Records
01:22:46 Stuck in the Box: Production Help, Collaboration, and Why Cole Feels Stagnant
01:26:06 Switching Gears: Valentine's Day, Being Single, and Social Media Pressure
01:34:36 Is Valentine's Day for Women or Couples? Effort, Reciprocity, and ‘Sweetest Day'
01:44:59 Closing Thoughts: Love Beyond Couples + Wrap-Up & Subscribe

Nerdoparlante
No Subtitles Needed | Nerdo Review | Super Bowl LX Halftime Show Ft Bad Bunny

Nerdoparlante

Play Episode Listen Later Feb 11, 2026 26:33


We talk about the Super Bowl 60 halftime show, where our own Benito Antonio Martínez Ocasio, better known as Bad Bunny, put on a spectacle that paid homage to Puerto Rican culture and to all of Latin America, carrying a message of unity in difficult times.

The Joe Show
Subtitles & No Sound

The Joe Show

Play Episode Listen Later Feb 10, 2026 7:11


Joe's brother has a girlfriend that he just started recently seeing... and she does this while watching Netflix.

Transformation Church | Pensacola, FL
Sushi, Sex, and Subtitles | Part 2 | Don't Do That, You're Just Dating!

Transformation Church | Pensacola, FL

Play Episode Listen Later Feb 9, 2026 35:24


Welcome to the Transformation Church Podcast! Each week you can be a part of the weekly sermon delivered at TC by one of our Pastors. You can join us and listen to each message and then catch our Message Re-Cap Podcast each Wednesday, where we talk a little message and a lot of nonsense. Thank you for taking the time to connect with us and with God through this message! For more info about Transformation Church check out our website at https://transformationchurch.com This week Pastor Brad brings the message from the Sushi, Sex, and Subtitles series with Don't Do That, You're Just Dating!

Transformation Church | Pensacola, FL
Sushi, Sex, and Subtitles | Part 1 | Set Apart, Not Set Aside

Transformation Church | Pensacola, FL

Play Episode Listen Later Feb 2, 2026 35:13


Welcome to the Transformation Church Podcast! Each week you can be a part of the weekly sermon delivered at TC by one of our Pastors. You can join us and listen to each message and then catch our Message Re-Cap Podcast each Wednesday, where we talk a little message and a lot of nonsense. Thank you for taking the time to connect with us and with God through this message! For more info about Transformation Church check out our website at https://transformationchurch.com This week Pastor Brad brings the message from the Sushi, Sex, and Subtitles series with Set Apart, Not Set Aside.

Mojo In The Morning
Mojo Can't Turn Off the Subtitles

Mojo In The Morning

Play Episode Listen Later Jan 19, 2026 10:25 Transcription Available


See omnystudio.com/listener for privacy information.

Hacking Chinese Podcast
285 - Chinese subtitles and transcripts: Reading before, while or after listening

Hacking Chinese Podcast

Play Episode Listen Later Jan 19, 2026 20:52


Subtitles and transcripts can help you understand spoken Chinese, but do they also help you become a better listener? Should you read along, read first, or save the text for after you've listened?

#learnchinese #listening #reading #subtitles #transcripts

Link to article on Hacking Chinese: Chinese subtitles and transcripts: Reading before, while or after listening: https://www.hackingchinese.com/chinese-subtitles-and-transcripts-reading-before-while-or-after-listening/
The Fluent Listener: Navigating Spoken Mandarin Like a Fish in Water: https://www.hackingchinese.com/courses/the-fluent-listener-navigating-mandarin-like-a-fish-in-water
Listen more than once: How the replay button can help you learn more Chinese: https://www.hackingchinese.com/listen-more-than-once-how-the-replay-button-can-help-you-learn-more-chinese
Listen before you read: Improve your listening ability: https://www.hackingchinese.com/listen-before-you-read-improve-your-listening-ability
The best YouTube channels for learning Chinese: https://www.hackingchinese.com/the-best-youtube-channels-for-learning-chinese
The best podcasts for learning Chinese: https://www.hackingchinese.com/the-best-podcasts-for-learning-chinese

More information and inspiration about learning and teaching Chinese can be found at https://www.hackingchinese.com

Music: "Traxis 1 ~ F. Benjamin" by Traxis, 2020 - Licensed under Creative Commons Attribution (3.0)

Or Whatever Movies
Interview Style 002: Subtitles And Music For Mom

Or Whatever Movies

Play Episode Listen Later Jan 13, 2026 9:53


Whether you're a subtitle skeptic or a closed-captioning devotee, this (almost) spoiler-free discussion will have you reconsidering how you watch movies. Learn more about your ad choices. Visit megaphone.fm/adchoices

The John Batchelor Show
S8 Ep258: STELLAR ORIGINS AND COMPETING COSMOLOGIES Colleague Professor Paul Halpern. The focus shifts to Fred Hoyle, whose musical mother taught him to read via silent film subtitles. Halpern details Hoyle's journey to Cambridge, where his ambition to w

The John Batchelor Show

Play Episode Listen Later Dec 29, 2025 7:04


STELLAR ORIGINS AND COMPETING COSMOLOGIES Colleague Professor Paul Halpern. The focus shifts to Fred Hoyle, whose musical mother taught him to read via silent film subtitles. Halpern details Hoyle's journey to Cambridge, where his ambition to work in nuclear physics was interrupted by WWII radar research. Hoyle became fascinated by astronomy, eventually authoring a key 1946 paper on stellar nucleosynthesis, proposing that elements are forged inside stars. This set the stage for the "Great Big Bang Debate." While Gamow argued for element creation in a hot, primeval explosion, Hoyle developed the Steady State theory, filling in the gaps of an expanding universe. NUMBER 2 1961

LINUX Unplugged
642: Tunneling Home for the Holidays

LINUX Unplugged

Play Episode Listen Later Nov 24, 2025 56:17 Transcription Available


Chris cooked up a wild remote-access trick for Jellyfin that skips VPNs entirely. One tiny toggle spins up a secure tunnel on demand. Simple, absurd, and shockingly effective.

Sponsored By:
Managed Nebula: Meet Managed Nebula from Defined Networking. A decentralized VPN built on the open-source Nebula platform that we love.
1Password Extended Access Management: 1Password Extended Access Management is a device trust solution for companies with Okta, and they ensure that if a device isn't trusted and secure, it can't log into your cloud apps.
CrowdHealth: Discover a Better Way to Pay for Healthcare with Crowdfunded Memberships. Join CrowdHealth to get started today for $99 for your first three months using UNPLUGGED.
Unraid: A powerful, easy operating system for servers and storage. Maximize your hardware with unmatched flexibility.

Support LINUX Unplugged
Links:

Mojo In The Morning
Mojo Can't Turn Off The Subtitles

Mojo In The Morning

Play Episode Listen Later Nov 17, 2025 10:25 Transcription Available


See omnystudio.com/listener for privacy information.

The Dana & Parks Podcast
HOUR 4: Do you use subtitles?

The Dana & Parks Podcast

Play Episode Listen Later Oct 29, 2025 33:25


HOUR 4: Do you use subtitles? You wanted it... Now here it is! Listen to each hour of the Dana & Parks Show whenever and wherever you want! © 2025 Audacy, Inc.

The Sports Junkies
H3: Fixing Washington's Defense, Callers Weigh In, Watching With Subtitles

The Sports Junkies

Play Episode Listen Later Sep 30, 2025 41:45


09/30 Hour 3: How Would You Fix The Commanders Defense - 1:00 Calls On Fixing Washington's Defense - 16:00 Are You In Or Out On Subtitles - 32:00