Quotes from the episode:
"The people closest to the work often have the clearest view of what needs to change."
"Codes do not become practical until field experience finds its way into the conversation."
"Better data leads to better codes, and HERS® Raters are sitting on some of the most useful data in the industry."
"The future of energy codes will be shaped not only in committee rooms, but also in crawlspaces, attics, and blower door tests."

In this episode of RESTalk, host Bill Spohn welcomes Nathan Kahre of the National Association of Home Builders (NAHB), Alex Smith of the International Code Council (ICC), and Kris Stenger of ICC for a practical discussion of the 2027 IECC, the emerging 2030 framework, and how energy codes move from model language to real-world adoption.

Together, they explain how the IECC is developed, how consensus committees and public comments shape the process, and why input from HERS® Raters, builders, subcontractors, code officials, and consumers matters. The conversation highlights several important 2027 IECC updates, including clearer blower-door testing provisions, expanded recognition of ANSI/RESNET®/ICC 380, tighter ERI targets, performance-path changes, improved fenestration values, expanded energy credits, and a new energy audit requirement for large additions. The guests also discuss how state and local adoption varies widely, from formal code councils and rulemaking processes to state-specific pathways like Texas HB 1736.

Most importantly, Nathan, Alex, and Kris encourage HERS® Raters to get involved. Whether through local home builder associations, RESNET® program engagement, ICC interested-party lists, or future participation in the 2030 IECC subgroup, raters can bring field data, practical experience, and clarity to a process that increasingly depends on verified performance.
Kris's contact info: kstenger@iccsafe.org
Alex's contact info: alsmith@iccsafe.org, https://www.linkedin.com/in/alexsmithenergy/
Nathan's contact info: nkahre@nahb.org, https://www.linkedin.com/in/nathan-kahre-94a779108/
ICC Committees: https://www.iccsafe.org/committees/energy-iecc/
https://www.iccsafe.org/membership/councils-committees/icc-committee-application/
The code adoption kit we discussed: https://www.nahb.org/advocacy/top-priorities/building-codes/code-adoption-kits/2024-International-Energy-Conservation-Code

To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
We're announcing AIEWF speakers this week! Take the AI Engineering Survey!

Today's guest, Ethan, first joined us for the LS Paper Club as the lead on the NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months.

He comes back on Latent Space with some nuclear hot takes: that video models primarily get their intelligence from LLMs, not from training on video data, and that the next frontier for truly interactive, real-time, long-horizon world models is to work on LLMs (perhaps Interaction Models as well…).

Put it this way: in the near term, the next Sora won't be a better video model, but a video agent.

Generative media may more closely follow the evolution of AI coding, which went from focusing on one-shot output performance and cost to multi-turn reasoning and planning models for agents and systems that can plan, edit, test, debug, and submit PRs. At a certain point, coding models got so good that the only significant next step to improve performance was orchestrating these models.

Now, as video models improve significantly in realism, consistency, and prompt adherence while becoming more cost-efficient, the next evolution of video generation may also be systems that can plan, generate, edit, critique, and iterate across an entire creative task.

In this episode, Ethan joins swyx and Vibhu to unpack what it actually takes to build frontier image and video systems: data, VAEs, diffusion transformers, audio-video alignment, inference speedups, and the hidden cost of storing and moving massive video datasets.
From building NVIDIA's Cosmos world model to joining xAI as Grok Imagine was being built from zero to one, Ethan He has been at the center of some of the most important work in video generation, multimodal models, and real-time world models.

We go deep on Grok Imagine: how a small xAI team shipped its first multimodal video model in three months, why iteration speed matters more than almost anything in model development, and why many of the biggest gains come from fixing tiny bugs in data and training pipelines.

Flipbook: The future of Videomaxxing

Video agents are almost a sure bet to be the trend in the coming year. We end with a glance at what's beyond video agents.

Flipbook caused a minor sensation this year when it was released, but most treat it as a fun demo. Ethan takes it very seriously: with the speed and cost of inference coming down every year, the future of custom video JIT UI is closer than you think. We talked about why videogen models may become the front end of AI, how generative UI could replace traditional HTML/CSS, why world models need to be real-time, interactive, and long-horizon, and why the future of video generation may depend more on language models and agents than on diffusion alone.

We discuss:
* Why fast iteration mattered more than meetings
* Why small training bugs can drive huge model quality gains
* Why coding models may make compute the bottleneck again
* How image and video models are trained with synthetic captions
* The role of VAEs and latent space in frontier video models
* Why image models are the foundation for video models
* The tradeoff between temporal compression and real-time interactivity
* Flipbook, Neural OS, and the future of generative UI
* Why future interfaces may go from user intent to pixels
* The hidden cost of training video models: storage, egress, and GPU hours
* How step distillation and consistency models (like OpenAI sCM) make video inference orders of magnitude faster
* Grok Imagine 0.9 and large-scale audio-video generation
* Why audio-video alignment is harder than text-video alignment
* Ethan's definition of world models
* Reference-to-video, video extension, and long-context video generation
* Why xAI's research communication undersells Grok Imagine
* How xAI culture shaped the speed of development
* AI watermarking, SynthID, and detecting generated media
* Why prompt rewriting matters for video models
* Grok Imagine Agent and the rise of video agents
* Why language models may unlock better video generation
* Robotics, physical AI, and embodied world models
* Why Ethan left xAI and shifted focus toward LLMs
* Self-managed context, memory, and the next frontier for language models

Ethan He
* LinkedIn: https://www.linkedin.com/in/ethanhe42
* X: https://x.com/EthanHe_42

Timestamps
00:00:00 Introduction
00:01:25 From NVIDIA Cosmos to xAI
00:03:24 Building Grok Imagine from Zero to One
00:10:07 How Image and Video Models Are Trained
00:18:53 Video Compression, VAEs, and Real-Time Tradeoffs
00:22:10 Generative UI, Flipbook, and Neural OS
00:32:10 The Cost of Training Large Video Models
00:37:04 Distillation, GANs, and Fast Video Inference
00:41:21 Audio-Video Generation and Grok Imagine 0.9
00:48:34 What Makes a World Model?
00:55:51 Reference Videos, Long Context, and Video Memory
01:00:11 xAI Culture, Research, and First-Principles Building
01:09:45 AI Safety, Watermarking, and Prompt Rewriting
01:13:10 Video Agents and AI-Assisted Creation
01:27:32 Why Language Models Unlock Better Video
01:31:15 Robotics, Physical AI, and Embodied World Models
01:32:38 Why Ethan Left xAI
01:34:16 Self-Managed Context and the Future of LLMs
01:38:43 Ethan's Career Path and Closing Thoughts

Transcript
Introduction: Ethan He, Latent Space, and the Path to xAI
Swyx [00:00:00]: We're here in the studio with Ethan He, most recently of xAI. Welcome.
Ethan [00:00:10]: Thank you. Glad to be here.
Swyx [00:00:11]: We're also here with Vibhu.
You first came to us, or joined the Latent Space world, because you were working on Cosmos at NVIDIA, and you did a paper. We loved it. You presented it as well, so thank you for doing that.
Ethan [00:00:23]: I actually also presented the MoEs twice at Latent Space.
Swyx [00:00:29]: How did you actually hear about us? Did we reach out to you? Is that how it worked?
Ethan [00:00:33]: No, actually, the community. I realized, oh, there is this online community where people talk about AI and learn from each other through papers every week through the Paper Club. It's very nice.
Ethan [00:00:49]: I learned a lot.
Swyx [00:00:49]: I think it's been three years nonstop. We haven't stopped even on Christmas and New Year's. Many weeks I want to stop, but it keeps going.
Vibhu [00:00:58]: No, that was good. I think you had posted that you worked on a paper, and I was like, “Oh, very cool. We have Paper Club. Present, then.”
Vibhu [00:01:04]: But I might have reached out to you after.
Swyx [00:01:05]: Because it's an amateur club, right?
Swyx [00:01:08]: So it's very unusual, but sometimes we have paper authors come by and actually explain the paper. Today we just did the Poolside paper, which was apparently very good.
Vibhu [00:01:18]: Came out yesterday.
Vibhu [00:01:19]: Pretty interesting, right? Fully open. They talk about everything, including systems. So it's a good one. We'd recommend people read it.
Swyx [00:01:25]: Bring us up to speed on your transition to xAI, ’cause I actually don't even know when you joined. Just tell the story of the transition.
From NVIDIA Cosmos to xAI: Scaling Video and World Models
Ethan [00:01:34]: Before xAI, I was working on the Cosmos world model at NVIDIA. Cosmos is a giant video foundation model that aims to simulate the world, and it serves as a foundation for roboticists to build on top of.
There, once I built Cosmos 1, I realized that since this thing also has a scaling law similar to language models, we needed to scale up video models further. That's why I realized I needed to move somewhere with much more compute. That's how I…
Swyx [00:02:13]: Than NVIDIA?
Vibhu [00:02:14]: The GPU rich came themselves.
Vibhu [00:02:19]: And timeline-wise, when was Cosmos? It was pretty early, right? It was an open world model, open paper, everything.
Ethan [00:02:25]: It was the end of 2024.
Vibhu [00:02:28]: End of 2024.
Ethan [00:02:30]: Then in mid-2025, I moved to xAI. I joined about the time xAI was about to build video models and multimodal models. There was no infra, no data, and no model, and as a few engineers, we built it in three months and released the first model, Grok Imagine 0.9.
Ethan [00:02:55]: Since then, I kept working on video models and moved from training more toward post-training of the video models: for example, reference-to-video, kind of like the cameo feature, and video extension. Before I left, I worked on a world model, leading a small team focused on real-time, long-horizon video generation.
Building Grok Imagine From Scratch in Three Months
Swyx [00:03:24]: Can you give a rough roadmap? Okay, you're on a brand-new team. Grok previously was only text, or they partnered with BFL for their image gen stuff. What are the building blocks, right? You have compute, and data you can procure somewhere. What is the sequence of things people should think about when setting up a new team?
Vibhu [00:03:43]: Actually, even deeper, not just data you can procure. You guys had to go through getting the data too, right?
So you shipped it pretty fast, but yeah,
Swyx [00:03:51]: Three months is like
Vibhu [00:03:52]: From everything.
Swyx [00:03:52]: actually very surprisingly fast.
Ethan [00:03:55]: One thing I'd say is thanks to my experience at NVIDIA, ’cause the first time, when we were building Cosmos together, we built it for about a year. So this was the second time I'd done it, and I roughly had an idea of what to do. I'd say the most important thing is the talent. Everyone was very strong and clever, and very close with each other, working toward a common goal. That sped things up a lot. You reduce the communication bandwidth needed among people, and everyone can work toward the same goal. Every day there weren't many meetings on the calendar, maybe one sync a day, and after that it was all building. It was pretty fun at that time.
Ethan [00:04:47]: Another thing is that xAI has very strong foundations in data infrastructure and model inference, and that support helps model development a lot. When I look at training models, I think the most important thing is how many iterations you can do per day. The more iterations you can do, the faster you can improve the model. So if you have very strong infra and a lot of compute, you can train these models in a very short period of time. That gives you a much larger buffer for errors, and it also gives you the opportunity to spot more bugs.
Iteration Speed, Compute, and Debugging Model Pipelines
Swyx [00:05:46]: What is an iteration? Is it like a few hundred steps, or what are you
Ethan [00:05:50]: Let's say the whole cycle of training the model: acquiring new data, maybe designing new algorithms, and training a new model, maybe at a smaller scale, or
Swyx [00:06:01]: So cycle time for any hyperparameter you're searching.
Ethan [00:06:04]: Cycle time, and then evaluating this model.
Is this model better than my previous iteration?
Ethan [00:06:11]: So
Swyx [00:06:11]: So before you, someone had already set this up so that you can iterate very quickly.
Ethan [00:06:15]: I think the foundation there is extremely good for developing and researching models.
Ethan [00:06:23]: And what I often find, and this is kind of boring, is that a lot of the improvements do not come from new algorithms. They come from finding small bugs here and there in the data pipeline and in the model training pipeline. Those give the biggest boost to model quality.
Vibhu [00:06:46]: It's interesting, right? You say it's a small team with less communication bandwidth, but also a lot of the quality comes from finding little bugs. It seems counterintuitive, right? With a lot of people, you can iron out more of those, but it's interesting to see the other side.
Swyx [00:07:00]: I also wonder, do you try using LLMs to look for bugs? I don't know.
Ethan [00:07:05]: At that time it was mid-2025, so the coding models weren't quite there yet. I remember that by December 2025 they were extremely good, and I've been using them since then. It's helpful. Sometimes it produces code that is difficult to maintain: the first time, it built something extremely fast, but it gave me spaghetti code, thousands of lines that I couldn't maintain, and the LLM itself couldn't figure out what was wrong or how to improve on it. But now I find it much better. I want to bring up another point here: now that coding models are much more efficient and can help us implement stuff much faster, compute might become a bottleneck again. Previously, if you wanted to train a new model, say you wanted to generate new synthetic data or write a new algorithm, it might take a few weeks.
And during that period of time, you might not have experiments to run. But now you can build that thing within a few hours, and then you can immediately train a model.
Ethan [00:08:24]: Now you have to have enough compute to try all of the ideas. So compute might be the bottleneck on iteration speed again.
Swyx [00:08:36]: Yeah, honestly, I think it's kind of a stressful job, because you're like, “Well, I should be trying everything, and if I'm not, then I'm not doing my job well.”
Vibhu [00:08:48]: There's also the stress that you're burning through thousands of GPU-hours, which is very expensive, and that compute could go to other researchers.
Swyx [00:08:56]: You got Daddy Elon to
Vibhu [00:08:57]: You got Daddy Elon.
Ethan [00:08:59]: It was
Vibhu [00:09:00]: But there's still a finite amount of compute. You want to use it, you want to use it well, you want more of it.
Ethan [00:09:06]: That was quite stressful indeed. I think one thing is that with coding models now, a lot of these jobs can be automated, which is much better. Second, it's a marathon, so you've got to maintain good health and a regular schedule.
Vibhu [00:09:28]: It's hard to hear that when you go from zero to nothing in two months.
Swyx [00:09:32]: And I think obviously the culture at xAI is very famous: people work very hard. One thing I did want to dive into: in the notes you sent ahead of time, you had specific comments about the cost of video gen training. Presumably this is on Colossus 1, right? The two-hundred-megawatt cluster. Share whatever you want on that.
Vibhu [00:09:54]: I think there are three things we're talking about, right? There's video gen, and there's also the image gen model that you put out. Do you want to complete the zero to one first? You have a few months.
What are the stages of creating an image gen model?
Swyx [00:10:06]: Oh, yeah, maybe I got distracted.
How Image and Video Models Are Trained: Synthetic Captions, Tokenizers, and VAEs
Vibhu [00:10:07]: Sorry. And then from there, there's video gen and there's audio gen. Would love to get into those next. But what are those first few months like? Small team, a lot of bugs, iterations, but what does it look like? Do you take something off the shelf? Do you just get data and compute? How do you get to a state-of-the-art image gen model? How do you just start?
Ethan [00:10:28]: I cannot comment specifically on how xAI did it, but it's a fairly standard process. I can draw some examples from Cosmos. To build a video model, you actually need to build an image model first. And to build these two models, the data you need is one hundred percent synthetic pairs of language and images, or language and videos, because on the internet, videos don't naturally come associated with text. You might say, oh, on YouTube you have the title, the description, and the comments
Swyx [00:11:11]: Title
Ethan [00:11:11]: of a video, but usually they're not relevant to the video itself. Say the video is a natural scene of mountains or something, and the title is “I'm so happy today.”
Ethan [00:11:26]: So they have no correlation at all. The first step is that you have to generate synthetic pairs of language and videos. You gather videos from the internet, and you use a VLM to caption them. Here's a question: how do you get the VLM to begin with? If there's no
Swyx [00:11:55]: So you use the model, right? Like
Ethan [00:11:57]: Say no VLM exists yet. How do you generate the text in the beginning, right?
It's impossible.
Swyx [00:12:04]: I see.
Ethan [00:12:05]: In the beginning, you ask humans to describe the video in as much detail as possible. For example, you ask them to describe everything: all objects, all characters, and all interactions and dialogue in the videos. That was in the protocol for Cosmos labeling. The objective we gave the labelers was to describe the video in so much detail that a blind person hearing the text could reconstruct what the video looks like in their head.
Swyx [00:12:43]: Video or image? You're talking about images.
Ethan [00:12:44]: Video or image, either one.
Vibhu [00:12:47]: This was pretty common with CLIP and DALL-E, right?
Vibhu [00:12:51]: It's all training on really detailed captions of images. So the same is applied to video, but instead
Ethan [00:12:57]: Same applied
Vibhu [00:12:57]: of using a multimodal model to take in video and images and write rich descriptions, you can also
Swyx [00:13:04]: I think there's this traditional perspective of supervised, very highly human-curated data. I feel like there's an unlock with unsupervised, right? Once you have enough to bootstrap, you can just throw a common corpus at it, or whatever: unsupervised vision-and-language pairing, where you just have interspersed images and text and it just learns. To me, that is the VLM breakthrough, which is different from CLIP and different from the LM era.
Ethan [00:13:36]: It's interesting to see that you kind of need both kinds of data.
Ethan [00:13:41]: For example, for the
Swyx [00:13:41]: You need it to bootstrap it up. Yeah.
Ethan [00:13:43]: For generative model training, there's also usually a small percentage of unlabeled data, where the model is asked to generate a video without any text instruction. That can also help the model generalize.
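The data recipe Ethan describes, with a VLM-generated caption for every clip plus a small share of samples that carry no text at all, can be sketched as below. This is a minimal illustration under stated assumptions, not the Cosmos or Grok Imagine pipeline: `captioner` is a hypothetical stand-in for a VLM, and the 10% default for `drop_prob` is an assumed value, since the episode only says "a small percentage". Training some samples with an empty caption is also the usual way to enable classifier-free guidance (CFG), which Swyx brings up shortly. In practice this dropout is often re-drawn at each training step; the sketch fixes it once for simplicity.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TrainingPair:
    clip_path: str
    caption: str  # "" marks an unconditional (text-free) sample


def build_pairs(
    clip_paths: List[str],
    captioner: Callable[[str], str],
    drop_prob: float = 0.1,
    seed: Optional[int] = None,
) -> List[TrainingPair]:
    """Attach a synthetic caption to every clip, blanking a random fraction of them.

    `captioner` is a hypothetical stand-in for a VLM; `drop_prob` is an assumed rate.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    rng = random.Random(seed)
    pairs = []
    for path in clip_paths:
        caption = captioner(path)
        if rng.random() < drop_prob:
            caption = ""  # the model sees this clip with no text instruction
        pairs.append(TrainingPair(path, caption))
    return pairs
```

For example, `build_pairs(paths, my_vlm_caption, drop_prob=0.1)`, with `my_vlm_caption` a hypothetical captioning function, would leave roughly one clip in ten without a caption.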
So after this stage of generating synthetic pairs, one important common step is to train a compressor, or tokenizer, for the images or videos. Technically, you could theoretically train image or video models on pure pixels, but the problem is that it's a lot of tokens. One image at a thousand by a thousand is one million pixels, so one million tokens. It's impossible to train a transformer on that. So you need to train a tokenizer, which can go from image to latent space and from latent space back to image.
Swyx [00:14:45]: That's why we named the podcast.
Swyx [00:14:48]: But basically, you're talking about vocabulary science.
Ethan [00:14:50]: So, vocab.
Swyx [00:14:51]: And so what is, like, a million is impossible?
Ethan [00:14:54]: In generative models, the vocab is continuous. It's a continuous space. You can think of it as mapping an image to a vector, a fixed-length vector of sixteen or forty-eight dimensions, something like that, and then mapping that vector back to image space. The mapping is patch-based: say you have
Ethan [00:15:22]: a sixteen-by-sixteen patch, and you map that patch of pixels into this latent space.
Swyx [00:15:29]: We've covered this
Vibhu [00:15:30]: This is like the vision transformers
Swyx [00:15:32]: VAEs,
Ethan [00:15:33]: VAEs.
Vibhu [00:15:34]: You basically compress your input, do your generation, all your reasoning, in that smaller dimension, and then you project back out.
Swyx [00:15:43]: A VAE is a form of compression, but for me, the patching thing is from ViT, right?
Ethan [00:15:48]: You can make those.
Swyx [00:15:49]: Literally, the paper is titled something like “sixteen by sixteen is all you need.”
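Vibhu's summary (compress the input, generate in the smaller latent space, project back out) is the latent-diffusion recipe, and the generation step is the denoising loop Ethan walks through next. Below is a minimal sketch on toy latent vectors, assuming a DDPM-style noise-prediction objective, a cosine noise schedule, and a deterministic DDIM-style sampler. These are common textbook choices, not necessarily what Cosmos or Grok Imagine use, and `predict_eps` / `NoisePredictor` are hypothetical stand-ins for a trained diffusion transformer.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical model interface: (noisy latent, timestep) -> predicted noise.
NoisePredictor = Callable[[Vector, int], Vector]


def cosine_alpha_bar(num_steps: int, s: float = 0.008) -> List[float]:
    """Signal fraction per step: index 0 is clean data, index num_steps is ~pure noise."""
    def f(t: int) -> float:
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return [max(f(t) / f(0), 1e-4) for t in range(num_steps + 1)]


def add_noise(x0: Vector, alpha_bar: float, rng: random.Random) -> Tuple[Vector, Vector]:
    """Forward process: mix clean latents with Gaussian noise at the given level."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * a + math.sqrt(1 - alpha_bar) * e for a, e in zip(x0, eps)]
    return xt, eps


def training_loss(predict_eps: NoisePredictor, x0: Vector, sched: List[float],
                  rng: random.Random) -> float:
    """One training example: noise x0 at a random step, score the noise estimate (MSE)."""
    t = rng.randint(1, len(sched) - 1)
    xt, eps = add_noise(x0, sched[t], rng)
    eps_hat = predict_eps(xt, t)
    return sum((p - q) ** 2 for p, q in zip(eps_hat, eps)) / len(eps)


def sample(predict_eps: NoisePredictor, dim: int, sched: List[float],
           rng: random.Random) -> Vector:
    """Start from pure noise and remove the predicted noise step by step (DDIM, eta = 0)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(sched) - 1, 0, -1):
        a, a_prev = sched[t], sched[t - 1]
        eps_hat = predict_eps(x, t)
        # Estimate the clean latent implied by the current noise prediction.
        x0_hat = [(xi - math.sqrt(1 - a) * ei) / math.sqrt(a) for xi, ei in zip(x, eps_hat)]
        # Re-noise that estimate to the next (less noisy) level.
        x = [math.sqrt(a_prev) * c + math.sqrt(1 - a_prev) * e for c, e in zip(x0_hat, eps_hat)]
    return x
```

Because `sched[0]` is exactly 1.0, the last step returns the model's clean-latent estimate. Plugging in an oracle that knows the target latent makes `sample` return that latent exactly, which is a quick sanity check of the update rule; a real system would also condition `predict_eps` on the text tokens.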
And then I think people also make a lot of comparisons between this kind of patching and convolutions.
Swyx [00:16:02]: Which is, you're kind of reconstructing the old paradigm with the new.
Ethan [00:16:05]: Actually, in VAEs, there are both convolutional networks and transformers. You can actually do both.
Ethan [00:16:14]: After this VAE, what you've got is latent-space tokens and language tokens. Next comes training the diffusion transformer; generative models usually use diffusion transformers. It's actually quite standard, very similar to how you train language transformer models. There's not that much difference: it's just visual tokens in, visual tokens out. The only difference is the denoising process. You add random noise to the visual tokens, and then you train the model to remove that noise and produce the clean tokens. At inference, the model iteratively removes noise, starting from one hundred percent noise.
Swyx [00:17:12]: And then, to speed things along the diffusion tech tree, there's CFG, and there's also latent diffusion. Somewhere along the line, obviously Stability and all these other guys pioneered a lot of this architecture. I don't know if you want to get into that or just do the video side. Up to you.
Bootstrapping Video from Image Models and Temporal Compression
Ethan [00:17:37]: After you train such an image model, the reason it's a foundation for video models is that image models are cheaper to train, and they have a much denser connection between language and text. Sorry, language and images. For example, you train on a billion images, and there's a mapping from the text to each image.
Training the same thing, a billion texts to a billion videos, is much more expensive, because videos naturally have more tokens than images. The diffusion models' understanding of language comes purely from this mapping. So if you don't have enough of it, if you only train on something like ten million videos, you might not see enough language tokens during training, and your model won't understand human intention well enough. That's why you first train the image diffusion model and then bootstrap the video model from there.
Swyx [00:18:53]: One thing I did want to ask, because I think you're actually the first video model person I've ever talked to. We've talked to Luma and all those folks. There are all these tricks in video compression where, frame by frame, there's not that much difference, so you don't have to regenerate or save the whole frame, right? Like MP4 compression or something like that.
Swyx [00:19:16]: Is it tempting to use that? Or as far as I can tell, everyone just treats it as, “No, we just generate every frame.” Is that roughly the state of the art?
Ethan [00:19:27]: There are a few different approaches. Let's say, first, you want to directly use MP4 compression and use that as the tokens the transformer trains on, right? People have actually tried that, but the main challenge is that the latent space of MP4 tokens is not very comprehensible to the models. It's extremely hard to train on. And there's a
Ethan [00:20:01]: So that's why people created VAEs, which create a more continuous latent space, so the models can understand that latent space and learn from it much more easily. Even among VAEs, latent spaces differ in how hard they are to learn.
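Ethan's point that videos simply carry more tokens than images, and Swyx's question about frame-to-frame redundancy, both come down to token arithmetic. Below is a minimal sketch, assuming the compression factors Ethan cites (8×8 spatial with ×4 temporal, versus 8×8×1 per frame), ceiling division for ragged edges, and a simple non-causal grouping of frames. Real causal video VAEs often encode the first frame on its own, so exact counts differ slightly. The 80-frame, 480×832 clip is a hypothetical example, and the lag figure only counts how many frames must exist before one latent step can be decoded; it ignores compute time entirely.

```python
import math


def image_tokens(height: int, width: int, patch: int) -> int:
    """Tokens for one image cut into patch x patch tiles (partial edge tiles padded up)."""
    return math.ceil(height / patch) * math.ceil(width / patch)


def video_tokens(frames: int, height: int, width: int, spatial: int, temporal: int) -> int:
    """Latent tokens for a clip compressed spatial x spatial x temporal (non-causal grouping)."""
    return math.ceil(frames / temporal) * image_tokens(height, width, spatial)


def min_lag_ms(temporal: int, fps: float) -> float:
    """Frames that must be buffered before one latent step can be decoded, in milliseconds."""
    return temporal / fps * 1000.0


if __name__ == "__main__":
    print(image_tokens(1000, 1000, 1))    # raw pixels: 1000000
    print(image_tokens(1000, 1000, 16))   # 16x16 patches: 3969
    per_frame = video_tokens(80, 480, 832, spatial=8, temporal=1)
    grouped = video_tokens(80, 480, 832, spatial=8, temporal=4)
    print(per_frame, grouped, per_frame // grouped)  # per-frame context is 4x longer
    print(min_lag_ms(1, 24), min_lag_ms(4, 24))
```

The two quantities pull in opposite directions: a larger `temporal` factor shortens the context by roughly that factor but raises the minimum output lag by the same factor, which is the real-time tradeoff Ethan describes.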
You can imagine the simplest, most naive VAE: you have an image, and you just flatten all of its pixels into a vector. You don't need to train any VAE, right? But that latent space is extremely hard for models to learn on top of. That's why there's some debate about how to compress the tokens. You mentioned you can compress frame by frame. You can also compress the temporal dimension.
Ethan [00:20:52]: The difference is that if you compress the temporal dimension, you get a much higher compression rate, because there's temporal redundancy between frames: this frame and the last frame are likely mostly similar, with only small differences. For example, I think the Wan 2.1 VAE has an eight-by-eight-by-four compression rate, so four frames are compressed into one latent frame. That saves a lot of context length. If you do it frame by frame, you have maybe eight by eight by one, and your context length will be four times larger. That being said, the benefit of per-frame compression, and we might come back to this later, is real-timeness and interactivity. If you stream the output of the model frame by frame, the model can respond to any user request immediately. If you have four-times temporal compression, then
Swyx [00:22:06]: It might be laggy.
Ethan [00:22:07]: There's an inherent lag there.
Swyx [00:22:10]: So you're very pilled on this. Let's just go ahead and bring it up, ’cause we have the visual prepared anyway. There are some frontier applications of real-time video gen. Flipbook is one of the examples that went viral recently, right? What is Flipbook?
Real-Time Generative UI: Flipbook, Neural OS, and Diffusion Front Ends
Ethan [00:22:23]: Flipbook is kind of like a web browser. You can see it has the browser UI on top.
The difference is that all of the UI is generated by a generative image model in real time, and everything here is fake. But you can explore inside this imaginary world. Say here we have “engineering the Great Pyramid.” The model generates this for us to understand how it works, and if we want to navigate around and understand further, we can click on some of the descriptions here, and the model will generate a new subpage describing the details we want to know about.
Swyx [00:23:14]: So it's basically like we're playing a video, but it pauses for our next interaction, and then it plays the next thing based on that interaction.
Swyx [00:23:23]: Which is kind of cool.
Vibhu [00:23:25]: And you kind of decide your story. This one was “how do you make a pyramid?” The lever technique seemed interesting, right? It shows, okay, I want to know what this is
Swyx [00:23:35]: The demo tweet had more animation between frames.
Vibhu [00:23:38]: I think it's just skipping,
Swyx [00:23:39]: Oh, it's just skipping a lot of frames.
Ethan [00:23:40]: They also have a video mode,
Vibhu [00:23:42]: It takes a lot. There are a lot of people
Ethan [00:23:42]: but a lot of people are using it.
Ethan [00:23:45]: So it's not available.
Vibhu [00:23:46]: There's a live video stream. We can try,
Swyx [00:23:50]: So this is an example of the kind of future you see at the extreme. We're obviously not in it today.
Swyx [00:23:56]: But in a world where inference is completely free, this is better than generating code and text?
Ethan [00:24:02]: I think this is the final state of where video world models will end up. Imagine the internet doesn't exist, and you type in google.com. What should a model show you? The model can imagine something, and this is what the model imagines. These web pages do not exist at all.
So I think as inference costs come down, we are going to have generative UI for everything. Think about how coding models work: they write code for a web page, the code might be compiled into a binary, and the binary renders the pixels on the screen. In machine learning, every time we have a breakthrough, it's more intuitive. So why don't we go from user instruction to pixels directly? Generative UI will be user intention to pixels directly. Take email: everyone has the same interface, but I want it slightly different. I want my email shown to me like TikTok, so I can swipe left and right through emails. Or maybe you want something else. We can have completely different things. Or I'm looking at Instagram stories and I don't like the Like button, because I always mis-click it. Generative UI resolves that. So it's going to be a revolutionary replacement of the interface. In the future, we might have much more powerful
Ethan [00:25:50]: LLMs and coding models running behind the scenes, and on the front end, the diffusion model will actually be what shows stuff to you. That's how I imagine it.
Swyx [00:26:02]: Diffusion front end, deterministic back end.
Swyx [00:26:04]: Something like that. I find that very expensive, but,
Vibhu [00:26:08]: I find it interesting you called LLMs writing code on the back end deterministic, but okay.
Swyx [00:26:14]: You write it once
Vibhu [00:26:15]: Compared to
Swyx [00:26:16]: and then you execute.
Ethan [00:26:17]: Think about the cost. Say an H100 costs $1 per hour, and you use it eight hours a day for thirty days. Every month you're paying $240, and you actually wouldn't want to pay that. That's even more expensive than Claude Code Max.
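Ethan's back-of-envelope number checks out: $1 per GPU-hour × 8 hours × 30 days is $240 a month. The sketch below reproduces it and projects how many years a constant yearly price drop would take to bring that cost under a target budget. The $20 budget is a hypothetical consumer-subscription price point, and the drop rate is a parameter: 2× a year is the rate Ethan assumes, while the 100× figure Swyx raises describes LLM cost at a fixed quality level rather than raw compute price, so it is shown only for contrast.

```python
def monthly_cost(price_per_gpu_hour: float, hours_per_day: float, days: int, gpus: int = 1) -> float:
    """Rental cost of running `gpus` GPUs for a fixed number of hours per day."""
    return price_per_gpu_hour * hours_per_day * days * gpus


def years_until_affordable(cost: float, budget: float, yearly_drop: float) -> int:
    """Whole years until cost / yearly_drop**n falls to `budget` or below (constant-rate assumption)."""
    if yearly_drop <= 1:
        raise ValueError("yearly_drop must be greater than 1 for the cost to fall")
    years = 0
    while cost > budget:
        cost /= yearly_drop
        years += 1
    return years


if __name__ == "__main__":
    cost = monthly_cost(1.0, 8, 30)
    print(cost)                                   # 240.0
    print(years_until_affordable(cost, 20, 2))    # 4
    print(years_until_affordable(cost, 20, 100))  # 1
```

A constant rate is the strongest simplification here; the loop makes it easy to swap in a different budget or rate without changing the arithmetic.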
But if you think about compute costs coming down about two times every year, I think that future will likely arrive within a few years.
Vibhu [00:26:49]: It's everything, right? Compute cost comes down, compute gets faster, models get smarter
Ethan [00:26:54]: More efficient
Vibhu [00:26:54]: models get smaller.
Swyx [00:26:55]: I don't know why you say two times, 'cause I think it's like 100 times. In language models, it's roughly one hundred to a thousand times every twelve to eighteen months for the same given LMSys Elo.
Vibhu [00:27:08]: That's a net of everything, right? That's model performance alongside compute, which is different from just compute costs coming down. But a very interesting future.
Swyx [00:27:19]: So web designers will have to shout out that accessibility is an issue, right? How do you deal with screen readers or whatever? But yes, this is higher-bandwidth storytelling than anything you can possibly generate with code. So I think that's the rough idea.
Ethan [00:27:34]: And I'd like to add a little bit: humans naturally have the maximum input bandwidth when we are looking at things, like videos, and we also have maximum output bandwidth when we are talking. So in the future, it might be that we talk to AI models, and the AI model responds with a generative UI. That would be the maximum input and output bandwidth for interacting with AI models before Neuralink happens.
Vibhu [00:28:06]: And it's also very custom, right? Some people are very visual; some people are not as visual and prefer text. But the best thing about generative UI is that it can also be text.
Swyx [00:28:17]: There's another project we wanted to highlight, which is Neural OS. Kind of a similar idea, but here you're literally simulating an operating system with a video model.
Swyx [00:28:27]: And you can play Doom, you can use Firefox.
I find this mildly less impressive, obviously, because it's an OS that I can run.
Swyx [00:28:37]: But here everything is imagined.
Vibhu [00:28:40]: I was used to Command+W to close the Firefox tab. It didn't crash. That's why I said
Swyx [00:28:45]: It's too immersive.
Vibhu [00:28:46]: It's too immersive for me.
Swyx [00:28:47]: Too immersive.
Vibhu [00:28:48]: I wanted to close the tab.
Vibhu [00:28:49]: But yes, I can play generated diffusion.
Swyx [00:28:51]: This is shockingly fast.
Swyx [00:28:54]: Because I remember there was a demo maybe one to two years ago where someone tried to do a first-person shooter with an image model. There was no consistency, and it was very slow. But here it looks realistic. This is Doom.
Vibhu [00:29:07]: I think there are two sides to that, right? What is running a game? The heavy part is actually the game engine: all the lighting, the graphics. This is just kind of video, right? Like, we've solved consistency? This still looks like image generation from a few years ago. There's some temporal consistency, but it's kind of just images stitched together as frames of video. But it's a good visual representation to picture the future you want to see, right? That's what I see in these, more so.
Ethan [00:29:38]: This reminds me of how video models get better and better. If you just look at Neural OS, it feels like a crappy version of the Windows we already have, right? But the difference is that this model is overfitted on existing operating systems; it can generate nothing different from them. It's actually similar to video models. When we train video and image models, we train them on the internet. There's no imaginary supernatural stuff on the internet. But once we train the model, you can prompt it to generate something supernatural that has never existed in the dataset.
So if you train your Neural OS or neural computer on standard screen recordings from the entire internet, the model can imagine a completely new interface for interacting with the computer.
Swyx [00:30:43]: This is one of those things that is magical to me. Usually generalizing out of distribution is bad, but somehow we have learned some kind of internal world model: you say "this, but it looks like rainbows and butterflies," and it'll do it, and it will kind of make sense.
Swyx [00:31:03]: So yeah, that's kind of cool. I don't know if there's any more comment there. I did want to touch a little more on the model architecture stuff, which I think you were getting at. It's really fascinating, and we don't get a chance to talk about it enough. One of the papers we covered: we've covered every annual Segment Anything release, and I don't know if you follow... you're a computer vision guy, so you
Ethan [00:31:26]: I know.
Swyx [00:31:27]: So they did memory attention, which is kind of interesting. I always think anything where you can keep some consistency across the temporal dimension is very fascinating. Basically, the CV side bleeding into the video gen side is underexplored, right? We talk about it for labeling, but you can actually borrow the architecture itself.
Ethan [00:31:50]: There are also completely different approaches, right? You brought up the term world model, so we went from video models to world models. There's diffusion, but there are also other approaches people are taking. So maybe we get into those after as well?
Swyx [00:32:03]: He has a whole definition of world models and stuff. I feel like we threw a lot at you.
Whatever you want to comment on.
Why Video Models Are Expensive: Storage, I/O, and Training Scale
Ethan [00:32:10]: I think one thing we should actually come back to: we were talking about the steps to train image gen into a video model. One thing we don't see as much of is, okay, you brought up the delta in training data, right? So
Ethan [00:32:24]: a video model might not generalize as much, but what is the cost of training a large video model? For LLMs we know roughly. Even the poolside thing that came out today, right? It's a Gemma-level model trained on roughly forty trillion tokens on this many H200s over this much time. You can work out the exact cost: how many GPU hours times how much an H200 costs. So how do we do the same back-of-the-envelope math for video and image models? How do you break that down? I can share some back-of-the-envelope calculations. Surprisingly, the cost of video models is comparable to language models. The largest scale is obviously still language models, but video is comparable to maybe medium-scale language models. I'd say just storing the videos alone costs a lot. You can look it up on AWS or something.
Ethan [00:33:20]: Say you have a billion videos, and let's say each video is five megabytes; then you need five petabytes just to store those videos. And remember, we talked about using a VAE to compress the videos: you typically also need to store those continuous features, and they're comparable in size to the videos themselves. So just storing the videos and the features is tens of petabytes alone. And,
Swyx [00:33:58]: I just looked up the calculation.
Five petabytes on S3 Standard is one hundred K per month.
Ethan [00:34:05]: And
Swyx [00:34:05]: It's comparable
Ethan [00:34:05]: and you need
Swyx [00:34:06]: And
Ethan [00:34:06]: And then tens of petabytes is two hundred K. And even more expensive is the ingress and egress.
Swyx [00:34:13]: Oh, yeah.
Ethan [00:34:14]: Through the internet. Just downloading those videos, I believe, is more expensive on AWS than storing them.
Swyx [00:34:25]: Storing, yeah.
Ethan [00:34:25]: And each training run, you probably need to pull them once. If you train multiple times, it's even more than that. So just the storage and network costs would be a few million per month for everything, not to mention the GPU cost.
Ethan [00:34:45]: And
Swyx [00:34:45]: My side tangent: compute rental, like GPU rental, is very efficient. On the other side, okay, you can be xAI and build your own data center. Should we not just build our own storage as well? Like
Ethan [00:34:57]: Of course
Swyx [00:34:57]: cloud cost compared to just
Ethan [00:34:59]: You save so much
Swyx [00:35:00]: storing it. Yeah, exactly.
Swyx [00:35:01]: Especially with egress and stuff. So.
Ethan [00:35:04]: That's a good idea, but it comes with its own challenges.
Swyx [00:35:09]: Of course, of course.
Ethan [00:35:10]: People who build GPU data centers might not expect this much storage. And people who build storage typically just build it somewhere with only CPUs.
Swyx [00:35:23]: I just looked it up. AWS only charges for egress, not ingress. Tier five, for five petabytes, is two hundred and thirty K.
Ethan [00:35:32]: Even more expensive than the storage.
Swyx [00:35:34]: But storage is per month, right? You check in, then you cannot check out. So it's cool. It's okay.
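The storage numbers above, plus the "GPU hours times GPU cost" framing, fit in one back-of-the-envelope script. Everything here is an assumption for illustration: the $0.021/GB-month storage rate and flat $0.05/GB egress rate are rough stand-ins for high-volume cloud pricing (real bills are tiered and change over time), the 6 × parameters × tokens FLOPs rule is a common approximation for dense transformers, and the 989 TFLOP/s peak and 40% utilization are assumed H100-class figures. The 19B-parameter, 20-trillion-token configuration is hypothetical, chosen to sit in the range Ethan gives for open video models.

```python
# Back-of-the-envelope data and compute budget for training a large video model.
# Every constant passed in below is an illustrative assumption, not a quoted
# price or a disclosed training configuration.

BYTES_PER_GB = 10**9  # decimal gigabytes, for simplicity


def dataset_bytes(num_videos: int, avg_video_bytes: int, latent_ratio: float = 0.0) -> float:
    """Raw video bytes, plus cached VAE latents stored at `latent_ratio` x the video size."""
    return num_videos * avg_video_bytes * (1.0 + latent_ratio)


def monthly_storage_cost(total_bytes: float, usd_per_gb_month: float) -> float:
    """Flat-rate object storage bill for one month."""
    return total_bytes / BYTES_PER_GB * usd_per_gb_month


def egress_cost(total_bytes: float, usd_per_gb: float, pulls: int = 1) -> float:
    """Cost of downloading the whole dataset `pulls` times at a flat per-GB rate."""
    return total_bytes / BYTES_PER_GB * usd_per_gb * pulls


def training_gpu_hours(params: float, tokens: float, peak_flops: float, mfu: float) -> float:
    """GPU-hours from the ~6 * params * tokens FLOPs rule of thumb for dense transformers."""
    total_flops = 6.0 * params * tokens
    return total_flops / (peak_flops * mfu) / 3600.0


if __name__ == "__main__":
    raw = dataset_bytes(1_000_000_000, 5_000_000)                # 1B videos x 5 MB each
    with_latents = dataset_bytes(1_000_000_000, 5_000_000, 1.0)  # latents about as large as videos
    print(f"raw: {raw / 1e15:.0f} PB, with latents: {with_latents / 1e15:.0f} PB")
    print(f"storage: ${monthly_storage_cost(raw, 0.021):,.0f}/month at an assumed $0.021/GB-month")
    print(f"one full pull: ${egress_cost(raw, 0.05):,.0f} at an assumed flat $0.05/GB")
    hours = training_gpu_hours(19e9, 20e12, 989e12, 0.40)  # hypothetical 19B dense, 20T tokens
    print(f"compute: {hours:,.0f} GPU-hours at an assumed 40% utilization")
```

With these inputs one full pull of the raw 5 PB costs more than a month of storing it, in line with Swyx's lookup; raising `pulls` to the number of training runs shows how quickly the network bill grows.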
So there's that side.
Ethan [00:35:41]: So the TL;DR, my back-of-the-envelope math
Swyx [00:35:42]: Data is larger than you think. Yes.
Ethan [00:35:44]: my back-of-the-envelope math of GPU hours times GPU cost was missing the storage.
Swyx [00:35:49]: You're also basically more I/O bound than in normal training.
Swyx [00:35:55]: Yes. 'Cause with data loading, caching everything becomes super important.
Ethan [00:36:00]: In Cosmos, we did a lot of optimizations to make it not I/O bound. Speaking of actually training the model, the GPU cost: if you look at the open-source models and how big these video models are, I think LTX has 19B parameters. That's a dense model. People are also exploring MoEs, so it might be 20B active and hundreds of billions total. That's similar in size to medium-sized LLMs. And as for the number of tokens, we disclosed that in Cosmos: it's also tens of trillions of visual tokens. So putting this together, the cost of training these video models is actually comparable with LLMs. Not to mention the infra is slightly different from LLMs, so it might be less efficient to train these models.
Inference Speedups: Step Distillation, Consistency Models, and GANs
Swyx [00:37:04]: Do you get the benefits of the traditional diffusion speed-ups? For images, there's LCM, LoRAs for fine-tuning. There's a lot of stuff that's been
Ethan [00:37:15]: Flow matching.
Swyx [00:37:16]: There's flow matching. There's a lot of stuff that's been done. Is there some overlap that applies from diffusion on the inference side, or?
Ethan [00:37:23]: The inference side is a completely different story.
Ethan [00:37:28]: For the training side, it might be a little hard to reduce that cost. For the inference side, the biggest gain is from distilling these models.
It's called step distillation, slightly different from knowledge distillation in LLMs. Typically, flow matching models need around 100 steps, and diffusion models need even more, like 1,000 steps, to generate a good image or video. Step distillation tries to learn to generate in fewer steps from the model itself. You use the full model to generate in 100 steps, then you take a model that only generates in 10 steps and let that model learn from the full one.
Ethan [00:38:25]: Why this works
Swyx [00:38:27]: Strong to weak, seemingly.
Ethan [00:38:28]: It is. It's kind of
Swyx [00:38:29]: Distillation
Ethan [00:38:29]: kind of like strong to weak. From the modeling perspective, the strong model, the teacher model, is trying to model the images and videos of the internet, and that distribution is extremely complex. But the step-distilled model is just trying to learn from the teacher. The teacher is a model with a fixed size, so its distribution is much simpler than the whole internet. That's my intuition for why step distillation works. So the models served in production usually only run a few steps. In Cosmos, I believe we have four-step and eight-step versions. For simpler tasks, like image-to-image translation, it can run in even fewer steps, like one step in Cosmos Transfer.
Swyx [00:39:22]: I think this is the same intuition that guides a lot of the consistency model work. I sent you a link for sCM. I don't know if you covered that. To me, that was actually one of the most impressive papers I've ever seen from OpenAI.
Swyx [00:39:34]: This is the unifying grand concept of consistency models. I don't know if you have any comments on this.
Ethan [00:39:41]: So there are a few different approaches,
Swyx [00:39:46]: Oh, yeah. Here it is.
Swyx [00:39:47]: Two steps versus twenty or 100 steps, whatever.
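A toy illustration of the step-distillation idea Ethan describes, not Cosmos's or any paper's actual recipe: the "teacher" integrates a made-up flow ODE with 100 Euler steps, and a one-step "student" is fitted to reproduce the teacher's outputs. The velocity field is deliberately linear so that an affine student can match it exactly; in practice the student is a neural network, often initialized from the teacher.

```python
# Toy step distillation: fit a one-step "student" to a many-step "teacher" sampler.
# The velocity field and the affine student are illustrative assumptions.

def velocity(x: float, t: float) -> float:
    # Hypothetical flow field; linear in x, so the teacher's map is affine in x0.
    return -x + t


def teacher_sample(x0: float, steps: int = 100) -> float:
    """Euler-integrate dx/dt = velocity(x, t) from t = 0 to t = 1."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += dt * velocity(x, k * dt)
    return x


def naive_one_step(x0: float) -> float:
    """The same sampler squeezed into one step, with no distillation."""
    return teacher_sample(x0, steps=1)


def distill_one_step(train_inputs):
    """Fit student(x0) = a * x0 + b to teacher outputs by ordinary least squares."""
    targets = [teacher_sample(x) for x in train_inputs]
    n = len(train_inputs)
    mean_x = sum(train_inputs) / n
    mean_y = sum(targets) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_inputs, targets))
    var = sum((x - mean_x) ** 2 for x in train_inputs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x0: a * x0 + b


if __name__ == "__main__":
    student = distill_one_step([i / 10 - 2.0 for i in range(41)])
    for x0 in (-1.5, 0.3, 2.0):
        print(f"x0={x0:+.1f}  teacher(100 steps)={teacher_sample(x0):.4f}  "
              f"student(1 step)={student(x0):.4f}  naive(1 step)={naive_one_step(x0):.4f}")
```

Running the teacher's own sampler with a single step collapses every input to 0, while the distilled one-step map tracks the 100-step result. That mirrors Ethan's intuition: the student only has to copy a fixed teacher, a far simpler target than the data distribution itself.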
It's already done.
Ethan [00:39:52]: So there are a few different approaches, for example consistency models, and actually, we shouldn't forget GANs. GAN was the OG of
Swyx [00:40:05]: OG
Ethan [00:40:05]: step distillation, 'cause it trained just one step to begin with. For example, there's distribution matching distillation, which uses a GAN as one of the losses for distillation. A GAN just says, "Hey, generate an image,"
Ethan [00:40:31]: and then it has a discriminator to tell whether the image is real or not. So the model just needs to learn one mode of the distribution, not the full distribution. In regular diffusion training, by contrast, the model is asked to reconstruct the ground-truth image from the internet, which is extremely hard. Training a GAN is a single-step task: "Hey, you generate an image. Does this image look as real as the images from the internet?" That's a much simpler task. People typically combine a lot of these approaches, like consistency models, distribution matching, and GANs, to get these few-step models.
Audio-Video Generation and Time Alignment
Swyx [00:41:21]: Then there's one more thing I wanted to add, which is audio and video.
Ethan [00:41:26]: So Grok Imagine 0.9, I believe, was the first joint audio-video model deployed at large scale. So
Swyx [00:41:39]: And that was your first model?
Ethan [00:41:40]: That was Grok Imagine's first model. It's joint audio-video generation. I think the hard part is modality alignment, 'cause before this joint model, we had text-to-video alignment, a correspondence between text and video. Most VLMs understand images and videos. Video's very rare, and they mostly don't understand audio.
And if you look at audio generation on the LLM side, you can talk to them perfectly fine, but if you ask them to sing a song, it's typically not very good. They don't have music either. The hard part is that audio actually has two components: a discrete component and a continuous component. The discrete component is like language.
Ethan [00:42:44]: When we speak, it's just some
Swyx [00:42:47]: It's an ASR issue, yeah.
Ethan [00:42:49]: It's text tokens with some characteristics, I would say.
Ethan [00:42:54]: But music
Swyx [00:42:56]: I think the speech guys would disagree with this.
Swyx [00:42:57]: Like disfluencies, and then
Vibhu [00:43:00]: There's tone; you can get angry.
Ethan [00:43:01]: Well, I said largely.
Ethan [00:43:03]: But music is completely different. It's very continuous, and you cannot model it like discrete tokens in language models. That's the hard part for models, not to mention we have to align text, video, and audio together.
Ethan [00:43:26]: So
Vibhu [00:43:26]: How?
Ethan [00:43:28]: There are some significant challenges. First, like we talked about, most VLMs cannot understand audio.
Ethan [00:43:39]: So you have to have some way to do synthetic data generation for audio. You have to caption the audio, and that involves a lot of synthetic data and human data effort. Not surprisingly, most LLMs are very bad at recognizing the beat, tone, and details of music. They can make a general guess at which song it is, but it's very hard for them to describe the details of the music. Like we mentioned for image generation, you have to describe an image in as much detail as possible, so that someone blind could reconstruct it.
So here it's like someone
Vibhu [00:44:32]: Deaf
Ethan [00:44:32]: someone deaf could reconstruct what the music sounds like without actually listening to it. Maybe you can think of it as needing what they call the script.
Vibhu [00:44:49]: Subtitles, yeah.
Ethan [00:44:49]: You've got to have all the details of the music and the dialogue.
Vibhu [00:44:55]: So is the challenge typically stuff like music and audio? Is there a baseline where there's enough data to understand narration and conversation, but the nuances in audio are where you hit all the data issues? Or do you just do it all from stage zero?
Ethan [00:45:15]: One important thing is alignment. The model has to have a time-based alignment between video and audio: at which time step the video and audio tokens correspond to each other. But we actually don't have this kind of alignment for most other modalities. Text and image, or text and video, are only loosely aligned. You can have a description of what's going on in the video, but you typically don't have an exact description of what happened at one second,
Vibhu [00:46:02]: It's very
Ethan [00:46:03]: what happened at two seconds
Vibhu [00:46:03]: coarse. Yeah.
Swyx [00:46:05]: So what was the ideal time step? You have to ablate it, and then it's like four seconds or something.
Ethan [00:46:09]: That comes down to how you design the model to be aware of time as a modality, so the model is time-aware. And that's pretty unique if you think about LLMs. If you ask an LLM to complete a task, they will say, "Oh, this task will probably take twelve hours to complete," and they come back in one hour.
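Ethan's point is that audio and video need timestamp-level correspondence, unlike captions, which only loosely describe a clip. A minimal sketch of that bookkeeping with hypothetical rates (24 fps video, 4× temporal compression in the VAE, and an audio tokenizer emitting a fixed number of tokens per second): each video latent covers a time span, and the audio tokens starting inside that span are the ones it lines up with. How Grok Imagine actually represents time is not described in the conversation, so treat this purely as an illustration.

```python
# Sketch: match each video latent to the audio tokens that cover the same time span.
# Frame rate, VAE temporal compression and audio token rate are hypothetical.
import math


def latent_spans(num_frames: int, fps: float, temporal_compression: int):
    """(start_s, end_s) covered by each video latent after temporal compression."""
    seconds_per_latent = temporal_compression / fps
    num_latents = math.ceil(num_frames / temporal_compression)
    return [(i * seconds_per_latent, (i + 1) * seconds_per_latent) for i in range(num_latents)]


def audio_tokens_in_span(start_s: float, end_s: float, tokens_per_s: float, eps: float = 1e-9):
    """Audio token i starts at i / tokens_per_s; return indices starting in [start_s, end_s)."""
    first = math.ceil(start_s * tokens_per_s - eps)  # eps absorbs float round-off at boundaries
    stop = math.ceil(end_s * tokens_per_s - eps)
    return list(range(first, stop))


def align(num_frames: int, fps: float, temporal_compression: int, tokens_per_s: float):
    """For each video latent, the audio token indices that share its time span."""
    return [audio_tokens_in_span(start, end, tokens_per_s)
            for start, end in latent_spans(num_frames, fps, temporal_compression)]


if __name__ == "__main__":
    # One second of 24 fps video with 4x temporal compression gives 6 latents.
    print([len(span) for span in align(24, 24, 4, 48)])
    print([len(span) for span in align(24, 24, 4, 50)])
```

With 48 audio tokens per second every latent gets exactly 8; with 50 per second the counts become uneven (9, 8, 8, 9, 8, 8), so a token's position in the sequence alone no longer says which video latent it co-occurs with.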
And they say, "I've already spent two days on this and I've exhausted everything."
Ethan [00:46:47]: So LLMs themselves don't have a sense of time.
Vibhu [00:46:53]: I actually don't think that's just them not having a sense of time. I think it's somewhat grounded, right?
Vibhu [00:46:58]: Like you tell someone, "Okay, go work on this feature. Go implement this." There's a general understanding of how long that would take, without LLMs working at LLM speed, right? Think back two years: if I told you to build me a new front end for Latent Space, with a search bar and all this, you'd estimate it would take a few days, right?
Vibhu [00:47:19]: So you tell an LLM, "Go build this," and it says it'll take a few days. I'm not saying they have a great understanding of time, but in that example you can see where it comes from, right? They're trained on all of the text.
Swyx [00:47:35]: They're trying to estimate what a human would say.
Vibhu [00:47:37]: Because that's what the data represents. It's not them
Ethan [00:47:41]: It came from the corpus on the internet. People have an estimate of how much time things take.
Vibhu [00:47:45]: And not even just in direct training samples, right? Just your world understanding, from tokens, of how long stuff takes. Go read a book; it'll take you a while, right?
Vibhu [00:47:56]: Even if you do nothing but read, a book takes a few days. So an LLM says, "I read it; it took me a few hours."
Vibhu [00:48:01]: "It'll take me a few hours to go through this research." But this is a tangent.
Swyx [00:48:05]: Somewhat, yeah.
Swyx [00:48:06]: This is a train of thought I haven't really expressed until now, which is basically that a full world model must also be recursive, meaning the participant in the world model must also be aware that they have a world model.
which is this whole recursive thing down the line. But yes, and that the world model can be wrong and they need to update it, and so on. We've argued this in the newsletter as well: there need to be recursive or adversarial world models.
World Models: Real-Time, Long-Horizon, Interactive Video
Vibhu [00:48:34]: Just to ask, how do you define world model?
Swyx [00:48:38]: Oh, yeah, let's go there.
Ethan [00:48:40]: So
Vibhu [00:48:40]: Just for context, we talked about video generation, and if you say there's a distinction with world models, what's your definition? How do you see the two?
Ethan [00:48:53]: Disclaimer: I'm not going to debate what a world model is. There are many definitions, so I'll just give mine. Since I came from the multimodal domain, I'm mainly talking from video. A world model is real-time, interactive, long-horizon video. So there are three parts; let's take them one by one. First, interaction. We just looked at Facebook and the neural computer. A world model lets you interact with it through keyboard, mouse, and maybe also voice. These are all modalities you can use to interact with the model, and the model should respond reasonably. The second part is real time. Say you move your mouse and the world model is generating a game: how fast can the game respond? If you're a professional CS:GO player, you might say, oh, you have to respond-- He's a beginner. --within sub-ten milliseconds, or-- Yeah, even less. So that's not most of the-- No, sixty FPS. Let's go. Oh, three hundred FPS. Oh, five hundred FPS. Wait. Okay, yeah. I didn't do the math, but yeah, okay. Uh-- Yeah, three hundred FPS, that's about three milliseconds. So you have to respond-- Oh, s**t. Okay. Yeah
Ethan [00:50:29]: within a millisecond. Most video models cannot do that.
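The crosstalk numbers are frame-time arithmetic: the per-frame budget is 1000 ms divided by the frame rate. A small sketch, assuming one generated frame per displayed frame and ignoring input and network latency:

```python
# Per-frame latency budget at a target frame rate.
# Assumes one generated frame per displayed frame; input and network latency ignored.

def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to produce each frame at `fps`."""
    return 1000.0 / fps


def can_keep_up(generation_ms_per_frame: float, fps: float) -> bool:
    """True if producing one frame takes no longer than the budget at `fps`."""
    return generation_ms_per_frame <= frame_budget_ms(fps)


if __name__ == "__main__":
    for fps in (24, 60, 300, 500):
        print(f"{fps:>3} FPS -> {frame_budget_ms(fps):.2f} ms per frame")
    # A 200 ms voice-style response window, measured in frames at 24 FPS.
    print(f"200 ms at 24 FPS spans {200 / frame_budget_ms(24):.1f} frames")
```

60 FPS leaves about 16.7 ms per frame and 300 FPS about 3.3 ms, whereas a 200 ms voice-style budget spans almost five frames at 24 FPS, which is why the digital-human case is far more forgiving than a shooter.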
Yeah. But if you have a video model that is, say, a digital human, the response time might be more generous. Typically, for real-time voice interaction, it's around two hundred milliseconds, which is much more generous. But even two hundred milliseconds is pretty tricky, 'cause remember we mentioned
Ethan [00:51:01]: you have temporal compression coming from the VAE. If you don't compress the temporal dimension, your sequence length is going to explode. So if you want real-timeness in your model, you have to solve this context problem. And the third part is long horizon, 'cause you're not going to play a video game for just a few seconds, and most video models only do a few seconds. We're going to play for minutes, hours. The model has to be able to generate long-form content.
Ethan [00:51:42]: So putting these three together, it's real-time, long-horizon, interactive video. I think the final state will be, for example, a video version of Playbook, where you can interact with a neural computer. You move your mouse, you click on the generative interface, and it replies to you through pixels generated in real time. But it's a very long way to get there. One of the first steps at Grok Imagine, where I led a small world model team, was to build video extension. Video extension is the first step of interactivity. Yeah. It's the first step. Yeah. So it's the first step-- You have it here, video editing, yeah. Yeah. Yeah. It's the first step because it unlocks long-horizon video. Typically, for most video generation models, you give a prompt or an image as the initial frame, you generate a video, and that's it: one time, done. Some creators try to use the last frame as the first frame for a second video.
It sometimes works, but if you do it a few times, the quality decreases. And-- It doesn't have that context-- Yeah, over the full video, so the temporal-- Yeah, exactly. 'Cause you only gave it the last frame, of course, right? Yeah. Exactly. And-- it's actually a pretty fun hack. If you've seen-- Oh, no, he's saying something better. Yeah. For example, I remember Veo 3 has about one second of context from the last video. It's slightly better than using the last frame, but it has a similar problem: the quality decreases. If you extend a few times, to one minute, the video quality looks much worse than the first video. Another problem is that the model doesn't have long-range knowledge of what happened before. Say it generates some dialogue with two people speaking: their voices might change over time, especially if the one-second conditioning doesn't cover the earlier context. So these are the core challenges. The Grok Imagine video extension has the historical context of all the previously generated videos. It has the context of who is speaking, what objects have appeared, and everything, and uses that to generate the next video. If you do this naively, you just put all of the previous video tokens into the context, and the context length will easily explode. Especially for video models, that can be a few million tokens of context, I would imagine-- Context length. Yes. Yeah.
Swyx [00:54:58]: Let's run with that.
Ethan [00:54:59]: For example, in Cosmos, I think just five seconds of video is around fifty or sixty K tokens. So fifty seconds is five hundred K tokens, and anything longer than that easily explodes. This long-horizon problem was the first step we tried to solve toward world models. It turns out people love video extension.
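Why naive full-history extension explodes: sequence length grows linearly with duration, divided only by the VAE's temporal compression. The resolution, frame rate, compression factors and 2×2 patch size below are hypothetical (not Cosmos's or Grok Imagine's configuration), picked so a five-second clip lands in the same order of magnitude as the ~50K tokens Ethan mentions.

```python
# Sequence length of a latent video transformer as a function of clip duration.
# Resolution, fps, VAE compression factors and patch size are hypothetical defaults.

def video_tokens(seconds: float, fps: int = 24, height: int = 480, width: int = 832,
                 temporal_compression: int = 4, spatial_compression: int = 8,
                 patch: int = 2) -> int:
    frames = int(seconds * fps)
    # Simplification: floor division; real VAEs may treat the first frame separately.
    latent_frames = frames // temporal_compression
    latent_h = height // spatial_compression // patch   # VAE downsampling, then patchify
    latent_w = width // spatial_compression // patch
    return latent_frames * latent_h * latent_w


if __name__ == "__main__":
    for seconds in (5, 50, 600):
        print(f"{seconds:>4} s -> {video_tokens(seconds):,} tokens")
    print(f"   5 s without temporal compression -> "
          f"{video_tokens(5, temporal_compression=1):,} tokens")
```

At these settings five seconds is 46,800 tokens, fifty seconds 468,000, and a ten-minute clip about 5.6 million; dropping the temporal compression (`temporal_compression=1`) quadruples every figure, which is the explosion Ethan warns about.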
A lot of creators love using video extension to create longer-form videos. This is the part I liked: you have an intermediate step toward the final goal instead of a straight shot to the final version.
Swyx [00:55:48]: But I can see you have a strong vision of where we want to end up.
Long Context, Redundancy, and Efficient Interactive Video
Vibhu [00:55:51]: Does it seem like it's an efficiency issue? Okay, we're at a few million tokens of context. If you draw the parallel to language models, we had very short context, two thousand, eight thousand, and then we scaled up to one million, ten million. Sure, there's effective context, but at the end of the day, what's it worth? Sure, there's the whole training data side. In video, it might be slightly easier, 'cause we have hundred-million-token videos, right? Just take a movie with the full context there. Is this an efficiency issue from an inference standpoint, where it's expensive but we know how to solve it? Or why is this not the approach? My broader point was about your second point on world models: you say it needs to be interactive and live, right? You should be able to play a game and see the interaction live. One thing I see with research is that a lot of what you actually serve is different from what you build, right? We talked about distillation. You train a big model, you distill it, you do quantization, speculative decoding. We do all this stuff to serve it efficiently. Should we not first have a solution, a world model that can interact well, and do the inference optimization, serving, and distillation second, making it real time after you've solved it? Another parallel is, say, continual learning, right? What we need is for someone to solve it and show it works, even if inefficiently. Give it a few years, and people will make it efficient. Same thing with regular attention, right? It worked.
Over a few years, people came up with different forms of attention, and we've scaled it to be efficient at long context, right? So two things there. One: it seems like it works. You've scaled it. Can we not just scale it a lot more efficiently over time? Do we need a separate approach if this works? And the same with interaction, right? If we can find some way to make it work, we can make it more efficient from an inference standpoint later.
Ethan [00:57:53]: That's actually a very good point. In videos, there's actually a lot of redundancy. We solve a lot of the pixel redundancy with the VAE, but there's more redundancy in long-range, long-horizon videos. Say a character appears in the first clip, then disappears, and only reappears at the end of the video. You probably don't need that context in the middle of the generation; you only need that character where it appears. That's why I helped build another feature: reference-to-video.
Vibhu [00:58:36]: Is it here?
Swyx [00:58:36]: Is it the same model release or a different one?
Ethan [00:58:39]: It's a different one.
Ethan [00:58:41]: You probably need to search on
Swyx [00:58:43]: I'll find it.
Ethan [00:58:43]: X for reference-to-video.
Ethan [00:58:46]: Reference-to-video lets you upload up to seven images as conditions and generate the video. They can be characters, objects, or even scenes. Say I want to condition on Shawn's selfie, holding a blade
Swyx [00:59:07]: We have a dog
Ethan [00:59:08]: or whatever.
Swyx [00:59:08]: We put the dog in the thing.
Ethan [00:59:09]: You can put them there, and the video model generates the video and copies the context over. That can solve a lot of the problems, like the long-context problem, since it doesn't need a very long context. But I feel like it's an intermediate solution.
The model
Swyx [00:59:29]: It's cheating.
Ethan [00:59:30]: the model should be able to selectively know where to draw the references from. Say I want to generate a movie, and I generate it autoregressively, ten seconds at a time or something. Now a character appears; I can look back to where it first appeared and bring that back. Yeah, this one, I put in the references. Yeah, that's Optimus, Einstein, myself, Annie.
Vibhu [01:00:02]: Oddly enough, I used Grok Search to find it, and it pulled your LinkedIn post. But yeah, we found it.
Ethan [01:00:08]: Interesting.
Vibhu [01:00:10]: But
xAI's Underrated Work, Culture, and Watermarking
Swyx [01:00:11]: This is a problem. It's not your fault, but xAI doesn't communicate all this work you do very well, because they just have the model release and that's it. But actually, these details are very good.
Swyx [01:00:22]: As far as I understand, everything you just described is state of the art; no one else has done it.
Vibhu [01:00:30]: A lot of-- yeah, I have a lot more
Swyx [01:00:32]: And then you just put out this blog post with the cookies. I'm like, this is not enough, right?
Swyx [01:00:37]: But obviously these are the high-level numbers people want to know. But no, okay, so
Vibhu [01:00:42]: And I wonder if part of that is also that some labs don't share research into what happens. And if
Swyx [01:00:50]: No, but this is literally bragging about how good they are, right?
Swyx [01:00:54]: Why would you not say that you're capable of extending with full context? This is not secret sauce. This is "we did the work." Yeah, I don't know.
Ethan [01:01:02]: Different labs have slightly different communication styles.
Swyx [01:01:07]: Anyway, if anyone from xAI is listening, we are always happy to help you tell your story. Yeah, okay, so you did references, and I think the point you're making is that it is sort of a kludge, right?
You can do seven, but what about 100?
Swyx [01:01:23]: Right? Then you need a completely different thing.
Ethan [01:01:26]: So I think this is a mechanism to select context from the history; you might not put the entire history into the context. For example, there's a paper called FramePack, which has
Ethan [01:01:41]: a heuristic: for the latest history, the last second, I keep the entire history, and the history before that I compress, making those frames smaller. It follows an overall pattern where the maximum sequence length is fixed, so the further a frame is from the current frame, the smaller its image. That's just a heuristic; I think it can be more automatic, with the model aware of which parts of the history to select. This part of the research is being actively worked on by a lot of people, and it's quite interesting. I feel this part of long context is actually a little bit ahead of the LLM side.
Ethan [01:02:31]: For example, in LLMs the context keeps growing. Say you call a tool and the tool-call history is extremely long: it's still in context, and it keeps growing. Even if you switch to another topic, the whole context is still there. Some agentic harnesses help you prune the tool results; for example, when you query a file, they only show the top 200 lines or something. Those are very heuristic-driven.
Swyx [01:03:08]: For listeners, we did a write-up on the Claude Code leak, where there are eight different kinds of pruning, including pruning the tool results and all that.
So you can read up on that kind of thing.
Ethan [01:03:17]: I think one breakthrough in continual learning might be a way for the model to automatically manage its own context.
Swyx [01:03:27]: These are all heuristics, and they will be replaced by machine learning.
Ethan [01:03:30]: Interestingly,
Vibhu [01:03:32]: The
Ethan [01:03:32]: the same thing is being researched in both LLMs and video models.
Vibhu [01:03:36]: The interesting thing is also that in the paper you showed, it's actually happening at the model level, right? With language models, sure, we have base attention, but we do our own compression and our own pruning, which is separate from the model.
Vibhu [01:03:49]: Eventually, hopefully, it all just gets baked in.
Swyx [01:03:52]: I think this is a form of attention, but also a sort of reasoning attention. I feel like that's different from normal attention.
Swyx [01:04:03]: Does that make sense?
Ethan [01:04:04]: It's different in the sense that attention, not to mention, set sparse attention aside,
QUOTES from the episode: "Utilities are no longer just incentive providers. They are becoming design partners." "High performance is not a luxury in affordable housing. It is a necessity for predictable living costs." "If you want this to scale, it starts with early coordination and shared goals." In this episode of RESTalk, we head to Tempe, Arizona, where extreme heat, rising energy costs, and housing needs are converging, and where a standout project is redefining what ENERGY STAR Multifamily New Construction can look like in a hot, dry climate. Bill is joined by Rebecca Smout of Salt River Project (SRP), Joe Keeper of Copa Health, and Jacob Alvarez of Desert Skies Energy to unpack the story behind La Victoria Commons, an ENERGY STAR–certified multifamily development designed with both performance and people in mind. At the center of the conversation is a powerful idea: utilities are no longer just incentive providers; they are becoming design partners. Through early collaboration, flexible program design, and deep involvement from HERS® raters, this project demonstrates how technical rigor and cost awareness can coexist. The team walks through how early modeling, integrated design, and verification helped align multiple programs and certifications while still delivering a practical, buildable project. But the real impact shows up at the resident level. As Joe explains, high-performance construction is not just about energy savings; it is about creating predictable, manageable utility costs for families in affordable housing. Combined with transit access, EV infrastructure, and community services, La Victoria Commons becomes more than a development. It becomes a replicable blueprint for housing equity and market transformation in some of the most demanding climates in the country. 
Rebecca's LinkedIn: https://www.linkedin.com/in/rebecca-smout-31a08644/ SRP: https://www.srpnet.com/ Jacob's LinkedIn: https://www.linkedin.com/in/jacob-alvarez125/ Desert Skies: https://desertskiesenergy.com/ Joe's LinkedIn: https://www.linkedin.com/in/joe-keeper-183aa13ab/ https://copahealth.org/ La Victoria Commons in the news: https://www.tempe.gov/Home/Components/News/News/18749/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
Paper Discussed in this Episode:
Deep-learning-based breast cancer stage prediction from H&E-stained whole-slide images in resource-constrained settings. Bedőházi Z, Biricz A, Kilim O, et al. Journal of Pathology Informatics 21 (2026) 100644.
Episode Summary:
Welcome back, Trailblazers! In this Journal Club deep dive of the Digital Pathology Podcast, we flip the core assumption of microscopic precision on its head. Can an AI accurately predict pathological breast cancer stages (pTNM I-III) from a blurry, high-altitude 2.5x magnification snapshot? We explore a 2026 study that strips away standard high-resolution data to build a highly efficient, resource-aware AI diagnostic tool for clinics lacking supercomputers. We unpack the math, the models, and a haunting revelation about what primary tumors can tell us about lymph node metastasis.
In This Episode, We Cover:
• The Compute Bottleneck: Why the digital pathology AI revolution is leaving resource-constrained clinics behind, and how dropping from the standard 40x to 2.5x magnification cuts the number of extracted image patches by a factor of 256 (the slide is 16 times smaller along each side, and 16 x 16 = 256), bypassing massive hardware and server requirements.
• The "Airplane View": How the AI compensates for the loss of microscopic cellular details (like mitosis or cellular atypia) by relying on macroscopic features, identifying disease through overall tumor growth patterns and broad architectural disruption.
• Vision Transformers & "Puzzle Bags": Why the UNI foundation model, a vision transformer fine-tuned on the BRACS dataset, outperforms older convolutional networks (like ResNet-50) by mapping long-range spatial dependencies across the entire image patch simultaneously. Plus, how Multiple Instance Learning (MIL) acts as a targeted "puzzle bag," mathematically weighting critical cancer data and down-weighting irrelevant background noise.
• The Real-World Stress Test: The model's solid performance on the internal Semmelweis dataset versus the massive external Nightingale cohort, where unsupervised data cleaning with t-SNE and DBSCAN clustering automatically filtered out low-quality data. We also discuss the AI's struggle with the TCGA-BRCA dataset due to severe domain shift from heterogeneous tissue preparation, specifically the structural tissue damage caused by frozen sections.
• The "Messy Middle" and Clinical Triage: The model's tendency to struggle with Stage II breast cancer and the critical clinical danger of under-staging advanced Stage III cancers. We discuss why this WSI-only baseline isn't replacing human pathologists, but instead serves as an automated "sorting hat" for incomplete medical records or a highly tunable "smoke detector" that routes suspicious slides for immediate manual review.
Key Takeaway:
The AI successfully predicted overall cancer stage, which inherently reflects regional lymph node metastasis, by looking only at the primary tumor's architectural disruption, without ever evaluating a single lymph node slide. This suggests that vital systemic biological signals are hiding in plain sight in the macroscopic view of standard H&E slides, offering a phenomenal proof-of-concept for global health equity in resource-constrained settings.
Get the "Digital Pathology 101" FREE E-book and join us!
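The "puzzle bag" idea can be made concrete with attention-based MIL pooling (Ilse et al., 2018), one common MIL aggregator. This is a hypothetical sketch, not the study's model: the patch embeddings and the parameters `V` and `w` are hand-picked toy values (in practice they would come from a pretrained encoder such as UNI and from training), and the function names are illustrative. Each patch gets a relevance score, a softmax turns the scores into weights, and the slide-level embedding is the weighted sum of the patch embeddings.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply a matrix, given as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(
    patches: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Pool a bag of patch embeddings into one slide embedding.

    score_k = w . tanh(V h_k); weights = softmax(scores);
    slide = sum_k weights_k * h_k.
    Returns (slide_embedding, attention_weights).
    """
    scores = [sum(wi * math.tanh(x) for wi, x in zip(w, matvec(V, h))) for h in patches]
    weights = softmax(scores)
    dim = len(patches[0])
    slide = [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]
    return slide, weights


if __name__ == "__main__":
    # Toy bag: two near-empty background patches and one "tumor-like" patch.
    bag = [[0.0, 0.1], [0.1, 0.0], [2.0, 2.0]]
    V = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection (identity here)
    w = [1.0, 1.0]                # hypothetical attention vector
    slide, weights = attention_mil_pool(bag, V, w)
    print([round(a, 3) for a in weights])  # [0.122, 0.122, 0.757]
```

With these toy values the tumor-like patch receives roughly three quarters of the attention weight, while the two background patches share the rest; a downstream classifier would then predict stage from the pooled slide embedding.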
QUOTES from the episode: "You can't fix what you didn't catch early. Thoroughness upfront saves everything downstream." "It's a young industry. That's what makes it exciting, we're still shaping what it becomes." "If you don't have a reason for being in this work, it's going to be hard to stay in it." In this episode of RESTalk, Bill sits down with 2026 Emerging Leader Fellows Courtney Holland and Hayden Cantoni fresh off their experience at the RESNET® Conference. Both guests share how they found their way into the HERS® and building performance industry, often unexpectedly, and what has kept them engaged ever since. From fieldwork to modeling, their stories highlight a career path that blends purpose, technical skill, and variety. Courtney and Hayden reflect on what accelerates learning in the field, the importance of mentorship, and the real-world challenges of tasks such as ventilation verification and HVAC grading. They also offer honest insights into early mistakes, emphasizing the importance of thorough inspections, documentation, and speaking up when something is off. Their perspectives reveal a maturing industry that still has room to improve in training pathways and awareness. Looking ahead, both see a future where HERS® ratings become standard practice in homebuilding, not a premium add-on. They make a compelling case for why new talent is critical to that future, bringing fresh ideas and energy to a still-evolving field. Whether motivated by sustainability, health, or hands-on work, their message is clear: this is an industry worth paying attention to. Hayden's LinkedIn: https://www.linkedin.com/in/haydencantoni/ Courtney's bio: https://southern-energy.com/celebrating-women-in-construction/#courtney-foulis RESNET's® announcement: 2026 Emerging Leaders Fellow: https://www.resnet.us/articles/resnet-announces-the-2026-emerging-leader-fellows/ To the RESNET® community, we hear you and want to engage with you. Learn more at www.RESNET.us. 
For more info on this topic, contact RESNET at INFO@RESNET.US
Explore Oracle AI Vector Search and learn how to find data by meaning, not just keywords, using powerful vector embeddings within Oracle Database 23ai. In this episode, hosts Lois Houston and Nikita Abraham, along with Senior Principal APEX & Apps Dev Instructor Brent Dayley, break down how similarity search works, the new VECTOR data type, and practical steps for implementing secure, AI-powered search across both structured and unstructured data. Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-ai-vector-search-fundamentals/140188/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, Anna Hulkower, Kris-Ann Nansen, and the OU Studio Team for helping us create this episode. ---------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast! I'm Lois Houston, Director of Communications and Adoption Programs with Customer Success Services, and with me is Nikita Abraham, Team Lead: Editorial Services with Oracle University. Nikita: Hi everyone! Today, we're beginning a brand-new season, this time on Oracle AI Vector Search. Whether you're new to vector searches or you've already been experimenting with AI and data, this episode will help you understand why Oracle's approach is such a game-changer. Lois: To make sure we're all starting from the same place, here's a quick overview. Oracle AI Vector Search lets you go beyond traditional database searches. 
Not only can you find data based on specific attribute values or keywords, but you can also search by meaning, using the semantics of your data, which opens up a whole new world of possibilities. 01:20 Nikita: That's right, Lois. And guiding us through this episode is Senior Principal APEX & Apps Dev Instructor Brent Dayley. Hi Brent! What's unique about Oracle's approach to vector search? What are the big benefits? Brent: Now one of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data, all in a single system. This is very powerful, and also a lot more effective because you don't need to add a specialized vector database. And this eliminates the pain of data fragmentation between multiple systems. It also supports Retrieval-Augmented Generation, also known as RAG. Now this is a breakthrough generative AI technique that combines large language models and private business data. And this allows you to deliver responses to natural language questions. RAG provides higher accuracy and avoids exposing private data, because that data doesn't need to be included in the large language model's training data. 02:41 Lois: OK, and can you explain what the new VECTOR data type is? Brent: So, this data type was introduced in Oracle Database 23ai. And it allows you to store vector embeddings alongside other business data. The VECTOR data type provides the foundation for storing vector embeddings. It lets you store your business data in the database alongside your unstructured data and use both in your queries, so you can apply semantic queries to business data. 03:24 Lois: For many of our listeners, "vector embeddings" might be a new term. Can you explain what vector embeddings are? Brent: Vector embeddings are mathematical representations of data points. They assign mathematical representations based on the meaning and context of your unstructured data.
You have to generate vector embeddings from your unstructured data either outside or within the Oracle Database. In order to get vector embeddings, you can either use ONNX embedding machine learning models or access third-party REST APIs. Embeddings can be used to represent almost any type of data, including text, audio, or visual data such as pictures. And they are used in proximity searches. 04:19 Nikita: Now, searching with these embeddings isn't about looking for exact matches like traditional search, right? This is more about meaning and similarity, even when the words or images differ? Brent, how does similarity search work in this context? Brent: Vector data tends to be unevenly distributed and clustered into groups that are semantically related. Doing a similarity search based on a given query vector is equivalent to retrieving the k nearest vectors to your query vector in your vector space. What this means is that you need to find an ordered list of vectors by ranking them, where the first row is the closest or most similar vector to the query vector, the second row is the second closest vector, and so on, depending on your data set. What we need to find is the relative order of distances, and that's really what matters rather than the actual distance. Now, similarity searches tend to get data from one or more clusters, depending on the value of the query vector and the fetch size. Approximate searches using vector indexes can limit the searches to specific clusters. Exact searches visit vectors across all clusters. 05:51 Lois: Let's talk about how we actually convert information into these vectors. There are models behind the scenes, right? Kind of like translators between words, images, and numbers. Brent, what embedding models does Oracle support, and how do they handle different data types?
Brent: Vector embedding models allow you to assign meaning to a word, a sentence, the pixels in an image, or perhaps audio. What does that actually mean? It allows you to quantify features, or dimensions. Most modern vector embeddings use a transformer model. Bear in mind that convolutional neural networks can also be used. Depending on the type of your data, you can use different pretrained open-source models to create vector embeddings. As an example, for textual data, sentence transformers can transform words, sentences, or paragraphs into vector embeddings. For visual data, you can use a residual network, also known as ResNet, to generate vector embeddings. For audio data, you can use a visual spectrogram representation, which lets the audio case fall back to the visual data case. Now, these models can also be based on your own data set. Each model also determines the number of dimensions for your vectors. As an example, Cohere's embedding model, Embed English v3.0, has 1,024 dimensions. OpenAI's embedding model, text-embedding-3-large, has 3,072 dimensions. 07:45 Nikita: For organizations ready to put this into practice, there's the question of how to get the models up and running inside Oracle Database. Can you walk us through how these models are brought into Oracle Database? Brent: Although you can generate vector embeddings outside the Oracle Database using pretrained open-source embedding models or your own embedding models, you also have the option of doing so within the Oracle Database. To do that, you need to use models that are compatible with the Open Neural Network Exchange standard, or ONNX, pronounced "on-ex." Oracle Database implements an ONNX runtime directly within the database, which allows you to generate vector embeddings directly inside the Oracle Database using SQL. 08:41 AI is transforming every industry.
So, it's no wonder that AI skills are the most sought-after by employers. If you're ready to dive into AI, check out the OCI AI Foundations training and certification that's available for free! It's the perfect starting point to build your AI knowledge. Head over to mylearn.oracle.com to kickstart your AI journey today! 09:06 Nikita: Welcome back! Let's make this practical. Imagine I'm setting this up for the first time. What are the big steps? Can you walk us through the end-to-end workflow using Oracle AI Vector Search? Brent: First, generate vector embeddings from your data, either outside the database or within the database. Embeddings are a mathematical representation of what your data means. So, what does this long sentence mean, for instance? What are its main keywords? You can generate embeddings not only on your typical string data, but also on other types of data, such as pictures or perhaps audio waveforms. Maybe we want to convert text strings to embeddings, or convert files into text. Then, from the text, we can chunk it up into smaller pieces and generate embeddings on those chunks. Maybe we want to convert files to embeddings, or use embeddings for end-to-end search. As we mentioned, you have to generate vector embeddings from your unstructured data either outside or within the Oracle Database, using ONNX embedding machine learning models or third-party REST APIs. You can import pretrained models in ONNX format for vector generation within the database: download pretrained embedding machine learning models, convert them to the ONNX format if they are not already in that format, import those models into the Oracle Database, and generate vector embeddings from your data within the database. Oracle also allows you to convert pretrained models to the ONNX format using Oracle Machine Learning for Python (OML4Py).
This enables the use of text transformers from different companies. 11:36 Nikita: Once those embeddings are generated, what's the next step? Brent: Store the vector embeddings. You can create one or more columns of the VECTOR data type in your standard relational data tables. You can also store them in secondary tables that are related to the primary tables through primary key/foreign key relationships. Either way, you store the resulting vector embeddings, along with the associated unstructured data, alongside your relational business data inside the Oracle Database. 12:17 Lois: And when do vector indexes come into play? Brent: You may want to create vector indexes if you have huge vector spaces. This is an optional step, but it is beneficial for running similarity searches over those huge vector spaces. 12:38 Nikita: Now, once all of that is in place, how do users perform similarity searches? Brent: Once you have generated the vector embeddings, stored them, and possibly created vector indexes, you can query your data with similarity searches. This allows for native SQL operations and lets you combine similarity searches with relational searches to retrieve relevant data. So let's take a look at the complete workflow. Step number one, generate the vector embeddings from your unstructured data. Step number two, store the vector embeddings. Step number three, optionally create vector indexes. And step number four, combine similarity and keyword searches. There is also another optional step: you can use the similarity search results to generate a prompt and send it to your generative large language model for full RAG inference, completing your RAG pipeline. 14:07 Lois: Thanks for that detailed walk-through, Brent.
To sum up, today we introduced Oracle AI Vector Search, discussed its core concepts, data types, embedding models, and the complete workflow you'll use to get real value out of your business data, securely and efficiently. Nikita: If you want to learn more about the topics we discussed today, go to mylearn.oracle.com and search for the Oracle AI Vector Search Fundamentals course. And if you're feeling inspired to try this out for yourself, don't forget to check out the Oracle Database 23ai SQL Workshop for hands-on training. Until next time, this is Nikita Abraham… Lois: And Lois Houston, signing off! 14:49 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
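The workflow Brent describes (generate embeddings, store them next to business data, search by similarity, then optionally build a RAG prompt) can be sketched in plain Python. This is a minimal, hypothetical illustration, not Oracle's API: the hashed bag-of-words `embed` function is a toy stand-in for a real ONNX or REST embedding model, a Python list of dicts stands in for a table with a VECTOR column, the optional index step is omitted, and the names `chunk`, `embed`, `store`, `search`, and `rag_prompt` are illustrative assumptions. In Oracle Database 23ai these steps would be expressed in SQL instead.

```python
import hashlib
import math
import re
from typing import Any, Dict, List, Optional

DIM = 256  # hypothetical size; real models use e.g. 1,024 or 3,072 dimensions
Row = Dict[str, Any]


def chunk(text: str, max_words: int = 8) -> List[str]:
    """Step 1a: split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> List[float]:
    """Step 1b: toy hashed bag-of-words embedding, unit-normalized.

    A stand-in for a real embedding model; it captures word overlap, not meaning.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Assumes unit-length inputs (as produced by embed), so cosine similarity is a dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def store(table: List[Row], doc_id: int, region: str, text: str) -> None:
    """Step 2: store each chunk's embedding next to relational business columns."""
    for piece in chunk(text):
        table.append({"doc_id": doc_id, "region": region, "text": piece, "embedding": embed(piece)})


def search(table: List[Row], query: str, k: int = 3, region: Optional[str] = None) -> List[Row]:
    """Step 4: exact top-k similarity search, optionally filtered on a relational column.

    Only the relative order of distances matters for the ranking.
    """
    q = embed(query)
    rows = [r for r in table if region is None or r["region"] == region]
    return sorted(rows, key=lambda r: cosine_distance(q, r["embedding"]))[:k]


def rag_prompt(question: str, hits: List[Row]) -> str:
    """Optional step: turn the search hits into a prompt for a generative LLM."""
    context = "\n".join(f"- {h['text']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    table: List[Row] = []
    store(table, 1, "EMEA", "Invoices are due within thirty days of delivery.")
    store(table, 2, "APAC", "Refunds are processed within five business days.")
    hits = search(table, "When are invoices due?", k=1)
    print(rag_prompt("When are invoices due?", hits))
```

The `region` filter in `search` is a rough analogue of combining a relational predicate with a similarity search; a real embedding model would also match paraphrases that share no words with the stored text, which this toy embedding cannot do.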
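Brent's distinction between exact search (visiting vectors across all clusters) and approximate search (limited to specific clusters by a vector index) can be illustrated with a small IVF-style (inverted file) partition index, one common way such indexes are built. This is a hypothetical sketch in plain Python, not Oracle's index implementation: the centroids are supplied by hand (a real index would typically learn them, for example with k-means), and `nprobe`, the number of clusters scanned, is an illustrative parameter name.

```python
import math
from typing import Dict, List, Sequence

Vec = Sequence[float]


def exact_knn(vectors: List[Vec], query: Vec, k: int) -> List[int]:
    """Exact search: rank every stored vector by Euclidean distance to the query."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return order[:k]


def build_partitions(vectors: List[Vec], centroids: List[Vec]) -> Dict[int, List[int]]:
    """Index build: assign each vector to its nearest centroid's partition."""
    parts: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
        parts[nearest].append(i)
    return parts


def approx_knn(vectors: List[Vec], centroids: List[Vec], parts: Dict[int, List[int]],
               query: Vec, k: int, nprobe: int = 1) -> List[int]:
    """Approximate search: scan only the nprobe partitions whose centroids are closest."""
    probe = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in parts[c]]
    return sorted(candidates, key=lambda i: math.dist(vectors[i], query))[:k]


if __name__ == "__main__":
    vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
               (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (4.9, 4.9)]
    centroids = [(0.3, 0.3), (10.3, 10.3)]
    parts = build_partitions(vectors, centroids)
    query = (5.6, 5.6)  # near the boundary between the two clusters
    print(exact_knn(vectors, query, 1))                               # [6]
    print(approx_knn(vectors, centroids, parts, query, 1, nprobe=1))  # [3]
    print(approx_knn(vectors, centroids, parts, query, 1, nprobe=2))  # [6]
```

The boundary query shows the trade-off: probing one partition scans fewer vectors but misses the true nearest neighbor, which sits in the other cluster, while probing both recovers it at the cost of scanning everything, exactly as an exact search would.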
The Lunar Society: Read the notes at podcastnotes.org. Don't forget to subscribe for free to our newsletter, the top 10 ideas of the week, every Monday.
---------
We begin the episode with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion.
People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops.
But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long.
During this time, what we know today as the better theory can actually make worse predictions.
And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don't even understand well enough to actually articulate, much less codify into an RL loop.
Hope you enjoy!
Watch on YouTube; read the transcript.
Sponsors
- Jane Street loves challenging my audience with different creative puzzles. One of my listeners, Shawn, solved Jane Street's ResNet challenge and posted a great walk-through on X. If you want to try one of these puzzles yourself, there's one live now at janestreet.com/dwarkesh.
- Labelbox can get you rubric-based evals, no matter your domain. These rubrics allow you to give your model feedback on all the dimensions you care about, so you can train how it thinks, not just what it thinks. Whatever you're focused on (math, physics, finance, psychology, or something else), Labelbox can help. Learn more at labelbox.com/dwarkesh.
- Mercury just released a new feature called Insights. Insights summarizes your money in and out, showing you your biggest transactions and calling out anything worth paying attention to. It's a super low-friction way to stay on top of your business.
Learn more at mercury.com/insights.
Timestamps
(00:00:00) – Kepler was a high temperature LLM
(00:11:44) – How would we know if there's a new unifying concept within heaps of AI slop?
(00:26:10) – The deductive overhang
(00:30:31) – Selection bias in reported AI discoveries
(00:46:43) – AI makes papers richer and broader, but not deeper
(00:53:00) – If AI solves a problem, can humans get understanding out of it?
(00:59:20) – We need a semi-formal language for the way that scientists actually talk to each other
(01:09:48) – How Terry uses his time
(01:17:05) – Human-AI hybrids will dominate math for a lot longer
Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
QUOTES from the episode: "You absolutely belong here. There are many paths into this industry, and there's room for everyone to thrive." - Cindy Zeis "Patience, facts, and science always win the day." - Sharla Riead "The future of women in building science is unlimited." - Nancy St. Hilaire In this milestone 150th episode of RESTalk, host Bill Spohn marks Women's History Month by highlighting the voices of women from across the RESNET® community. Through a series of short conversations, industry leaders share how they entered the world of building science, the mentors who influenced their careers, and how the industry has evolved over time. Their stories reveal that there is no single path into the home performance and HERS® ecosystem. Some entered through construction, the mechanical trades, and hands-on field work, while others came through sustainability planning, academic programs, or early energy-auditing efforts dating back to energy crises of past decades. What they share is a passion for solving complex problems in buildings and translating technical building science into practical solutions that improve comfort, efficiency, durability, and affordability. The conversation also explores the experience of working in a traditionally male-dominated industry. Guests reflect on early moments of needing to prove their credibility, walking onto job sites where they were the only woman in the room, and overcoming stereotypes about fieldwork. At the same time, they highlight the strong mentorship, collaboration, and respect that characterize the RESNET® network. Many have seen meaningful progress over the years, with more women becoming HERS raters, modelers, quality assurance professionals, and industry leaders. Looking ahead, the guests point to one of the biggest remaining barriers: visibility. Many people simply do not realize that building science exists as a career path. 
Their advice to the next generation is to stay curious, seek hands-on experience, build strong communication skills, and ground their work in solid science and ethics. As building performance becomes increasingly interdisciplinary, blending field diagnostics, data, standards, and software, the opportunities for women to influence and lead the industry are expanding rapidly. Their closing reflections capture the spirit of the episode: the future of women in building science is impactful, influential, and ultimately unlimited. Guests this episode: Cindy Zeis, Technical Director with PSD https://www.linkedin.com/in/cindyzeis/ Julie Christesen, Chief Administrative Officer/Building Scientist with ES Green & Co. https://www.linkedin.com/in/julie-conn-christesen/ Nancy St. Hilaire, Co-owner of Builder i Group https://www.linkedin.com/in/nancy-st-hilaire-59161121/ Laurel Elam, Sr. Director Business and Standards Development for RESNET https://www.linkedin.com/in/laurel-elam-5404817/ Sharla Riead, Lead Instructor at the Energy Smart Institute https://www.linkedin.com/in/sharla-riead/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
Paulette McGhie, programs manager at RESNET, and Michel Matthews, program engagement specialist, discuss how RESNET standards differ from other energy efficiency programs and how those standards are developed and updated. They explain the important role RESNET standards play in the mortgage industry, including how a RESNET-certified home may qualify buyers for special mortgage financing tied to new home purchases or upgrades. Michel highlights how the standards ensure accuracy and consistency in home energy ratings, building confidence for lenders and homeowners alike. The conversation also addresses what happens after certification, with Paulette noting that RESNET currently does not enforce ongoing compliance when a certified home is later sold. Podcast recorded on February 12, 2026.
QUOTES from the episode: "The RESNET® Conference is where experience meets fresh perspective, and that's where the real progress happens." "This isn't just another conference. It's about shaping the next era of the HERS® industry." "If you want to understand where the industry is headed, this is where the conversations are happening." In this episode of RESTalk, Bill Spohn sits down with Noah Kibbe from RESNET® to preview the 2026 RESNET® Conference coming to San Antonio's Riverwalk. Noah shares what goes into planning the event, from selecting the location and shaping the agenda to locking in what might be one of the most memorable receptions in conference history. Yes, at the Alamo. The conversation gives listeners a behind-the-scenes look at how the conference is intentionally designed to balance technical depth, networking, and community. Noah walks through the structure of the conference, including general sessions, expanded 90-minute breakout sessions, and clearly defined tracks like energy codes, building science, carbon reporting, RESNET® Quality Assurance, and RESNET® 101 for newcomers. The discussion highlights why RESNET® extended session lengths this year, putting more emphasis on Q&A, panel-style discussions, and practical takeaways that attendees can actually use when they return home. The episode also covers sponsorship opportunities, pre-conference training options, the Whova app for planning and networking, and new additions like session recordings available after the event. Noah closes by framing the conference theme, "The Next Era of the HERS® Industry," and sharing details about RESNET®'s Habitat for Humanity build day, reinforcing that this event is not just about education, but also impact and giving back.
Conference link: https://www.resnet.us/2026-conference/ Sponsor Application: https://www.resnet.us/2026-resnet-conference-sponsor-application/ Learn more about the Habitat for Humanity Build Day by emailing Valerie@RESNET.us. Noah's LinkedIn: https://www.linkedin.com/in/noahkibbe/ WHOVA event schedule: https://whova.com/embedded/event/lC-B89PmXiRbKBwCelSSr2arOUnE88wFAawUiW8ufLc%3D/?refer=undefined&day=1 To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
QUOTES from the episode: "What raters do makes the difference. Supporting them protects the credibility of the entire ecosystem." "Energy and water performance are not abstract. They are measurable, verifiable, and they matter to families." "My role is to protect what works, listen carefully, and help RESNET adapt without losing what makes it special." In the first RESTalk episode of 2026, Bill Spohn welcomes Shelby Gatlin, the new Executive Director of RESNET®, for a thoughtful conversation about leadership, credibility, and the future of home energy ratings. Shelby shares her professional path from conservation-focused work and regulatory advocacy into the residential energy efficiency space, including years spent navigating operational growth and industry change. Shelby explains why RESNET®'s voluntary standards, third-party verification, and ecosystem of trust matter now more than ever, especially as affordability, comfort, and long-term household costs take center stage. She emphasizes that energy and water performance are not abstract concepts but measurable outcomes that directly impact homeowners. Throughout the discussion, Shelby underscores the critical role of raters and providers in protecting the integrity and credibility of RESNET® programs. Looking ahead, Shelby outlines her leadership approach: build on what already works, listen closely to stakeholders, modernize systems where clarity and transparency are needed, and preserve what makes RESNET® unique. She also shares her excitement for her first RESNET® Conference as Executive Director and her commitment to empathy, collaboration, and trust across the industry. Shelby's LinkedIn: https://www.linkedin.com/in/shelby-gatlin-59100926/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
In this special "Best of RESTalk 2025" episode, host Bill Spohn looks back at the ideas and conversations that defined the year across the RESNET® ecosystem. Rather than a simple highlight reel, this episode weaves together four major themes that kept surfacing: sweeping QA modernization efforts, the market forces accelerating HERS® adoption, the emerging leaders reshaping the industry, and the on-the-ground projects proving that high performance and affordability can coexist. Across conversations with QA directors, software leaders, conference fellows, builders, lenders, and RESNET® founder Steve Baden, 2025 revealed an industry that is both maturing and reinventing itself. The year's strongest clips capture this mix of evolution and momentum: real-time QA replacing annual reports, financing that rewards low-carbon construction, programs nurturing new talent, and innovative housing models, from 3D-printed concrete to all-electric Habitat townhomes, delivering measurable results for families. Whether you're a rater, builder, program manager, lender, policymaker, or simply someone who cares about the future of housing, this episode connects the dots on where RESNET® is headed and why the next few years will be pivotal for quality, credibility, and impact.
Major Themes Covered
1. QA as the Backbone: Apps, Automation & Trust
Episodes: EP137, EP140, EP144, EP146
How RESNET® is modernizing QA with real-time data, tech upgrades, stronger staffing, and clearer processes, protecting the credibility of HERS® as volumes scale.
2. Market Signals, Money & Scaling HERS®
Episodes: EP136, EP138, EP142, EP145, EP146
Why HERS® grows fastest where incentives and financing tools align, and how products like C-PACE and tax credits are turning performance into business value.
3. The People Pipeline: Conferences, Emerging Leaders & New Roles
Episodes: EP136, EP137, EP138, EP146
The next generation of raters, analysts, and QA specialists entering the field, and how programs like Emerging Leaders keep the talent pipeline strong.
4. High-Performance, Affordable & Innovative Housing in Practice
Episodes: EP139, EP141, EP143, EP145, EP146
Real projects (3D-printed homes, ERV best practices, net-zero-ready Habitat homes) showing that innovation and affordability can move together.
Featured Guests (Across 2025 Episodes)
Scott Doyle – Managing Director of QA, RESNET®
Laurel Elam – Senior Director, RESNET®
Steve Baden – Executive Director & RESNET® Founder
Tricia Baker – Senior VP of Strategy & Impact, PACE Equity
Karl Peer – In-house Engineer & HERS® Modeler, PACE Equity
Emerging Leaders & Fellows – RESNET® ELC participants
Raters, Builders & Habitat Teams from EP139, EP143, and others
To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET® at INFO@RESNET.US
QUOTES FROM STEVE: "The challenge isn't how far we can push efficiency—but how affordable we can make it." "Give builders a clear target and the market will respond." "It's been a hell of a ride—thanks for the memories. Now it's your turn to take it to the next level." Steve Baden tells the story of RESNET®'s early days, from Alaska's energy-code back-and-forth to a market-based framework that encouraged builders with targets rather than rules. That experiment gave rise to home energy ratings, a national standard, and eventually RESNET® itself, formed with the Mortgage Bankers Association and state energy offices to create a uniform "MPG for homes." The payoff: 4.8 million HERS®-rated homes to date and 420k+ in the last year, with large production builders using the RESNET® HERS® score to compete on quality in down markets. Baden credits stepwise consensus standards (HERS → ANSI/RESNET® 301), rigorous QA, and healthy software competition for scaling with credibility. He highlights investments in the registry, virtual QA, and an "emerging leaders" pipeline to keep talent and ideas flowing. Looking ahead, he frames RESNET®'s goal of a million ratings per year by 2028 against today's affordability crunch: performance gains must be paired with cost sensitivity, or we lock people out of housing. As Baden steps into a new chapter, he leaves a simple message behind: value the people, keep the partnerships strong, and keep RESNET moving forward. Steve's LinkedIn: https://www.linkedin.com/in/steve-baden-54080212/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
Patrick Nielsen is the Global Technical Products Manager for Broan-NuTone, a leading manufacturer of residential ventilation solutions. Patrick has 20 years of experience with Broan, including roles in product development, where he used his engineering background to launch a wide range of innovative ventilation products that solve a variety of IAQ and customer challenges. More recently, his focus has been on codes and standards, projecting longer-term product needs and opportunities, and training. He serves as chair of the Home Ventilating Institute (HVI) Board of Directors, chairs its Codes and Standards Committee, and is active on multiple other committees addressing industry issues, including at ASHRAE, RESNET, EEBA, and IAQA.
Patrick Nielsen on LinkedIn
Broan-NuTone
Ryan Meres, Paulette McGhie, and Michael Matthews from RESNET explore the significance of the HERS H2O rating system, which evaluates a home's water efficiency alongside its energy performance. They explain the types of homes eligible for certification and share statistics on adoption in California and other states. The discussion highlights how the HERS H2O rating aligns with the EPA's WaterSense label, offering builders and homeowners a pathway to formal recognition and incentives. The guests also detail the inspection criteria used in certification and emphasize the value of water-efficient homes—even in regions not currently facing water scarcity. This episode underscores the growing importance of integrating water conservation into residential construction standards. Podcast Recorded on October 30, 2025
nature-inspired optimization. A group of researchers developed the Spotted Hyena Optimizer (SHO), a bio-inspired algorithm that mimics the hunting strategies of hyenas, to improve the automatic classification of food images. Combined with deep neural networks (Xception + ELM), the model achieved 85.9% accuracy in identifying dishes, outperforming classic architectures such as ResNet and Inception.
QUOTES: “We created the CIRRUS Low Carbon program to speak to developers' pocketbooks — build a better building, and you get a better rate.” – Tricia Baker “Early design is peace of mind — it saves you a buck or two and a lot of headaches later.” – Karl Peer In this episode of RESTalk, host Bill Spohn is joined by Tricia Baker, Senior Vice President of Strategy and Impact at PACE Equity, and Karl Peer, an in-house engineer with deep experience in HERS ratings and energy modeling. Together, they unpack the power of C-PACE (Commercial Property Assessed Clean Energy) financing, a state-enabled program that helps developers and building owners access long-term, low-cost private capital for sustainable building upgrades and construction. Tricia introduces CIRRUS Low Carbon, a unique PACE Equity program that rewards better-performing buildings with lower interest rates. Buildings that meet a HERS score of 55 or better automatically qualify for discounted financing, offering developers meaningful savings, often 50 basis points lower than standard rates. She emphasizes how this program directly links building performance to financial incentives, creating a clear economic reason to pursue higher-efficiency, lower-carbon projects. Karl brings the technical depth, explaining how in-house engineering and collaboration with RESNET® ensure rigorous validation. He describes case studies ranging from multifamily buildings in California to resilient coastal redevelopment in Florida, where C-PACE funding supported both energy efficiency and storm-hardening measures. Together, Tricia and Karl illustrate how early engagement in the design process yields the best outcomes (“early design is peace of mind,” Karl says), showing how this innovative financing mechanism can drive sustainable change while improving property value and financial performance. 
LinkedIn for our guests: https://www.linkedin.com/in/tricia-kuse-marketing/ https://www.linkedin.com/in/karl-peer-54878b86/ PACE Equity main site: https://www.pace-equity.com/ CIRRUS C-PACE: https://www.pace-equity.com/lowcarbon/ Case studies: https://www.pace-equity.com/case-studies/ CONTACT: https://www.pace-equity.com/schedule-a-meeting/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
“If you're a professional and the rules of your profession change, you can't afford not to keep up.” — Scott Doyle In this episode of the RESTalk podcast, host Bill Spohn sits down with two long-time leaders at RESNET
Ryan Meres came to RESNET with decades of experience in energy policy, energy codes, and home energy performance after working for groups like the Institute for Market Transformation, the U.S. Department of Energy, the Georgia Department of Community Affairs, and Southface Energy Institute. He continues to focus on policy initiatives as a program specialist for RESNET but also spends much of his time understanding the impact that RESNET's National Building Registry database can have on the building industry in the United States. The RESNET National Building Registry is a database that houses most of the information gathered by certified HERS Raters who create confirmed HERS ERI scores on homes across the country. Data from over 4 million HERS Rated homes is now in the registry database. The Registry serves as a central hub of information for the HERS Rating industry, providing data essential for the quality assurance process as well as for policymakers, industry, and researchers. It also helps real estate agents, appraisers, and the public verify the energy efficiency of a home by disclosing whether the home has been rated and what its HERS ERI score is.
Ryan Meres on LinkedIn
RESNET Ratings Mid-Year Report
RESNET® Releases 2024 Annual Report
Public Access to RESNET National Registry
RESNET Newsletter Signup
Interested in more Access to RESNET National Registry
RESNET Registry Data Sharing policy
“When you're in construction, you're just building sticks and bricks—until you understand how a house actually performs.” – Kevin Smith In this episode of RESTalk, Bill sits down with Kevin L. Smith, CEO of Habitat for Humanity of New Castle County, to explore a groundbreaking project: the Bennett Point Phase 1 homes, which earned both ENERGY STAR and DOE Zero Energy Ready Home certifications. Kevin shares how his team collaborated with Energize Delaware and New Ecology to build high-performance, all-electric townhomes aimed at long-term affordability and health for first-time homebuyers. From redesigning floor plans to accommodate advanced HVAC systems to navigating volunteer labor and contractor training, Kevin explains the behind-the-scenes coordination needed to hit ultra-low RESNET® HERS® scores of 33–35—among the best in Delaware without solar. The conversation wraps with reflections on lessons learned, surprises in the process, and guidance for other Habitat affiliates looking to level up their housing mission. Other quotes from our conversation: “We want homeowners to walk into their house confident they can maintain it, save money, and breathe easier—literally and financially.” “A tight house with smart systems isn't just efficient. 
It's transformational.” Kevin's Bio: https://www.habitatncc.org/hfhncc-staff/kevin-l-smith/ His LinkedIn: https://www.linkedin.com/in/kevin-l-smith-18360314/ New Ecology: https://www.newecology.org/ Other content on this project: https://www.facebook.com/HabitatNCC/posts/join-us-for-the-bennett-point-home-dedicationhabitat-for-humanity-of-new-castle-/1061661725998026/ https://www.habitatncc.org/news/2025/03/video-moving-in-homes-at-habitats-bennett-point-ready-for-occupancy/ https://www.delawareonline.com/story/opinion/2019/12/03/new-habitat-development-key-revitalizing-eastside/2588463001/ Listen to RESTalk Episode 139, where we meet with Jason Barlow, President & CEO of Habitat for Humanity Central Arizona, to talk about Concrete Innovations: The Story of Arizona's First 3D-Printed Home: https://restalk.libsyn.com/ep139-concrete-innovations-the-story-of-arizonas-first-3d-printed-home-with-jason-barlow To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
“RESNET members are now touching about a third of all new single-family homes. That's a big number — and it's only growing.” – Ryan Meres In this episode of RESTalk, host Bill Spohn welcomes back Ryan Meres, Program Director at RESNET®, to explore the key insights from the 2025 HERS Rated Homes Data Trends Report. This sixth annual edition reveals both expected and surprising patterns in how homebuilders, energy raters, and policymakers are driving efficiency across the U.S. housing market. With over 436,000 homes HERS rated in the past year, the report reflects RESNET®'s growing impact, now touching a third of all new single-family homes and a rising share of multifamily projects. Ryan breaks down regional standouts: Massachusetts leads with 88% of new homes HERS rated due to its Green Communities program, while Texas tops the volume charts with 113,000 ratings. Notably, California showed the most improvement in HERS scores since 2013, driven in part by its solar mandate. Meanwhile, the report tracks construction practices, including the dominance of slab-on-grade foundations and the increasing prevalence of ductwork in conditioned spaces. Other notable topics include growing adoption of heat pumps (especially in the Southeast), a dramatic increase in homes achieving
"As homes get tighter, ventilation isn't optional—it's essential." — Patrick Nielsen In this episode of RESTalk, host Bill Spohn sits down with Patrick Nielsen, Global Technical Products Manager at Broan-NuTone, to discuss the evolving world of home ventilation. With over two decades of experience, Patrick walks us through the major drivers shaping residential ventilation practices today—from tighter building envelopes to the growing emphasis on indoor air quality and energy efficiency. Patrick explains the importance of dwelling unit ventilation (DUV), spot vs. whole-house strategies, and the increasing role of energy recovery ventilators (ERVs) as builders and raters seek practical ways to meet stringent codes and achieve better HERS scores. He also touches on upcoming developments like capture efficiency standards for range hoods, automated balancing features in modern ERVs, and the implications of makeup air requirements. Whether you're a builder, rater, or HVAC pro, this episode is packed with actionable insights for navigating the ventilation challenges in today's high-performance homes. Patrick's LinkedIn: https://www.linkedin.com/in/patrick-nielsen-48b93a3/ Broan's website: https://broan-nutone.com/en-us Broan Specficier tools (A variety of code-compliant specification tools for project applications.): https://broan-nutone.com/en-us/home/specifier-tools The Home Ventilating Institute (HVI): www.hvi.org Links to ASHRAE 62.1 and 62.2 standards: https://www.ashrae.org/technical-resources/bookstore/standards-62-1-62-2 One of many sites to find building codes by state: https://ibhs.org/public-policy/building-codes-by-state/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
"You can't manage what you can't measure." — Peter Drucker In this insightful episode of RESTalk, host Bill Spohn welcomes Scott Doyle, RESNET's Managing Director of Quality Assurance (QA), for a comprehensive update on what's ahead in the world of RESNET QA®. Scott unpacks the most significant changes hitting the registry, Chapter 9 standards, and the QA app—all designed to modernize, streamline, and strengthen the QA process. The conversation delves into how these updates will affect HERS® raters and providers, with a major focus on the ENERGY STAR QA/QC program. Scott outlines the move toward increased documentation, real-time oversight, and the eventual integration of automation and AI into RESNET's workflow. With the industry's expectations for speed and accuracy climbing, these changes aim to ensure trust, defensibility, and better service. He also gives a call to action: start adapting now by reviewing ENERGY STAR Field Checklist Revision 14. Whether you're a provider, QAD, or rater, this episode equips you with both the “why” and the “what's next” behind the QA evolution—and how to stay ahead of the curve. Note: This episode was recorded on May 1, prior to the recent speculation that the Trump administration is planning to eliminate the EPA and ENERGY STAR. RESNET has been active in advocating for the preservation of the 45L tax credit and ENERGY STAR Homes and will continue to provide updates as more information becomes available. 
Scott's LinkedIn: https://www.linkedin.com/in/scott-doyle-84750823/ Link to RESTalk Episode 134: Boosting Efficiency: How RESNET's QA App Is Transforming the Industry https://directory.libsyn.com/episode/index/id/33586437 ENERGY STAR National Rater Field Checklist, Rev. 14: https://www.energystar.gov/sites/default/files/2025-01/National%20Rater%20Field%20Checklist_Rev%2014.pdf RESNET QA Compliance Specialist Job Postings: https://www.resnet.us/articles/job-posting-resnet-qa-compliance-specialists-regional-positions/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
"Innovation is taking two things that exist and putting them together in a new way." — Tom Freston In this episode of the RESTalk podcast, Bill Spohn sits down with Jason Barlow, former President & CEO of Habitat for Humanity Central Arizona, to discuss a groundbreaking achievement in homebuilding—the first 3D-printed concrete home in Arizona. Jason shares how this innovative project, born from a collaboration with ASU graduates and a German scaffolding company, PERI, pushed the boundaries of affordable housing construction. From overcoming extreme heat challenges to designing a home with unique structural and artistic elements, Jason details the incredible journey of bringing this home to life. The conversation dives into the long-term implications of 3D printing in construction, its potential for large-scale housing developments, and the lessons learned from Habitat's pioneering efforts. Jason also touches on Habitat's broader mission, its work in providing energy-efficient and affordable housing, and how listeners can engage with their local Habitat affiliates. This episode is a fascinating look at the intersection of technology, sustainability, and community impact. Here is the LinkedIn profile for our podcast guest Jason Barlow: https://www.linkedin.com/in/jasbarlow/ Additional information as well on the 3D home. https://habitatcaz.org/habitats-first-3d-printed-home-in-the-u-s/ Habitat Resnet Presentation 2025.pdf Habitat for Humanity's first 3D-printed home in Arizona Why not us? Habitat's 3D-Printed home in Tempe, Arizona (3 min) To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
Thanks for listening to the buildCAST. In this episode we hear from Steve Baden, the founding executive director of the Residential Energy Services Network, or RESNET. RESNET, the governing body of the HERS home energy rating industry, was established in 1995 by the National Association of State Energy Officials, Energy Rated Homes of America, and the National Mortgage Industry Association to develop a national market for home energy rating systems and energy-efficient mortgages. I was in the first energy rater training and became the 32nd certified rater in Colorado in 1995. Soon after, I met Steve at the first RESNET conference, held at the Florida Solar Energy Center, when I was on the board of directors of Energy Rated Homes of Colorado. Even before Jimmy Carter's infamous "wear a sweater" speech from the Oval Office, Steve's path to energy efficiency came through politics. He found himself in Alaska at the state energy office leading an initiative called “Warming Homes for Alaskans,” which received the 1993 national award for the most outstanding state housing program and which set the stage for a national home energy rating program that RESNET grew out of. Steve has worked in the residential energy efficiency field for 30 years, including eighteen years with home energy ratings and energy mortgages on both the state and national levels, and ten years administering the Alaska State Energy Office. Steve is the founding executive director of RESNET and has just announced that he will be retiring on December 31st, 2025. I have known Steve since the founding of RESNET and my time as a board member of Energy Rated Homes of Colorado, and I was so happy to have the opportunity to recap his career on the buildCAST before his last day. Thanks for all you have done for the industry, Steve.
Steve Baden on LinkedIn
RESNET - Residential Energy Services Network
#198: Saint Mary's Student-Run Help Desk + SolarWinds' 3X ROI Higher Ed Solutions [EDUCAUSE 2024]
Brought to you by:
SentinelOne: Learn how SentinelOne empowers this state to stay secure.
Verizon Frontline: The advanced network that keeps first responders connected when it matters most.
Carahsoft: The Trusted Public Sector IT Solutions Provider™, supporting government agencies and education/healthcare markets. Contact your Carahsoft rep today to access special discount pricing exclusively through the TechTables + Carahsoft partnership!
Featuring:
Kathy Hausmann, Director of Digital Infrastructure and Support Services at Saint Mary's College
Tyler Horning, Senior Account Manager, Central Region, Public Sector, SolarWinds
What You'll Learn:
Saint Mary's Student-Run IT Support Evolution
Discover how Saint Mary's College transformed from Post-it note tracking to a sophisticated Web Help Desk system that resolves nearly 50% of tickets within one day
Explore their innovative two-tier support model: ResNet (student tech support) and Help Desk (faculty/staff support), both primarily staffed by students
Learn how this IT experience creates a powerful talent pipeline, with former student workers landing positions at Google, JPMorgan Chase, and other major companies
Comprehensive Student Training and Support Operations
Inside Saint Mary's multi-phase training program: 3½ days of focused instruction followed by an immersive orientation weekend that covers "everything they'll see in an academic year"
How the FAQ feature creates invisible success stories by helping students solve problems independently without submitting tickets
The critical philosophy of teaching student staff it's "okay to say 'I don't know'" and escalate issues to supervisors
SolarWinds Partnership and Customized Solutions
How Web Help Desk's customization enables Saint Mary's to tailor experiences for different campus populations (including the "three strikes" system for students vs. automated reminders for faculty/staff)
Impressive ROI metrics across higher education: 51% of institutions report a 1-2x return on investment, 17% see a 3x return, and some see value within just 5 days of implementation
The power of automated workflows, escalations, and knowledge base development in creating efficient support operations
Looking Beyond Technology
Why Kathy recommends IT support jobs for students across ALL disciplines: "Nursing majors learn troubleshooting, education majors gain classroom-relevant skills, philosophy majors can thrive"
Tyler's advice to students: "Absorb as much as you can, understand the broader organizational flow, and learn how to solve others' problems"
How Saint Mary's exemplifies higher education's common challenges (understaffing, budget constraints) while creating innovative solutions through student empowerment
Timestamps
(00:00:00) Intro
(00:00:58) Meet the innovators: SolarWinds and Saint Mary's College
(00:03:58) From Post-it notes to digital transformation
(00:05:05) Web Help Desk: The Spring 2014 revolution
(00:07:42) Student-powered IT support model
(00:09:22) Impressive metrics: 50% of tickets resolved within one day
(00:13:19) Student IT bootcamp: The intensive training approach
(00:19:43) Customizing support for campus culture
(00:22:02) Career advice: "Any student should look at working in IT"
Connect
"Leadership and learning are indispensable to each other." – John F. Kennedy In this episode of RESTalk, host Bill Spohn welcomes the 2025 Emerging Leadership Council (ELC) fellows—Jennifer Goldberg, Dorian Gothard, and Alex Haworth—for an insightful discussion on their career journeys, experiences at the RESNET conference, and their visions for the future of energy efficiency and building science. Each fellow shares how they entered the industry, what drew them to the ELC program, and how they apply their knowledge professionally and personally. The conversation covers key takeaways from the RESNET conference, including the industry's ambitious goal of 1 million HERS ratings by 2028, the importance of HVAC grading, and the role of building science in improving homes and communities. Throughout the episode, the fellows highlight the value of collaboration, education, and continuous learning, inspiring those new to the industry and seasoned professionals. Whether you're a rater, builder, or someone passionate about energy efficiency, this episode provides valuable insights into the next generation of industry leaders shaping the future. Here are the LinkedIn profiles for our podcast guests: Alex Haworth: https://www.linkedin.com/in/alexander-haworth-aa1b90227/ Jennifer Goldberg: https://www.linkedin.com/in/jennifer-goldberg-4688b6107/ Dorian Gothard: https://www.linkedin.com/in/dorian-gothard/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
“Excellence is never an accident. It results from high intention, sincere effort, intelligent direction, skillful execution, and the vision to see obstacles as opportunities.” – Aristotle (adapted interpretation) In this insightful episode of RESTalk, host Bill Spohn interviews three key members of RESNET's team: Ryan Moore and Jordi Kimbrough in Quality Assurance, and Michael Matthews in Programs. Each brings a unique background and perspective to their roles, strengthening RESNET's mission to ensure high-quality standards in energy-efficient home ratings. Ryan Moore, the Quality Assurance Investigations Program Manager, shares his journey from a varied background in environmental studies, construction, and tourism to his pivotal role at RESNET. Ryan emphasizes the importance of a level playing field in the rating industry, using investigations to ensure compliance and to educate stakeholders. Jordi Kimbrough, Quality Assurance Project Manager, reflects on her serendipitous entry into the rating world and her focus on making providers' challenging roles more manageable. She highlights her goals of streamlining processes, providing training, and gathering feedback to reduce ambiguity and enhance the industry's efficiency. Michael Matthews, Programs Engagement Specialist, rounds out the conversation with his passion for connecting people and advancing energy efficiency. From his early days in weatherization to his current role, Michael facilitates stakeholder conversations, promotes better building practices, and supports RESNET's ambitious goals. The episode concludes with a discussion of the value of teamwork, service, and a shared commitment to sustainability, underscoring the human connection behind RESNET's impactful work. 
Here is the contact info for our guests: Jordi Kimbrough: Jordi@resnet.us Michael Matthews: Michael@resnet.us, https://www.linkedin.com/in/michael-matthews-a8931928/ Ryan Moore: Ryanmoore@resnet.us, https://www.linkedin.com/in/rymoore/ Here is a link to the complaint resolution process: https://www.hersindex.com/about-resnet/complaint-resolution-process/ and the complaint form: https://www.hersindex.com/file-complaint-resolution/ Job Posting for Regional Positions for QA specialists: (Posted 12-4-24) https://www.resnet.us/articles/job-posting-resnet-qa-compliance-specialists-regional-positions/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. For more info on this topic, contact RESNET at INFO@RESNET.US
“Networking is not about just connecting people. It's about connecting people with ideas and opportunities.” – Michele Jennae In this episode of RESTalk, Bill Spohn chats with Clara Hedrick, Lead Events Coordinator for RESNET®, to discuss the upcoming 2025 RESNET® Conference. Clara shares details about the event, which is set to take place January 26–29, 2025, in Tempe, Arizona. Attendees can look forward to engaging sessions, dynamic networking opportunities, and exciting receptions hosted by sponsors like NAIMA and Knauf. Clara highlights this year's innovative offerings, including a live demonstration of the new RESNET® insulation grading standard and Knauf's VR training system. Clara also delves into the conference's tracks, which cover a broad range of topics such as building science, workforce development, and energy codes. Special events like the offsite Fulton Homes tour and training opportunities add further value to the conference. Looking ahead, Clara announces future RESNET® conference locations: San Antonio, Texas, in 2026 and Savannah, Georgia, in 2027, both offering unique settings for continued learning and connection. Bill wraps up by encouraging listeners to explore the conference page at resnet.us/2025 for more details and to download the WHOVA app to easily access schedules, session info, and networking opportunities. Here's the link to the conference website: https://whova.com/web/1qIMgnn8Jh5Y4xsupaZaVuFO57dR7GOfzNfNibqilSE%3D/ To reach the conference team, you can email them at: conference@resnet.us To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
In this episode of the Digital Pathology Podcast, you will learn about cytology's entrance into the digital pathology space, including successful AI and scanner implementations. We cover AI's role in rapid on-site evaluation for lung cancer and share insights on a looming surge in prostate cancer and how digital pathology and AI can help. You will also hear a live demo of me using an AI assistant to decode a scientific paper in real time. Tune in to stay on top of digital pathology research in 2025!
00:00 Welcome to DigiPath Digest
00:53 Introduction and New Year Greetings
01:41 Diving into DigiPath Digest
01:44 AI in Respiratory Cytology
06:11 The Role of AI in Pathology
09:49 Multi-Omics and AI
11:28 Radiomics and Pathomics
14:44 Live Q&A and Future Plans
20:09 Prostate Cancer Tsunami
22:34 Thyroid Cytology and Live AI-Assistant Demo
31:07 Conclusion and the option to send texts :)
Links and Resources:
* Subscribe to Digital Pathology Podcast on YouTube
* Free E-book "Pathology 101"
* YouTube (unedited) version of this episode
* Try Perplexity with my referral link
* My new page built with Perplexity
Publications Discussed Today:
Happy holidays! We'll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all, all our LS supporters who helped fund the gorgeous venue and A/V production! For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (which we have now also done for ICLR and ICML); however, we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver. The single most requested domain was computer vision, and we could think of no one better to help us recap 2024 than our friends at Roboflow, who were one of our earliest guests in 2023 and had one of this year's top episodes in 2024 again. Roboflow has since raised a $40m Series B!
Links
Their slides are here:
All the trends and papers they picked:
* Isaac Robinson
  * Sora (see our Video Diffusion pod) - extending diffusion from images to video
  * SAM 2: Segment Anything in Images and Videos (see our SAM2 pod) - extending prompted masks to full video object segmentation
  * DETR Dominancy: DETRs show Pareto improvement over YOLOs
  * RT-DETR: DETRs Beat YOLOs on Real-time Object Detection
  * LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
  * D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement
* Peter Robicheaux
  * MMVP (Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs)
  * Florence 2 (Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks)
  * PaliGemma / PaliGemma 2
    * PaliGemma: A versatile 3B VLM for transfer
    * PaliGemma 2: A Family of Versatile VLMs for Transfer
  * AIMv2 (Multimodal Autoregressive Pre-training of Large Vision Encoders)
* Vik Korrapati - Moondream
Full Talk on YouTube
Want more content like this? Like and subscribe to stay updated on our latest talks, interviews, and podcasts.
Transcript/Timestamps
[00:00:00] Intro
[00:00:05] AI Charlie: Welcome to Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co-host. When we were thinking of ways to add value to our academic conference coverage, we realized that there was a lack of good talks just recapping the best of 2024, going domain by domain.
[00:00:36] AI Charlie: We sent out a survey to over 900 of you, who told us what you wanted, and then invited the best speakers in the Latent Space Network to cover each field. 200 of you joined us in person throughout the day, with over 2,200 watching live online. Our second featured keynote is The Best of Vision 2024, with Peter Robicheaux and Isaac [00:01:00] Robinson of Roboflow, with a special appearance from Vik Korrapati of Moondream.
[00:01:05] AI Charlie: When we did a poll of our attendees, the highest-interest domain of the year was vision. And so our first port of call was our friends at Roboflow. Joseph Nelson helped us kickstart our vision coverage in episode 7 last year, and this year came back as a guest host with Nikhila Ravi of Meta to cover Segment Anything 2.
[00:01:25] AI Charlie: Roboflow have consistently been the leaders in open source vision models and tooling, with their Supervision library recently eclipsing PyTorch's torchvision library, and Roboflow Universe hosting hundreds of thousands of open source vision datasets and models.
They have since announced a $40 million Series B led by Google Ventures.
[00:01:46] AI Charlie: Woohoo.
[00:01:48] Isaac's picks
[00:01:48] Isaac Robinson: Hi, we're Isaac and Peter from Roboflow, and we're going to talk about the best papers of 2024 in computer vision. So, for us, we defined best as what made [00:02:00] the biggest shifts in the space. And to determine that, we looked at what major trends happened and which papers most contributed to those trends.
[00:02:09] Isaac Robinson: So I'm going to talk about a couple of trends, Peter's going to talk about a trend, and then we're going to hand it off to Moondream. The trends that I'm interested in talking about are a major transition from models that run on a per-image basis to models that run using the same basic ideas on video.
[00:02:28] Isaac Robinson: And then also how DETRs are starting to take over the real-time object detection scene from the YOLOs, which have been dominant for years.
[00:02:37] Sora, OpenSora and Video Vision vs Generation
[00:02:37] Isaac Robinson: So as a highlight, we're going to talk about Sora, which from my perspective is the biggest paper of 2024, even though it came out in February. Is the what?
[00:02:48] Isaac Robinson: Yeah. Yeah. So Sora is just a post. So I'm going to fill it in with details from replication efforts, including OpenSora, and related work, such as Stable [00:03:00] Video Diffusion. And then we're also going to talk about SAM 2, which applies the SAM strategy to video. And then the improvements in 2024 to DETRs that are making them a Pareto improvement over YOLO-based models.
[00:03:15] Isaac Robinson: So to start this off, we're going to talk about the state of the art of video generation at the end of 2023: MAGVIT. MAGVIT is a discrete-token video tokenizer akin to VQ-GAN, but applied to video sequences.
And it actually outperforms state-of-the-art handcrafted video compression frameworks
[00:03:38] Isaac Robinson: in terms of bit rate versus human preference for quality. And videos generated by autoregressing on these discrete tokens look pretty nice, but only up to about five seconds long and, you know, not super detailed. And then suddenly a few months later we have this, which, when I saw it, was totally mind-blowing to me.
[00:03:59] Isaac Robinson: 1080p, [00:04:00] a whole minute long. We've got light reflecting in puddles. That's reflective. It reminds me of those RTX demonstrations for next-generation video games, such as Cyberpunk, but with better graphics. You can see some issues in the background if you look closely, but, as with a lot of these models, the issues tend to be things that people aren't going to pay attention to unless they're looking for them.
[00:04:24] Isaac Robinson: In the same way, six fingers on a hand is a giveaway you're not going to notice unless you're looking for it. So yeah, as we said, Sora does not have a paper. So we're going to be filling it in with context from the rest of the computer vision scene attempting to replicate these efforts. So the first step: you have an LLM caption a huge amount of videos.
[00:04:48] Isaac Robinson: This is a trick that they introduced in DALL-E 3, where they train an image captioning model to generate very high quality captions for a huge corpus and then train a diffusion model [00:05:00] on that. Sora and the replication efforts also show a bunch of other steps that are necessary for good video generation,
[00:05:09] Isaac Robinson: including filtering by aesthetic score and filtering by making sure the videos have enough motion, so that the generator doesn't just learn to generate static frames. Then we encode our video into a series of spacetime latents.
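As a rough illustration of the "enough motion" filter mentioned above, here is a minimal Python sketch, assuming each clip arrives as a list of grayscale frames (each frame a list of pixel rows). The mean-absolute-frame-difference score and the threshold of 2.0 are hypothetical stand-ins: Sora's actual filtering is not published, and replication pipelines may use more sophisticated motion estimates.

```python
def mean_abs_frame_diff(frames):
    """Average absolute pixel change between consecutive grayscale frames.

    frames: list of frames, each a list of rows of pixel intensities (0-255).
    """
    if len(frames) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for row_prev, row_curr in zip(prev, curr):
            for p, c in zip(row_prev, row_curr):
                total += abs(c - p)
                count += 1
    return total / count


def has_enough_motion(frames, threshold=2.0):
    # Hypothetical threshold: clips whose average per-pixel change falls
    # below it are treated as near-static and dropped from training data.
    return mean_abs_frame_diff(frames) >= threshold


static_clip = [[[10, 10], [10, 10]] for _ in range(4)]
moving_clip = [[[0, 0], [0, 0]], [[50, 50], [50, 50]], [[0, 0], [0, 0]]]
print(has_enough_motion(static_clip), has_enough_motion(moving_clip))  # False True
```

A real pipeline would compute this on decoded video arrays rather than nested lists; the nested lists just keep the sketch dependency-free.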
Once again, Sora is very sparse on details.
[00:05:29] Isaac Robinson: So in the replication and related work, OpenSora actually uses MAGVIT-v2 itself to do this, but swaps out the discretization step for a classic VAE autoencoder framework. They show that there's a lot of benefit from temporal compression, which makes a lot of sense, as sequential frames in videos carry mostly redundant information.
[00:05:53] Isaac Robinson: So by compressing in the temporal dimension, you allow the latent to hold [00:06:00] a lot more semantic information while avoiding that duplication. So, we've got our spacetime latents, possibly via some 3D VAE, presumably MAGVIT-v2, and then you throw them into a diffusion transformer.
[00:06:19] Isaac Robinson: I think it's personally interesting to note that OpenSora is using MAGVIT-v2, which originally used an autoregressive transformer decoder to model the latent space, but is now using a diffusion transformer. So there's still a transformer involved. The question is just whether
[00:06:37] Isaac Robinson: it's parameterizing a stochastic differential equation or parameterizing a conditional distribution via autoregression. It's also worth noting that most diffusion models today, the very high performance ones, are switching away from the classic DDPM (denoising diffusion probabilistic models) framework to rectified flows.
[00:06:57] Isaac Robinson: Rectified flows have a very interesting property: as [00:07:00] they converge, they actually get closer to being able to be sampled in a single step, which means that in practice you can generate high quality samples much faster. A major problem of DDPM and related models for the past four years is just that they require many, many steps to generate high quality samples.
[00:07:22] Isaac Robinson: So, naturally, the third step is throwing lots of compute at the problem.
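To make the rectified-flow point concrete, here is a toy one-dimensional sketch, not a diffusion model: sampling is treated as integrating a velocity field from "noise" at t=0 to "data" at t=1 with Euler steps. Both velocity fields are hand-written assumptions rather than learned networks. When the path is perfectly straight, a single step lands exactly where many steps do; when it is curved, a single step misses badly.

```python
def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 using fixed Euler steps."""
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, i * dt)
    return x


X0, X1 = 0.0, 3.0  # toy "noise" sample and the "data" sample it is paired with


def straight_velocity(x, t):
    # Velocity of the straight path x_t = (1 - t) * X0 + t * X1: constant in t.
    return X1 - X0


def curved_velocity(x, t):
    # Velocity of the curved path x_t = X0 + t**2 * (X1 - X0):
    # same endpoints, but the speed changes over time.
    return 2.0 * t * (X1 - X0)


for steps in (1, 10, 100):
    print(steps,
          euler_sample(straight_velocity, X0, steps),
          euler_sample(curved_velocity, X0, steps))
```

The straight field reaches 3.0 in one step, while the curved one only approaches 3.0 as the step count grows, which is the intuition behind "straighter flows need fewer sampling steps."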
So I never figured out how to get this video to loop, but we see very little compute, medium compute, lots of compute. This is so interesting because the original diffusion transformer paper from Facebook actually showed that the specific hyperparameters of the transformer didn't really matter that much.
[00:07:48] Isaac Robinson: What mattered was that you were just increasing the amount of compute that the model had. So, I love how in the, once again, little blog post, they don't even talk about [00:08:00] the specific hyperparameters. They say, we're using a diffusion transformer, and we're just throwing more compute at it, and this is what happens.
[00:08:08] Isaac Robinson: OpenSora shows similar results. The primary issue, I think, is that no one else has a 32x compute budget. So we end up in the middle of the domain in most of the related work, which is still super, super cool. It's just a little disappointing considering the context. So I think this is a beautiful extension of the framework that was introduced in 2022 and 2023 for very high quality per-image generation, extended to videos.
[00:08:39] Isaac Robinson: It's awesome. And it's GA as of Monday, except no one can seem to get access to it because they keep shutting down the login.
[00:08:46] SAM and SAM2
[00:08:46] Isaac Robinson: So the next paper I wanted to talk about is SAM. We at Roboflow allow users to label data and train models on that data. SAM, for us, has saved our users 75 years of [00:09:00] labeling time.
[00:09:00] Isaac Robinson: We are, to the best of my knowledge, the largest SAM API that exists.
SAM also allows our users to train pure bounding-box regression models and use those to generate high quality masks, which has the great side effect of requiring less training data to reach meaningful convergence.
[00:09:20] Isaac Robinson: Most people are data-limited in the real world, so anything that requires less data to get to something useful is super useful. Many of our users actually run their per-frame object detectors on every frame in a video. And so SAM 2 falls into this category of taking something that really, really works and applying it to video, which has the wonderful benefit of being plug-and-play with many of our users' use cases.
[00:09:53] Isaac Robinson: We're still building out a sufficiently mature pipeline to take advantage of that, but it's in the works. [00:10:00] So here we've got a great example. We can click on cells and then follow them. You'll even notice a cell goes away and comes back and we can still keep track of it, which is very challenging for existing object trackers.
[00:10:14] Isaac Robinson: Here's a high-level overview of how SAM 2 works. There's a simple pipeline where we provide some type of prompt and it fills out the rest of the likely masks for that object throughout the rest of the video. So here we're giving a bounding box in the first frame, a set of positive/negative points, or even just a simple mask.
[00:10:36] Isaac Robinson: I'm going to assume people are somewhat familiar with SAM, so I'm just going to give a high-level overview of how SAM works. You have an image encoder that runs on every frame.
SAM 2 can be used on a single image, in which case the only difference between SAM 2 and SAM is the image encoder. SAM used a standard ViT; [00:11:00] SAM 2 replaced that with Hiera, a hierarchical encoder, which gets approximately the same results but leads to six times faster inference, which is
[00:11:11] Isaac Robinson: excellent, especially considering that a trend of 2023 was replacing the ViT with more efficient backbones. In the case where you're doing video segmentation, the difference is that you actually create a memory bank and cross-attend the features from the image encoder to the memory bank.
[00:11:31] Isaac Robinson: So the feature set that is created is essentially, well, I'll go more into it in a couple of slides, but we take the features from the past couple of frames, plus a set of object pointers and the set of prompts, and use that to generate our new masks. We then fuse the new masks for this frame with the
[00:11:57] Isaac Robinson: image features and add that to the memory bank. [00:12:00] Well, I'll say more in a minute. Just like SAM, SAM 2 uses a data engine to create its dataset: they assembled a huge amount of reference data, had people label some of it, trained the model, used the model to label more of it, and asked people to refine the model's predictions.
[00:12:20] Isaac Robinson: And then ultimately the dataset is just created from the engine's final output of the model on the reference data. This paradigm is so interesting to me because it unifies a model and a dataset in a way that is very unique. It seems unlikely that another model could come in and have such a tight...
[00:12:37] Isaac Robinson: So, a brief overview of how the memory bank works. The paper did not have a great visual, so I'm going to fill in a bit more. We take the last couple of frames from our video.
We attend to those, along with the set of prompts that we provided (they could come from the future, [00:13:00] they could come from anywhere in the video), as well as reference object pointers saying, by the way, here's what we've found so far. Attending to the last few frames has the interesting benefit of allowing it to model complex object motion without actually
[00:13:18] Isaac Robinson: By limiting the number of frames that you attend to, you manage to keep the model running in real time. This is such an interesting topic for me because one would assume that attending to all of the frames, or having some type of summarization of all the frames, is super essential for high performance.
[00:13:35] Isaac Robinson: But we see in their later ablation that that actually is not the case. So here, just to make sure that there is some benchmarking happening, we compared to some of the stuff that came out prior, and indeed the SAM 2 strategy does improve on the state of the art. This ablation deep in their appendices was super interesting to me.
[00:13:59] Isaac Robinson: [00:14:00] We see in section C, the number of memories. One would assume that increasing the number of memories would meaningfully increase performance. We see that it has some impact, but not the type that you'd expect, and that it meaningfully decreases speed, which justifies, in my mind, just having this FIFO queue of memories.
[00:14:20] Isaac Robinson: Although in the future, I'm super interested to see a more dedicated summarization of the whole video, not just a stacking of the last frames.
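A minimal sketch of the FIFO memory idea, assuming each frame's fused mask-and-image features are computed elsewhere (here they are opaque placeholders). Keeping prompted frames outside the FIFO and the default of six recent frames are assumptions for illustration, not SAM 2's actual code.

```python
from collections import deque


class FrameMemoryBank:
    """Toy memory bank: a fixed-size FIFO of recent frames plus pinned prompted frames."""

    def __init__(self, max_recent=6):  # hypothetical default size
        self.recent = deque(maxlen=max_recent)  # oldest frame is evicted automatically
        self.prompted = []         # frames the user prompted; never evicted here
        self.object_pointers = []  # per-object summaries of what has been found so far

    def add_frame(self, frame_idx, fused_features, prompted=False):
        entry = (frame_idx, fused_features)
        (self.prompted if prompted else self.recent).append(entry)

    def add_object_pointer(self, pointer):
        self.object_pointers.append(pointer)

    def context(self):
        """Return what the current frame would cross-attend to: memories and object pointers."""
        return self.prompted + list(self.recent), list(self.object_pointers)


bank = FrameMemoryBank(max_recent=3)
bank.add_frame(0, "features_0", prompted=True)
for i in range(1, 6):
    bank.add_frame(i, f"features_{i}")
memories, pointers = bank.context()
print([idx for idx, _ in memories])  # [0, 3, 4, 5]
```

Because `deque(maxlen=...)` drops the oldest entry on overflow, the per-frame attention cost stays bounded no matter how long the video runs, which is the speed argument Isaac makes for the FIFO design.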
So that's another extension of beautiful per-frame work into the video domain.
[00:14:42] Realtime detection: DETRs > YOLO
[00:14:42] Isaac Robinson: The next trend I'm interested in talking about is this: at Roboflow, we're super interested in training real-time object detectors.
[00:14:50] Isaac Robinson: Those are our bread and butter. And so we're doing a lot to keep track of what is actually happening in that space. We are finally starting to see something change. So, [00:15:00] for years, YOLOs have been the dominant way of doing real-time object detection, and we can see here that they've essentially stagnated.
[00:15:08] Isaac Robinson: The performance between 10 and 11 is not meaningfully different, at least, you know, in this type of high-level chart. And even across the last couple of series, there's not a major change. So YOLOs have hit a plateau; DETRs have not. We can look here and see the YOLO series has this plateau. And then RT-DETR, LW-DETR, and D-FINE have meaningfully changed that plateau, so that in fact the best D-FINE models are +4.6 AP on COCO at the same latency.
[00:15:43] Isaac Robinson: So, three major steps to accomplish this. The first, RT-DETR, is technically a 2023 preprint but was officially published in 2024, so I'm going to include that. I hope that's okay. [00:16:00] RT-DETR showed that we could actually match or outspeed YOLOs.
[00:16:04] Isaac Robinson: And then LW-DETR showed that pre-training is hugely effective on DETRs and much less so on YOLOs. And then D-FINE added the types of bells and whistles that we expect in this arena. The major improvement that RT-DETR shows was taking the multi-scale features that DETRs typically pass into their encoder and decoupling them into a much more efficient transformer encoder.
[00:16:30] Isaac Robinson: The transformer is, of course, quadratic in complexity.
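The quadratic cost is easy to see with a little arithmetic. This sketch assumes a 640x640 input and feature maps at strides 8, 16, and 32 (common choices, not figures from the talk), and compares full self-attention over all scales jointly with attention restricted to the coarsest scale, which is one way to cut the token count; RT-DETR's actual encoder has more to it than this.

```python
def tokens_per_level(image_size, strides):
    """Tokens in each square feature map: one per cell of an (image_size / stride)^2 grid."""
    return [(image_size // s) ** 2 for s in strides]


def attention_pairs(num_tokens):
    """Full self-attention scores every token against every token: O(n^2)."""
    return num_tokens ** 2


image_size = 640          # assumed input resolution
strides = [8, 16, 32]     # assumed multi-scale feature strides

levels = tokens_per_level(image_size, strides)
all_scales = attention_pairs(sum(levels))     # attend across all scales at once
coarsest_only = attention_pairs(levels[-1])   # attend within the coarsest scale only

print(levels)                       # [6400, 1600, 400]
print(all_scales, coarsest_only)    # 70560000 160000
print(all_scales // coarsest_only)  # 441
```

Cutting the token count 21x (8,400 to 400) cuts the attention work 441x, which is why shrinking what goes into the transformer at once matters so much for latency.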
So decreasing the amount of stuff that you pass in at once is super helpful for improving your runtime, or increasing your throughput. That change basically brought us up to YOLO speed, and then they do a hardcore analysis of benchmarking YOLOs, including the NMS step.
[00:16:54] Isaac Robinson: Once you include the NMS in the latency calculation, you see that, in fact, these DETRs [00:17:00] are outperforming, at least at this time, the YOLOs that existed. Then LW-DETR goes in and suggests that, in fact, the huge boost here is from pre-training. So, this is the D-FINE line, and this is the D-FINE line without pre-training.
[00:17:19] Isaac Robinson: It's within range, it's still an improvement over the YOLOs, but the really huge boost comes from the benefit of pre-training. When YOLOX came out in 2021, they showed that they got much better results by having a much, much longer training time, but they found that when they did that, they actually did not benefit from pre-training.
[00:17:40] Isaac Robinson: So, you see in this graph from LW-DETR that YOLOs do have a real benefit from pre-training, but it goes away as we increase the training time. Meanwhile, the DETRs converge much faster: LW-DETR trains for only 50 epochs, RT-DETR for 60 epochs. So, one could assume that, in fact, [00:18:00] the entire extra gain from pre-training is that you're not destroying your original weights
[00:18:06] Isaac Robinson: by relying on this long training cycle. And then LW-DETR also shows superior performance on our favorite dataset, Roboflow 100, which means that they do better in the real world, not just on COCO. Then D-FINE throws all the bells and whistles at it. YOLO models tend to have a lot of very specific, complicated loss functions.
[00:18:26] Isaac Robinson: D-FINE brings that into the DETR world and shows consistent improvement on a variety of DETR-based frameworks.
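For readers unfamiliar with the NMS step: YOLO-style detectors emit many overlapping candidate boxes and need non-maximum suppression afterwards, while DETRs predict a set of boxes directly and typically skip it, so a fair latency comparison has to time NMS too. Below is a minimal greedy NMS in plain Python, assuming boxes as `(x1, y1, x2, y2)` tuples and an IoU threshold of 0.5 (an illustrative choice); `model_ms` is a hypothetical, separately measured forward-pass time, and production code would use a vectorized implementation.

```python
import time


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


def end_to_end_ms(model_ms, boxes, scores):
    """Latency that counts post-processing: model forward time plus measured NMS time."""
    start = time.perf_counter()
    keep = nms(boxes, scores)
    nms_ms = (time.perf_counter() - start) * 1000.0
    return keep, model_ms + nms_ms


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```

With thousands of candidate boxes, the greedy loop's cost grows quickly, which is why leaving NMS out of a YOLO benchmark flatters it relative to an NMS-free DETR.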
So bring these all together and we see that suddenly we have almost 60 AP on COCO while running in about 10 milliseconds. Huge, huge stuff. We're spending a lot of time trying to build models that work better with less data, and DETRs are clearly becoming a promising step in that direction.
[00:18:56] Isaac Robinson: What we're interested in seeing [00:19:00] next from DETRs in this trend: Co-DETR and the models currently sitting at the top of the leaderboard for large-scale inference scale really well as you switch out the backbone. We're very interested in seeing people publish a paper, potentially us, on what happens if you take these real-time ones and then throw a Swin-G at it.
[00:19:23] Isaac Robinson: Like, do we have a Pareto curve that extends from the real-time domain all the way up to the super, super slow but high performance domain? We also want to see people benchmarking on RF100 more, because that type of data is what's relevant for most users. And we want to see more pre-training, because pre-training works now.
[00:19:43] Isaac Robinson: It's super cool.
[00:19:48] Peter's Picks
[00:19:48] Peter Robicheaux: Alright, so, yeah, in that theme, one of the big things that we're focusing on is how we get more out of our pre-trained models. And one lens to look at this through is [00:20:00] this new requirement for how fine-grained the visual details in the representations extracted from your foundation model need to be.
[00:20:08] Peter Robicheaux: So it's sort of a hook for this. Oh, yeah, this is just a list of all the papers that I'm going to mention. I just want to make sure I cite an actual paper so you can find it later.
[00:20:18] MMVP (Eyes Wide Shut?
Exploring the Visual Shortcomings of Multimodal LLMs)
[00:20:18] Peter Robicheaux: Yeah, so the big hook here is that I make the claim that LLMs can't see. If you go to Claude or ChatGPT and ask it to look at this watch and tell you what time it is, it fails, right?
[00:20:34] Peter Robicheaux: And so you could say, like, this is a very classic test of an LLM, but you could say, okay, maybe this image is too zoomed out, and it'll do better if we increase the resolution and it has an easier time finding these fine-grained features, like where the watch hands are pointing.
[00:20:53] Peter Robicheaux: No dice. And you can say, okay, well, maybe the model just doesn't know how to tell time from the position of the hands. But if you actually prompt [00:21:00] it textually, it's very easy for it to tell the time. So this to me is proof that these LLMs literally cannot see the position of the watch hands; they can't see those details.
[00:21:08] Peter Robicheaux: So the question is, why? And for you Anthropic heads out there, Claude fails too. So my first pick for best paper of 2024 in vision is this MMVP paper, which tries to investigate why LLMs don't have the ability to see fine-grained details. For instance, it comes up with a lot of images like this, where you ask a question that seems very visually apparent to us, like, which way is the school bus facing?
[00:21:32] Peter Robicheaux: And it gets it wrong, and then, of course, it makes up details to support its wrong claim. And the process by which it finds these images is contained in its hypothesis for why it can't see these details.
So it hypothesizes that models that have been initialized with CLIP as their vision encoder don't have fine-grained details in the features extracted using CLIP, because CLIP doesn't need to find these fine-grained [00:22:00] details to do its job correctly, which is just to match captions and images, right?
[00:22:04] Peter Robicheaux: And at a high level, even if ChatGPT wasn't initialized with CLIP and its vision encoder wasn't trained contrastively at all, still, in order to do its job of captioning the image, it could do a pretty good job without actually finding the exact position of all the objects and visual features in the image, right?
[00:22:21] Peter Robicheaux: So this paper finds a set of difficult images for these types of models. And the way it does that is it looks for embeddings that are similar in CLIP space but far apart in DINOv2 space. DINOv2 is a foundation model that was trained self-supervised purely on image data. It uses a fairly complex student-teacher framework, but essentially it masks out certain areas of the image, or crops certain areas of the image, and tries to make sure that those have consistent representations, which is a way for it to learn very fine-grained visual features.
[00:22:54] Peter Robicheaux: And so if you take things that are very close in CLIP space and very far in DINOv2 space, you get a set of images, [00:23:00] basically pairs of images, that are hard for ChatGPT and other big language models to distinguish. So if you then ask it questions about these images, well, as you can see from this chart, it's going to answer the same way for both images, right?
[00:23:14] Peter Robicheaux: Because from the perspective of the vision encoder, they're the same image. And so if you ask a question like, how many eyes does this animal have? It answers the same for both.
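The pair-mining step Peter describes can be sketched as a filter over cosine similarities. This assumes you already have CLIP and DINOv2 embeddings for each image (toy two-dimensional vectors stand in for them below); the thresholds 0.95 and 0.6 are illustrative assumptions, not values quoted in the talk.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def clip_blind_pairs(clip_emb, dino_emb, clip_min=0.95, dino_max=0.6):
    """Image-id pairs that CLIP sees as near-duplicates but DINOv2 sees as different.

    clip_emb, dino_emb: dicts mapping image id -> embedding vector (list of floats).
    """
    pairs = []
    for a, b in combinations(sorted(clip_emb), 2):
        similar_to_clip = cosine(clip_emb[a], clip_emb[b]) > clip_min
        different_to_dino = cosine(dino_emb[a], dino_emb[b]) < dino_max
        if similar_to_clip and different_to_dino:
            pairs.append((a, b))
    return pairs


# Toy embeddings: the two buses look alike to "CLIP" but not to "DINOv2".
clip_emb = {"bus_left": [1.0, 0.01], "bus_right": [1.0, 0.0], "cat": [0.0, 1.0]}
dino_emb = {"bus_left": [1.0, 0.0], "bus_right": [0.0, 1.0], "cat": [1.0, 1.0]}
print(clip_blind_pairs(clip_emb, dino_emb))  # [('bus_left', 'bus_right')]
```

Each surviving pair then becomes the basis for a question whose answer differs between the two images, which is what makes the benchmark hard for a CLIP-based encoder.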
And all these other models, including LLaVA, do the same thing, right? And so this is the benchmark that they create: finding CLIP-blind pairs, which are pairs of images that are similar in CLIP space, and creating a dataset of multiple-choice questions based on those.
[00:23:39] Peter Robicheaux: And so how do these models do? Well, really badly. ChatGPT and Gemini do a little bit better than random guessing, but reach only about half the performance of humans, who find these problems very easy. LLaVA, interestingly, is extremely negatively correlated with this dataset. It does much, much worse [00:24:00] than random guessing, which means that this process has done a very good job of identifying hard images for LLaVA specifically.
[00:24:07] Peter Robicheaux: And that's because LLaVA is basically not trained for very long and is initialized from CLIP, so you would expect it to do poorly on this dataset. So, one of the solutions this paper proposes is basically saying, okay, if CLIP features aren't enough, what if we train the visual encoder of the language model also on DINO features?
[00:24:27] Peter Robicheaux: And it proposes two different ways of doing this. One is additive, which is basically interpolating between the two features, and the other is interleaving, which is training on the combination of both features. There's a really interesting trend when you do the additive mixture of features.
[00:24:45] Peter Robicheaux: Zero is all CLIP features and one is all DINOv2 features. I think it's helpful to look at the rightmost chart first, which shows that as you increase the share of DINOv2 features, your model does worse and worse and [00:25:00] worse on the actual language modeling task.
And that's because DINOv2 features were trained in a completely self-supervised manner and completely in image space.
[00:25:08] Peter Robicheaux: It knows nothing about text. These features aren't really compatible with these text models. You can train an adapter all you want, but the features seem to be in such an alien language that it's a very hard optimization problem for these models to solve. And that kind of supports what's happening on the left: it gets better at answering these questions as you include more DINOv2 features, up to a point, but when you oversaturate, it completely loses its ability to
[00:25:36] Peter Robicheaux: answer in language and do language tasks. You can also see that with interleaving, they essentially double the number of tokens going into these models and just train on both, and it still doesn't really solve the MMVP task. It gets LLaVA-1.5 above random guessing by a little bit, but it's still not close to ChatGPT or, you know, any human performance, obviously.
[00:25:59] Peter Robicheaux: [00:26:00] So clearly this proposed solution of just using DINOv2 features directly isn't going to work.
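The two mixing strategies can be sketched on per-token feature vectors. This assumes both encoders' features have already been projected to the same width and token count (in practice adapters would handle that), and it only shows how the visual tokens are combined, not how the model is trained.

```python
def additive_mix(clip_tokens, dino_tokens, alpha):
    """Per-token interpolation: alpha=0 gives pure CLIP features, alpha=1 pure DINOv2."""
    return [
        [(1.0 - alpha) * c + alpha * d for c, d in zip(c_tok, d_tok)]
        for c_tok, d_tok in zip(clip_tokens, dino_tokens)
    ]


def interleaved_mix(clip_tokens, dino_tokens):
    """Alternate tokens from the two encoders, doubling the visual token count."""
    mixed = []
    for c_tok, d_tok in zip(clip_tokens, dino_tokens):
        mixed.extend([c_tok, d_tok])
    return mixed


clip_tokens = [[1.0, 0.0], [0.0, 1.0]]
dino_tokens = [[0.0, 2.0], [2.0, 0.0]]
print(additive_mix(clip_tokens, dino_tokens, 0.5))     # [[0.5, 1.0], [1.0, 0.5]]
print(len(interleaved_mix(clip_tokens, dino_tokens)))  # 4
```

The additive version keeps the token count fixed but forces the language model to read a blend of two feature spaces, while the interleaved version keeps each space intact at the cost of twice as many visual tokens, matching the trade-off Peter describes.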
And basically what that means is that as a vision foundation model, DINOv2 is going to be insufficient for language tasks, right?
[00:26:14] Florence 2 (Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks)
[00:26:14] Peter Robicheaux: So my next pick for best paper of 2024 would be Florence-2, which tries to solve this problem by incorporating not only the dimension of spatial hierarchy, which is to say pixel-level understanding, but also what they call semantic granularity. The goal is basically to have features that are sufficient for finding objects in the image, so they have enough pixel information, but that can also be talked about and reasoned about.
[00:26:44] Peter Robicheaux: And that's the semantic granularity axis. So here's an example of the three different labeling paradigms they use to create a big dataset. One is text, which is just captioning. And you would expect a model that's trained [00:27:00] only on captioning to perform like ChatGPT: no spatial hierarchy, no features that are meaningful at the pixel level.
[00:27:08] Peter Robicheaux: And so they add another type, region-text pairs, which is essentially either classifying a region, doing object detection or instance segmentation on that region, or captioning that region. And then they have text-phrase-region annotations, which are essentially a triple: not only do you have a region that you've described, you also find its place in a descriptive paragraph about the image, which basically tries to introduce even more semantic understanding of these regions.
[00:27:39] Peter Robicheaux: So, for instance, if you're saying "a woman riding on the road," you have to know what a woman is and what the road is and that she's on top of it.
And that's basically composing a bunch of objects in this visual space, but also thinking about them semantically, right? And the way that they do this is they basically just dump features from a vision encoder [00:28:00] straight into an encoder-decoder transformer.
[00:28:03] Peter Robicheaux: And then they train a bunch of different tasks, like object detection and so on, as language tasks. And I think that's one of the big things that we saw in 2024: these vision language models operating on pixel space linguistically. So they introduced a bunch of new tokens to point to locations and
[00:28:22] Peter Robicheaux: So how does it actually do? If you look at the graph on the right, which uses the DINO framework, your pre-trained Florence-2 models transfer very, very well. They get 60% mAP on COCO, which is approaching state of the art, and they train much more efficiently.
[00:28:47] Peter Robicheaux: They converge a lot faster, and both of these things point to the fact that they're actually leveraging their pre-trained weights effectively. So where does it fall short? I forgot to mention that Florence-2 comes in 0.2 [00:29:00] billion and 0.7 billion parameter sizes, so the models are very, very small for a language model.
[00:29:05] Peter Robicheaux: And I think that in this framework, you can see saturation.
So what this graph is showing is that if you train a Florence 2 model purely on the image-level and region-level annotations, not including the pixel-level annotations like segmentation, it actually performs better as an object detector.

[00:29:25] Peter Robicheaux: And what that means is that it's not able to actually learn all the visual tasks that it's trying to learn, because it doesn't have enough capacity.

[00:29:32] PaliGemma / PaliGemma 2

[00:29:32] Peter Robicheaux: So I'd like to see this paper explore larger model sizes, which brings us to our next big paper of 2024, or two papers. PaliGemma came out earlier this year.

[00:29:42] Peter Robicheaux: PaliGemma 2 was released, I think, a week or two ago. Oh, I forgot to mention, you can actually label text datasets on Roboflow and train a Florence 2 model, and you can also train a PaliGemma 2 model on Roboflow, which we got into the platform within about 14 hours of release, which I was really excited about.

[00:29:59] Peter Robicheaux: So, anyway, [00:30:00] PaliGemma is essentially doing the same thing, but instead of using an encoder-decoder, it just dumps everything into a decoder-only transformer model. It also introduced the concept of location tokens to point to objects in pixel space. PaliGemma uses Gemma as the language model, specifically Gemma 2B.

[00:30:17] Peter Robicheaux: PaliGemma 2 introduces multiple different sizes of language model. The way that they get around having to use an encoder-decoder is the concept of a prefix LM. That basically means that when it's generating tokens autoregressively, all the tokens in the prefix, which is the image that it's looking at plus a description of the task that it's trying to do,

[00:30:41] Peter Robicheaux: are attending to each other fully, with full attention. Which means that, you know, it can sort of find high-level features: it's easier for the prefix to color the output of the suffix, and also to just find features easily. So this is [00:31:00] an example of one of the tasks it was trained on: you describe the task in English, in this case asking it to segment these two classes of objects, and then it finds their locations using these location tokens, and it finds their masks using an encoding of the masks into tokens.

[00:31:24] Peter Robicheaux: One of my critiques of PaliGemma 1, at least, is that performance saturates as a pre-trained model after only 300 million examples seen. What this graph is representing is that each blue dot is the performance on some downstream task. And you can see that after seeing 300 million examples, it does about as well on all of the downstream tasks they tried, and they tried a lot, as it does after 1 billion examples, which to me also suggests a lack of capacity for this model.

[00:31:58] Peter Robicheaux: For PaliGemma 2, [00:32:00] you can see the results on object detection, transferred to COCO. This also points to an increase in capacity being helpful to the model: as both the resolution and the parameter count of the language model increase, performance increases.

[00:32:16] Peter Robicheaux: Resolution makes sense, obviously; it helps to find small objects in the image. But it also makes sense for another reason, which is that it gives the model a thinking register: more tokens to process when making its predictions. But yeah, you could say, oh, 43.6,

[00:32:30] Peter Robicheaux: that's not that great, Florence 2 got 60. But this is not training a DINO or a DETR head on top of this image encoder. It's doing the raw language modeling task on COCO.
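As a rough illustration of the location-token idea discussed above, the sketch below turns a pixel-space box into four discrete tokens and back, so that a detection target becomes a short string a language model can emit. It assumes a PaliGemma-style convention of 1024 bins written as <locNNNN> in (y_min, x_min, y_max, x_max) order; the function names are hypothetical, and the exact binning and rounding used by Florence-2 or PaliGemma may differ.

```python
import re

NUM_BINS = 1024  # assumption: 1024 location tokens, <loc0000> .. <loc1023>


def box_to_loc_tokens(box, width, height, num_bins=NUM_BINS):
    """Encode a pixel box (x_min, y_min, x_max, y_max) as four location tokens."""
    x_min, y_min, x_max, y_max = box
    # Normalize to [0, 1] and emit in (y, x, y, x) order, as assumed above.
    normalized = (y_min / height, x_min / width, y_max / height, x_max / width)
    tokens = []
    for value in normalized:
        clamped = min(max(value, 0.0), 1.0)
        bin_index = round(clamped * (num_bins - 1))  # snap to the nearest bin
        tokens.append(f"<loc{bin_index:04d}>")
    return "".join(tokens)


def loc_tokens_to_box(text, width, height, num_bins=NUM_BINS):
    """Decode the first four <locNNNN> tokens in text back to a pixel box."""
    indices = [int(v) for v in re.findall(r"<loc(\d{4})>", text)]
    if len(indices) < 4:
        raise ValueError("expected four location tokens")
    y_min, x_min, y_max, x_max = (i / (num_bins - 1) for i in indices[:4])
    return (x_min * width, y_min * height, x_max * width, y_max * height)


# A detection target then reads like a short "sentence": location tokens + label.
target = box_to_loc_tokens((100, 50, 300, 200), width=640, height=480) + " deer"
print(target)
print(loc_tokens_to_box(target, width=640, height=480))
```

Because each coordinate is snapped to one of 1024 levels, the round trip is only accurate to a fraction of a pixel at this image size; that quantization is what lets detection be trained as ordinary next-token prediction.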
So it doesn't have any of the bells and whistles. It doesn't have any of the fancy losses. It doesn't even have bipartite graph matching or anything like that.

[00:32:52] Peter Robicheaux: Okay, the big result, and one of the reasons that I was really excited about this paper, is that they blow everything else away [00:33:00] on MMVP. I mean, 47.3, sure, that's nowhere near human accuracy, which, again, is 94%, but for a 2 billion parameter language model to beat ChatGPT, that's quite the achievement.

[00:33:12] Peter Robicheaux: And that brings us to our final pick for paper of the year, which is AIMv2. AIMv2 sort of says, okay, maybe coming up with all these specific annotations to find features with high fidelity in pixel space isn't actually necessary, and we can come up with an even simpler, more beautiful idea for combining image tokens and text tokens in a way that's interfaceable for language tasks.

[00:33:44] Peter Robicheaux: And this is nice because it can scale: you can get lots more data if you don't have to come up with all these annotations, right? The way that it works is it does something very, very similar to PaliGemma, where you have a vision encoder that dumps image tokens into a decoder-only transformer.

[00:33:59] Peter Robicheaux: But [00:34:00] the interesting thing is that it's also trained to autoregressively reconstruct the image tokens, with a mean squared error loss. So instead of having to come up with fancy object detection or segmentation labels, you can just try to reconstruct the image and have it learn fine-grained features that way.

[00:34:16] Peter Robicheaux: And it does this in what I think is a beautiful way that's compatible with the PaliGemma line of thinking, which is randomly sampling a prefix length and using only that number of image tokens as the prefix.
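The prefix attention described above (full attention among the prefix tokens, causal attention for everything generated after them) can be sketched as a boolean mask. This is a minimal illustration under stated assumptions, not code from PaliGemma or AIMv2: the function names are hypothetical, and sampling the prefix length uniformly from 1 to n - 1 is an assumed stand-in for AIMv2's random prefix sampling.

```python
import random


def prefix_lm_mask(seq_len, prefix_len):
    """mask[i][j] is True when query position i may attend to key position j."""
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            if j < prefix_len:
                # Prefix keys are visible to every position,
                # so attention inside the prefix is fully bidirectional.
                row.append(True)
            else:
                # Suffix keys are visible only causally: no peeking at the future.
                row.append(j <= i)
        mask.append(row)
    return mask


def sample_prefix_len(num_image_tokens, rng):
    """Randomly choose how many image tokens form the prefix.

    Assumption: uniform over 1..num_image_tokens - 1, so at least one image
    token is always left to reconstruct.
    """
    return rng.randint(1, num_image_tokens - 1)


def loss_positions(seq_len, prefix_len):
    """Only tokens after the prefix are predicted and scored."""
    return list(range(prefix_len, seq_len))


rng = random.Random(0)
prefix_len = sample_prefix_len(num_image_tokens=6, rng=rng)
for row in prefix_lm_mask(seq_len=6, prefix_len=prefix_len):
    print("".join("x" if allowed else "." for allowed in row))
print("scored positions:", loss_positions(6, prefix_len))
```

Each printed row is one query position: the block of x's on the left is the bidirectional prefix, and the staircase to its right is the causal part. Resampling the prefix length every step is what forces the model to reconstruct from many different amounts of visible context.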
And so it does a similar thing with the causal mask: "causal with prefix" is the attention mask on the right.

[00:34:35] Peter Robicheaux: So it's doing full block attention over some randomly sampled number of image tokens, and then reconstructing the rest of the image and the downstream caption for that image. This is the dataset that they train on. It's internet-scale data, very high-quality data created by the Data Filtering Networks paper, which is maybe the best CLIP data that exists.

[00:34:59] Peter Robicheaux: [00:35:00] And we can see that this is finally a model that doesn't saturate. Even at the highest parameter count, it appears to be improving in performance with more and more samples seen. So you can sort of think that if we just keep bumping the parameter count and increasing the examples seen, which is the line of thinking for language models, then it'll keep getting better.

[00:35:27] Peter Robicheaux: So how does it actually do at finding... oh, it also improves with resolution, which you would expect for a model that... This is the ImageNet classification accuracy, but yeah, it does better if you increase the resolution, which means that it's actually leveraging and finding fine-grained visual features.

[00:35:44] Peter Robicheaux: And so how does it actually do compared to CLIP on COCO? Well, you can see that if you slap a transformer detection head on it and train it on COCO, it gets 60.2, which is also within spitting distance of SOTA, which means that it does a very good job of [00:36:00] finding visual features. But you could say, okay, well, wait a second.

[00:36:03] Peter Robicheaux: CLIP got to 59.1, so how does this prove your claim at all?
Because doesn't that mean that CLIP, which is known to be CLIP-blind and to do badly on MMVP, is able to achieve very high performance on this fine-grained visual task of object detection? Well, they train on tons of data.

[00:36:24] Peter Robicheaux: They train on Objects365, COCO, Flickr, and everything else. So I think that this benchmark doesn't do a great job of selling how good of a pre-trained model AIMv2 is. We would like to see the performance with fewer examples, not trained to convergence on object detection. So seeing it in the real world, on a dataset like Roboflow 100, would be quite interesting, I think.

[00:36:48] Peter Robicheaux: And I guess our final, final pick for paper of 2024 would be Moondream. So, introducing Vik to talk about that.

[00:36:54] swyx: But overall, that was exactly what I was looking for. Best of 2024, an amazing job. Yeah, [00:37:00] if there are any other questions while Vik gets set up, vision stuff,

[00:37:07] swyx: yeah,

[00:37:11] swyx: Vik, go ahead. Hi,

[00:37:13] Vik Korrapati / Moondream

[00:37:13] question: Well, while we're getting set up, hi, over here, thanks for the really awesome talk. One of the things that's been weird and surprising is that the foundation model companies, even these multimodal LLMs, are still just worse than RT-DETR at detection. If you wanted to pay a bunch of money to auto-label your detection dataset, giving it to OpenAI or Claude would be a big waste.

[00:37:37] question: Even PaliGemma 2 is worse. So I'm curious to hear your thoughts on how come nobody's cracked the code on a generalist that really beats a specialist model in computer vision, like they have in LLM land.

[00:38:01] Isaac Robinson: Okay. It's a very, very interesting question. I think it depends on the specific domain.
For image classification, it's basically there. AIMv2 showed that a simple attentional probe on the pre-trained features gets around 90%, which is as well as anyone does. The bigger question is why it isn't transferring to object detection, especially real-time object detection.

[00:38:25] Isaac Robinson: I think, in my mind, there are two answers. One is that object detection architectures are really, really domain-specific. You know, we see all these super complicated things, and it's not easy to build something that just transfers naturally like that, whereas for image classification, CLIP pre-training transfers super quickly.

[00:38:48] Isaac Robinson: And the other thing is, until recently, the real-time object detectors didn't even really benefit from pre-training. You see the YOLOs that are essentially saturated, showing very little [00:39:00] difference from pre-training improvements, or from using a pre-trained model at all. So it's not necessarily surprising that people aren't looking at the effects of better and better pre-training on real-time detection.

[00:39:12] Isaac Robinson: Maybe that'll change in the next year. Does that answer your question?

[00:39:17] Peter Robicheaux: Can you guys hear me? Yeah, one thing I want to add, or just to summarize, is that until 2024 we haven't really seen a combination of transformer-based object detectors and fancy losses, and PaliGemma suffers from the same problem. These ResNets, or the convolutional models, have all these extreme optimizations for doing object detection, but I think it's kind of been shown now that convolutional models just don't benefit from pre-training and don't have the level of intelligence of transformer models.

[00:39:56] swyx: Awesome.
Hi,

[00:39:59] Vik Korrapati: can [00:40:00] you hear me?

[00:40:01] swyx: Cool. I hear you. I see you. Are you sharing your screen?

[00:40:04] Vik Korrapati: Hi. Might have forgotten to do that. Let me do that.

[00:40:07] swyx: Sorry, should have done that.

[00:40:17] swyx: Here's your screen. Oh, classic. You might have to quit Zoom and restart. What? It's fine. We have a capture of your screen.

[00:40:34] swyx: So let's get to it.

[00:40:35] Vik Korrapati: Okay, easy enough.

[00:40:49] Vik Korrapati: All right. Hi, everyone. My name is Vik. I've been working on Moondream for almost a year now. Like Shawn mentioned, I just went and looked, and it turns out I released the first version on December [00:41:00] 29, 2023. It's been a fascinating journey. Moondream started off as a tiny vision language model. Since then, we've expanded scope a little bit to also try and build some tooling, client libraries, et cetera, to help people really deploy it.

[00:41:13] Vik Korrapati: Unlike traditional large models that are focused on assistant-type use cases, we're laser-focused on building capabilities that developers can use to build vision applications that can run anywhere. In a lot of cases for vision, more so than for text, you really care about being able to run on the edge, run in real time, etc.

[00:41:40] Vik Korrapati: So that's really important. We have different output modalities that we support. There's query, where you can ask general English questions about an image and get back human-like answers. There's captioning, which a lot of our users use for generating synthetic datasets to then train diffusion models and whatnot.

[00:41:57] Vik Korrapati: We've done a lot of work to minimize hallucinations there. [00:42:00] So that's used a lot.
We have open-vocabulary object detection built in, similar to a couple of more recent models like PaliGemma, where rather than having to train a dedicated model, you can just say "show me soccer balls in this image" or "show me if there are any deer in this image," and it'll detect them.

[00:42:14] Vik Korrapati: More recently, earlier this month, we released a pointing capability, where if all you're interested in is the center of an object, you can just ask it to point out where that is. This is very useful when you're doing, you know, UI automation type stuff. Let's see, we have two models out right now.

[00:42:33] Vik Korrapati: There's a general-purpose 2B param model, which runs fair. It's fine if you're running on a server, it's good for our local Ollama desktop friends, and it can run on flagship mobile phones, but it never [...] [00:43:00] less memory, even with our not-yet-fully-optimized inference client.

[00:43:06] Vik Korrapati: The way we built our 0.5B model was to start with the 2 billion parameter model and prune it while doing continual training to retain performance. Our objective during the pruning was to preserve accuracy across a broad set of benchmarks. So the way we went about it was to estimate the importance of different components of the model, like attention heads, channels, MLP rows, and whatnot, using basically a gradient-based technique.

[00:43:37] Vik Korrapati: I'm not sure how much detail people want. We'll be writing a paper about this, but feel free to grab me if you have more questions. Then we iteratively prune a small chunk that minimizes the loss in performance, and retrain the model to recover performance and bring it back. The 0.5B we released is more of a proof of concept that this is possible.

[00:43:54] Vik Korrapati: I think the thing that's really exciting about this is that it makes it possible for developers to build using the 2B param [00:44:00] model, explore, build their application, and then, once they're ready to deploy, figure out what exactly they need out of the model and prune those capabilities into a smaller form factor that makes sense for their deployment target.

[00:44:12] Vik Korrapati: So yeah, very excited about that. Let me talk to you folks a little bit about another problem I've been working on recently, which is similar to the clocks example we've been talking about. We had a customer reach out who had a bunch of gauges out in the field. This is very common in manufacturing and oil and gas, where you have a bunch of analog devices that you need to monitor.

[00:44:34] Vik Korrapati: It's expensive to have humans look at that, monitor stuff, and make sure that the system gets shut down when the temperature goes over 80 or something. So I was like, yeah, this seems easy enough. Happy to help you distill that. Let's get it going. Turns out our model couldn't do it at all.

[00:44:51] Vik Korrapati: I went and looked at other open source models to see if I could just generate a bunch of data and learn from that. Did not work either. So I was like, let's look at what the folks with [00:45:00] hundreds of billions of dollars in market cap have to offer. And yeah, that doesn't work either. My hypothesis is that the way these models are trained is using a large amount of image-text data scraped from the internet.

[00:45:15] Vik Korrapati: And that can be biased. In the case of gauges, most gauge images aren't gauges in the wild; they're product images, detail images like these, where it's always set to zero.
It's paired with alt text that says something like "GIVTO, pressure sensor, PSI, zero to 30" or something. And so the models are fairly good at picking up those details.

[00:45:35] Vik Korrapati: It'll tell you that it's a pressure gauge. It'll tell you what the brand is, but it doesn't really learn to pay attention to the needle over there. So yeah, that's a gap we need to address. Naturally my mind goes to: let's use synthetic data to solve this problem. That works, but it's problematic, because it turned out we needed millions of synthetic gauge images to get to reasonable performance.

[00:45:57] Vik Korrapati: And thinking about it, reading a gauge is [00:46:00] not a zero-shot process in our minds, right? Say you had to tell me the reading in Celsius for this real-world gauge. There are two dials on there, so first you have to figure out which one you should be paying attention to, the inner one or the outer one.

[00:46:14] Vik Korrapati: You look at the tip of the needle, you look at which labels it's between, and you count how many ticks and do some math to figure out what the reading probably is. So what happens if we just add that as a chain of thought, to allow the model to better learn the subtasks it needs to perform to accomplish this goal?

[00:46:37] Vik Korrapati: You can see in this example, which was actually generated by the latest version of our model, it's like, okay, Celsius is the inner scale. It's between 50 and 60. There are 10 ticks. So the second tick... it's a little debatable here; there's a weird shadow situation going on and the dial is off, so I don't know what the ground truth is, but it works okay.

[00:46:57] Vik Korrapati: The points [00:47:00] over there are actually grounded. I don't know if this is easy to see, but when I click on those, there's a little red dot that moves around on the image.
The model actually has to predict where these points are. I was already trying to do this with bounding boxes, but then Molmo came out with pointing capabilities.

[00:47:15] Vik Korrapati: And pointing is a much better paradigm to represent this. We see pretty good results. This one's actually for clock reading; I couldn't find our chart for gauge reading at the last minute. The light blue chart is with our grounded chain of thought. We built a clock reading benchmark of about 500 images.

[00:47:37] Vik Korrapati: This measures accuracy on that. You can see it's a lot more sample-efficient when you're using the chain of thought. Another big benefit of this approach is that you can kind of understand how the model is doing it and how it's failing. So in this example, the actual correct reading is 54 Celsius and the model output [00:48:00] 56. Not too bad, but you can actually go and see where it messed up. It got a lot of these steps right, except that instead of saying the needle was on the 7th tick, it predicted that it was on the 8th tick, and that's why it went with 56.

[00:48:14] Vik Korrapati: So now that you know that it's failing in this way, you can adjust how you're doing the chain of thought, to maybe say, actually count out each tick from 40, instead of just trying to say it's the eighth tick. Or you might say, okay, I see that there's that middle mark, I'll count from there instead of all the way from 40.

[00:48:31] Vik Korrapati: So it helps a ton. The other thing I'm excited about is few-shot prompting or test-time training with this. If a customer has a specific gauge that we're seeing minor errors on, they can give us a couple of examples where, if it's misdetecting the needle, they can go in and correct that in the chain of thought.

[00:48:49] Vik Korrapati: And hopefully that works the next time. Now, it's an exciting approach, but we've only applied it to clocks and gauges.
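The arithmetic behind the gauge example can be made explicit. The sketch below treats each chain-of-thought step as a field of a record, computes the final reading, and reports the first step where a prediction disagrees with a reference trace. The class and function names are hypothetical, and the 2 degrees-per-tick spacing is inferred from the numbers in the talk (54 = 40 + 7 x 2 and 56 = 40 + 8 x 2), not stated in it.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GaugeTrace:
    """One chain-of-thought trace for reading an analog gauge."""
    scale: str             # which dial the model chose to read
    start_label: float     # labeled value the tick count starts from
    tick_index: int        # ticks past start_label where the needle sits
    units_per_tick: float  # value spanned by one tick (assumed, see lead-in)

    def reading(self) -> float:
        # Final answer: the labeled start value plus the counted ticks.
        return self.start_label + self.tick_index * self.units_per_tick


def first_disagreement(predicted: GaugeTrace, reference: GaugeTrace) -> Optional[str]:
    """Name the first reasoning step where the prediction departs from the reference."""
    for f in fields(GaugeTrace):
        if getattr(predicted, f.name) != getattr(reference, f.name):
            return f.name
    return None


# Numbers from the talk: correct reading 54 C, model said 56 C (8th tick, not 7th).
reference = GaugeTrace("inner (Celsius)", start_label=40, tick_index=7, units_per_tick=2)
predicted = GaugeTrace("inner (Celsius)", start_label=40, tick_index=8, units_per_tick=2)
print(reference.reading(), predicted.reading(), first_disagreement(predicted, reference))
```

Localizing the error to a single field is what makes the adjustment Vik describes targeted: you change how that one step is phrased (for example, counting out each tick from 40) instead of only learning that the final number was wrong.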
The real question is, is it going to generalize? Probably; there are some signs [00:49:00] from text models that when you train on a broad number of tasks, it does generalize. And I'm seeing some signs of that with our model as well.

[00:49:05] Vik Korrapati: In addition to the image-based chain of thought stuff, I also added some spelling-based chain of thought to help it better understand OCR, I guess. I don't understand why everyone doesn't do this, by the way. It's a trivial benchmark question, very, very easy to nail. But I also wanted to support it for stuff like license plate partial matching: hey, does any license plate in this image start with WHA or whatever?

[00:49:29] Vik Korrapati: So yeah, that sort of worked. All right, that ends my story about the gauges. If you think about what's going on over here, it's interesting that LLMs are showing enormous progress in reasoning, especially with the latest set of models that we've seen, but I have a feeling that VLMs are lagging behind, as we can see with these tasks that should be very simple for a human to do [00:50:00] and that it's very easy to find VLMs failing at.

[00:50:04] Vik Korrapati: My hypothesis on why this is the case is that on the internet, there's a ton of data that talks about how to reason. There are books about how to solve problems. There are books critiquing the books about how to solve problems. But humans are just so good at perception that we never really talk about it.

[00:50:20] Vik Korrapati: Maybe in art books, where it's like, hey, to show that that mountain is further away, you need to desaturate it a bit or whatever. But the actual data on how to look at images isn't really present. Also, the data we have is kind of sketchy.
The best source of data we have is image/alt-text pairs on the internet, and that's pretty low quality.

[00:50:40] Vik Korrapati: So yeah, I think our solution here is really that we need to teach them how to operate on individual tasks and figure out how to scale that out. All right. So, in conclusion: at Moondream we're trying to build amazing VLMs that run everywhere. Very hard problem. Much work ahead, but we're making a ton of progress, and I'm really excited [00:51:00] about it. If anyone wants to chat about the technical details of how we're doing this, or is interested in collaborating, please hit me up.

[00:51:08] Isaac Robinson: Yeah,

[00:51:09] swyx: like, when people say multimodality, you know, I always think about vision as the first among equals in all the modalities. So I really appreciate having the experts in the room.

Get full access to Latent Space at www.latent.space/subscribe
"The future belongs to those who believe in the beauty of their dreams." – Eleanor Roosevelt Welcome to a special episode of RESTalk: "Best of RESTalk 2024"! This year has been transformative for RESNET® and the residential energy efficiency industry. From setting bold goals to launching innovative tools and forming impactful collaborations, 2024 has redefined what's possible in energy efficiency and sustainability. In this episode, we revisit the year's most insightful and impactful moments, featuring highlights from the conversations that shaped the industry. Join us as we dive into the ambitious goal of achieving 1 million RESNET® HERS ratings annually, explore the evolution of new QA tools, and unpack the growing Build-to-Rent housing trend. We'll also reflect on RESNET®'s strides in water and carbon efficiency programs and the game-changing advocacy efforts that have amplified energy efficiency policies nationwide. Below is a table of all the topics in this episode with links to the full podcasts. Reflection on 2024: As we recap the highlights, we celebrate the progress and resilience of the RESNET® community in driving energy efficiency and sustainability. This episode highlights the industry's achievements and innovation, from RESNET®'s ambitious goals and policy wins to groundbreaking tools like the QA app and evolving market trends like Build-to-Rent housing. Looking Ahead: 2025 promises to bring even more exciting initiatives, broader adoption of RESNET® programs, and a collective journey toward achieving 1 million HERS ratings annually. Don't miss the opportunity to engage with RESNET® through upcoming events like the RESNET® 2025 Conference and continued advocacy efforts. Call to Action: If this episode inspired you, revisit the full conversations for deeper insights and ideas. We thank our listeners, guests, and stakeholders for their contributions to an impactful year. Here's to even greater achievements in 2025! 
Episode | Topic | Full episode link

Part A: GOALS & MACRO TRENDS
124 | Ambitious goals 2028 | www.bit.ly/RT-124
124 | Stretch goals | www.bit.ly/RT-124
128 | Defining BTR & HERS Ratings | www.bit.ly/RT-128

Part B: SOFTWARE & DATA
125 | Development of the QA app | www.bit.ly/RT-125
134 | Purpose and goals for QA App | www.bit.ly/RT-134
131 | Highlights from the Trends report | www.bit.ly/RT-131
132 | QA is the backbone | www.bit.ly/RT-132

Part C: PEOPLE
126 | Perspectives on Women in the HERS industry | www.bit.ly/RT-126
127 | Robert's advocacy experience | www.bit.ly/RT-127
127 | Carl on how advocacy days are structured | www.bit.ly/RT-127
129 | Paulette explains her role in outreach | www.bit.ly/RT-129
133 | Evolution of IECC/HERS Compliance Specialist | www.bit.ly/RT-133
130 | Preview of the 2025 RESNET Conference | www.bit.ly/RT-130

To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
Without continual growth and progress, such words as improvement, achievement, and success have no meaning. – Benjamin Franklin In this episode of RESTalk, Bill Spohn speaks with three guests to discuss the RESNET® QA app, an innovative tool designed to streamline the quality assurance process for home energy ratings. Bill is joined by Billy Giblin, RESNET®'s quality assurance field specialist, Kelvin Abong and Kiro Bondarev from The Fourth Dimension (4D), RESNET®'s technology partner. Together, they explore the app's journey from ideation to execution, its role in harmonizing quality assurance efforts, and the benefits it brings to users. Billy provides insight into the app's development, explaining how it shifts the industry from inconsistent spreadsheet-based reviews to a more transparent, data-driven approach. Kelvin discusses 4D's collaborative efforts to design a user-friendly app and highlights future improvements based on user feedback. Kiro offers technical details describing the app's ability to integrate with external QA tools via an API, ensuring flexibility for providers. The conversation touches on the importance of industry adoption, the app's intuitive design, and its positive impact. The episode includes exciting news about upcoming features, including cloud syncing and expanded integration with Energy Star. Here is a link to RESTalk EP125, where we first introduced the QA App on this podcast https://restalk.libsyn.com/ep125-qa-in-your-pocket-how-a-mobile-app-is-empowering-resnet-hers-ratings-with-cassandra-wright-leo-jansen-and-billy-giblin A press release on the RESNET® QA App, including links to download the app: https://www.resnet.us/articles/resnet-launches-new-qa-app-for-resnet-rating-providers-and-qads/ More details on the RESNET® QA App: https://www.resnet.us/about/qa/resnet-qa-app/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
Scott Doyle is the managing director of quality assurance for the Residential Energy Services Network, or RESNET. RESNET governs the Home Energy Rating System, or HERS® Energy Rating Index. Scott became a HERS® Rater in 2003 and, since being certified, has inspected and tested the energy performance of thousands of homes across a variety of construction types and climate zones. In 2007 he became a RESNET certified Quality Assurance Designee and Trainer and oversaw quality assurance activities for EnergyLogic and five other RESNET HERS® Rating Providers. This involved quality assurance oversight of nearly 10,000 homes annually. Scott's next move was becoming the quality assurance manager at RESNET itself, and he has been one of the guiding influences in protecting the RESNET brand and the home energy rating industry in general. As you will hear, I was one of Scott's trainers and employers, which made this conversation even more enjoyable, as it has been super fun to watch Scott's career evolve over the years.
Scott Doyle on LinkedIn
Residential Energy Services Network / RESNET
Knowledge is of no value unless you put it into practice. – Anton Chekhov In this episode of the RESTalk podcast, host Bill Spohn talks with Mark Johnson from the International Code Council (ICC) and Steve Baden from RESNET® about a new collaboration to enhance energy code compliance. They discuss the growing complexities of energy codes, particularly the challenge of ensuring compliance amid increasing technical and regulatory demands. To address this, ICC and RESNET® are introducing a program to involve HERS raters as third-party verifiers, providing quality assurance and supporting local code officials, especially where resources are limited. The conversation delves into the origins and development of this program, which has been over a decade in the making. Steve and Mark describe their journey, starting with joint training initiatives and establishing standards such as the Energy Rating Index (ERI). The latest effort includes certification of RESNET® HERS® raters to act as recognized compliance specialists under the International Energy Conservation Code (IECC). This certification ensures that raters understand building science and are well-versed in energy code requirements, enhancing credibility with code officials. Mark and Steve emphasize the need for collaboration and transparency in implementing these new compliance measures. They highlight that this partnership helps jurisdictions struggling with limited code enforcement capacity and provides career growth opportunities for HERS raters. The program aims to be fully launched by the end of 2025, with both guests encouraging raters to obtain their certifications soon to position themselves at the forefront of this compliance evolution.
Here's the link to the ICC website with all of the info on the IECC/HERS Compliance Specialist Designation: https://www.iccsafe.org/content/ecs-designation/ Flyer with a quick summary: https://cdn-www-v2.iccsafe.org/wp-content/uploads/ICC_HERSComplianceSpecialist_MessagingToCodeOfficials_infographic-002.jpg.webp To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
Quality is never an accident; it is always the result of intelligent effort. - John Ruskin In this episode of RESTalk, host Bill Spohn is joined by Scott Doyle and Ryan Meres from RESNET® to discuss their ambitious goal of achieving one million ratings annually by 2028. The conversation begins with an overview of RESNET®'s significant progress: 360,000 ratings recorded in 2023, a 136% increase since 2013. The discussion highlights the importance of quality assurance in maintaining the credibility of the HERS® rating system, particularly as RESNET® scales its operations and engages more stakeholders. Scott Doyle, RESNET®'s Managing Director of Quality Assurance, emphasizes the critical role of quality assurance in ensuring consistency and reliability in the HERS® rating process, which matters more as RESNET®'s influence grows and the trust of external stakeholders becomes essential. Ryan explains how RESNET®'s self-funding model, based primarily on fees for homes submitted to its registry, has allowed the organization to expand its staff and services since 2017 without relying on government grants. The episode also explores opportunities to expand HERS® ratings in the new and existing home markets. Ryan and Scott discuss how federal tax credits, builder incentives, and new technologies drive interest in HERS® ratings, particularly among large national production builders. They also touch on the potential for growth in the existing homes market, especially among companies that manage large portfolios of older properties. The conversation concludes with a call to action for HERS® raters to build capacity in anticipation of growing demand for energy, water, and carbon ratings and energy code services, and a reminder to attend the upcoming RESNET® conference to stay informed and connected in the industry. 
Link to the 2025 RESNET® Conference: https://whova.com/web/O69mUZ3%40Ukqwwk5w4pS%40yN-VrNyD68R4WHS7uRqUeos%3D/ To the RESNET® community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
On today's R-Value podcast, IDI's Ken Allison welcomes Nikki Krueger from Santa Fe Dehumidifiers. As the industry grapples with the challenges of increasingly airtight homes, Nikki sheds light on the latest innovations in dehumidification technology and their role in maintaining healthy, comfortable living spaces. Nikki brings over 20 years of experience in the indoor air quality (IAQ) industry to her role as Director of Marketing & Business Development at Santa Fe Dehumidifiers. A RESNET-certified home energy rater and a member of various industry committees, including the ACCA Manual Low Load Homes (LLH) Advisory Committee, Nikki is at the forefront of developing effective and sustainable solutions for ventilation and moisture control in buildings. Ken and Nikki debunk dehumidification myths and reveal surprising insights about current technologies. Nikki's expertise shines as she explains why some popular solutions may not be as effective as commonly believed. As she pointedly states, "Everybody wants to use a technology to dehumidify that's not a dehumidifier... The reality of keeping our mechanical systems separate in order for them to focus and being able to deliver what we actually need is the simplest solution for the HVAC community." In this episode: the impact of tighter building envelopes on indoor air quality and the need for mechanical ventilation; strategies for managing humidity in energy-efficient homes, including whole-house dehumidifiers; the importance of proper HVAC system sizing in modern, well-insulated buildings; challenges with exhaust-only ventilation systems and the benefits of supply ventilation; dehumidification needs in various climate zones and building types, including Passive Houses; the role of occupant behavior in managing indoor humidity and comfort; and sizing considerations for dehumidifiers in different applications, such as crawl spaces and living spaces.
The greatest value of a picture is when it forces us to notice what we never expected to see. - John Tukey In this episode of RESTalk, host Bill Spohn welcomes his most frequent guest, RESNET staff member Ryan Meres, to discuss the latest trends in the 2024 edition of the HERS Trends Data Report. This report, created as an initiative of the RESNET Suppliers Advisory Board, highlights data trends and analysis in the residential construction market. It shows a consistent year-over-year increase in HERS ratings, with the total reaching 362,000 last year and projections indicating that this year's number may surpass 400,000. Notably, states like Massachusetts and Arizona have seen high adoption rates of HERS ratings in new homes, with Massachusetts leading at 98%. The conversation delves into various aspects of the report, such as the geographic distribution of HERS ratings, the types of ventilation systems used in homes, and the rise of all-electric homes. Ryan notes that certain states, like Texas, have seen substantial increases in the percentage of homes receiving HERS ratings. The report also explores energy cost savings, insulation values, and air leakage rates, noting that homes with solar panels tend to have better insulation than those without. The trend toward high-efficiency mechanical equipment and the declining use of gas for space and water heating is also discussed, reflecting a gradual shift toward more energy-efficient building practices. Finally, the podcast touches on the growing importance of the HERS H2O rating system, which measures water efficiency in homes. Ryan explains that the program has been expanding, particularly in the Southwest, and is beginning to gain traction in other regions as well. The report indicates that HERS H2O-rated homes can save significant amounts of water annually. 
The conversation concludes with a reflection on the value of the data provided in the report, emphasizing its role in helping builders and decision-makers understand and implement best practices for achieving energy and water efficiency in residential construction. Link to the report: https://www.resnet.us/wp-content/uploads/RESNET_2024_HERSTrendsDataReport_FINAL.pdf Ryan's LinkedIn: https://www.linkedin.com/in/ryan-meres-58977110/ RESTalk: To the RESNET community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
Great things in business are never done by one person. They're done by a team of people. - Steve Jobs Bill Spohn welcomes RESNET staff member Clara Hedrick to discuss the upcoming RESNET 2025 conference. Clara, the lead events coordinator for RESNET, shares her excitement about the event and details her responsibilities. The conference is scheduled for January 27-30, 2025, in Tempe, Arizona, a location praised for its accessibility, nightlife, and natural beauty. Clara highlights the venue, Tempe Mission Palms, chosen for its amenities and in response to feedback from previous events. With a goal of 500 attendees, Clara anticipates a sellout crowd and stresses that early registration is crucial to secure accommodation and participation, since the event is expected to be in high demand. The conference theme celebrates 30 years of RESNET and 4 million homes rated. Clara outlines the general schedule, starting with an opening reception sponsored by NAIMA on January 26. The following days will feature general sessions, panel discussions, and breakout sessions on themes including building science, energy codes, and EPA programs. New session tracks, such as RESNET 101 and separate business and workforce development sessions, aim to provide comprehensive insights for both newcomers and seasoned professionals. Special events, such as offsite tours and training sessions, will take place later in the week, providing educational and networking opportunities. Clara also discusses the exhibit hall, which will be adjacent to the general session room and will offer booth spaces and tabletops for exhibitors. The conference aims to balance formal sessions with ample networking time so that attendees can connect with peers and industry leaders, a unique opportunity to forge new connections and strengthen existing ones. 
Clara encourages listeners to stay informed through the RESNET mailing list and highlights the importance of feedback in shaping future events. The episode concludes with Clara expressing her passion for the event and her commitment to making it a success, thanking the RESNET community for their continued support and engagement. Link to the conference RSVP form: https://www.resnet.us/2025-resnet-conference-rsvp/ Contact the conference team at: conference@resnet.us RESNET Newsletter sign up (the KEY to so much info): https://signup.e2ma.net/signup/1878040/1889360/ RESTalk: To the RESNET community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
"In the end, we will conserve only what we love; we will love only what we understand; and we will understand only what we are taught." - Baba Dioum In the latest episode of the RESTalk podcast, Bill Spohn hosts Paulette McGhie and Ryan Meres to discuss RESNET®'s innovative approaches to water, carbon, and building codes. Paulette, who recently joined RESNET®, shares her extensive background in energy compliance and her passion for energy transparency, driven by a personal experience involving her mother's high utility bills. She now focuses on outreach, education, and advocacy, aiming to promote net-zero energy homes by 2040. The discussion highlights RESNET®'s recent activities, including a significant meeting with Utah's water conservation board to address water reduction goals. Paulette emphasizes the importance of collaboration among key stakeholders to create incentives and recognition programs for builders adopting water-efficient practices. Ryan discusses RESNET®'s recent policy forum in Washington, D.C., where participants advocated for a federal tax credit for water efficiency, similar to the existing 45L tax credit for energy-efficient homes. The podcast also covers the challenges in promoting water efficiency and adopting advanced building codes. Paulette and Ryan acknowledge the difficulties builders face, particularly in regions with strict outdoor water use regulations, and stress the need for continuous education and advocacy to overcome these obstacles. Both guests are optimistic about increasing builder participation in RESNET® programs and establishing a robust energy code compliance program, aiming for significant progress in the next year. Paulette's LinkedIn: https://www.linkedin.com/in/paulette-mcghie-b4675516/ Ryan's LinkedIn: https://www.linkedin.com/in/ryan-meres-58977110/ RESTalk: To the RESNET community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
Standard 310 (ANSI/RESNET/ACCA 310) is a technical workflow developed by ACCA and RESNET for grading the installation of HVAC systems, typically in new home construction. It plays a crucial role in obtaining Energy Star certification, which can make new homes eligible for tax credits under the Inflation Reduction Act. The five steps of Standard 310 are design review, duct leakage test, total system airflow, blower fan watt draw, and refrigerant charge verification. In this podcast episode, host Bryan Orr is joined by guests Chris Hughes and Eric Kaiser to discuss Standard 310 and its implications for HVAC contractors. The standard aims to ensure that HVAC systems are installed correctly and operate as designed. The process involves a third-party HERS rater conducting various tests and measurements, for which contractors need to be prepared. Proper duct sealing, airflow settings, and refrigerant charging are critical for passing the assessments. One of the challenging aspects highlighted is the refrigerant charge verification step. The standard requires either non-invasive testing (which has temperature limitations) or weigh-in verification with geotagged photos. Chris Hughes suggests manufacturers could develop more consistent commissioning protocols to streamline this process. Topics covered in the podcast: overview of Standard 310 and its five steps; importance for Energy Star certification and tax credits; role of HERS raters and HVAC contractors; duct leakage testing and proper sealing; airflow measurement methods; blower fan watt draw challenges; refrigerant charge verification options; need for consistent commissioning protocols; coordination and documentation required; and future improvements to the standard. Have a question that you want us to answer on the podcast? Submit your questions at https://www.speakpipe.com/hvacschool. Purchase your virtual tickets for the 5th Annual HVACR Training Symposium at https://hvacrschool.com/Symposium24. Subscribe to our podcast on your iPhone or Android. 
Subscribe to our YouTube channel. Check out our handy calculators on the HVAC School Mobile App for Apple and Android.
"Build-to-rent isn't just about providing a place to live; it's about crafting communities where flexibility and convenience meet modern living standards." In this episode of the RESTalk podcast, host Bill Spohn welcomed returning guest Laurel Elam and new participant Thomas Cochran. The discussion centered on the burgeoning trend of build-to-rent (BTR) housing. Thomas, Senior Vice President of National Business Development and Marketing at ARCXIS, shared insights on his role, which involves strategic partnerships and initiatives spanning regional to national scales, with a recent focus on sustainability and energy efficiency in building projects. The conversation delved into the specifics of the build-to-rent sector, which Thomas described as detached multifamily housing projects, or "horizontal apartments." These developments are typically built by home builders and then managed by property management firms. Laurel, who has been presenting with Thomas on this topic, mentioned RESNET's involvement through a newly formed advisory group aimed at aligning the rating industry with the build-to-rent movement. This initiative reflects a significant opportunity for energy rating companies to contribute to this growing sector. Both speakers also discussed the practical applications and implications of the build-to-rent model. They emphasized the importance of energy ratings not only for individual homeowners but also for institutional investors and property managers, highlighting the broad appeal and utility of the HERS (Home Energy Rating System) Index in these projects. The podcast touched on the geographic expansion of build-to-rent projects, noting active markets in Texas, Arizona, Florida, and Georgia, and on how these projects cater to a diverse demographic, including younger generations and retirees looking for flexibility and convenience in housing. 
Thomas' LinkedIn: https://www.linkedin.com/in/thomas-c-549bab41/ Laurel's LinkedIn: https://www.linkedin.com/in/laurel-elam-5404817/ Related articles: https://www.resnet.us/articles/build-to-rent-housing-sees-dynamic-growth-in-2023/ https://www.resnet.us/articles/resnet-appoints-btr-advisory-group-arcxis-thomas-cochran-as-chair/ RESTalk: To the RESNET community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
"Advocacy is the mirror reflecting the voices of the community into the corridors of power." - Unknown In today's podcast, we cover advocacy and public policy in the residential energy sector with our guests Robert Pegues, General Manager of Technical Delivery at US Ecologic and a RESNET board member, and Carl Chidlow, a lobbyist representing RESNET. Robert shares his background in the sector, highlighting his experience at last year's policy forum and his involvement with regional and national energy councils. Carl provides insights into RESNET's efforts to engage with policymakers through the RESNET Policy Forum, emphasizing the importance of practitioner involvement in legislative processes and the positive outcomes of bipartisan support on issues like the 45L tax credit and VA home loans. Our main discussion revolves around the upcoming 2024 RESNET Policy Forum, where industry professionals will convene to advocate for energy efficiency and housing policies. Carl explains the event's objectives and the practical aspects of participating, such as scheduled meetings with members of Congress and the importance of constituency in policy advocacy; the process is truly turnkey for the participant. The discussion underlines the significance of direct engagement in shaping policies that affect the residential energy sector. The conversation also touches on the practical and personal aspects of participating in such an event, with Robert sharing his initial apprehensions and eventual satisfaction from influencing policy and advocating for industry concerns. Both Carl and Robert encourage listeners to participate in the policy forum, highlighting the opportunity to effect change and the foundational American right to petition the government. The episode concludes with a call to action for listeners to consider attending the policy forum and engaging in the democratic process to advocate for their industry and interests. 
Robert's LinkedIn: https://www.linkedin.com/in/robert-pegues-785622a7/ Carl's LinkedIn: https://www.linkedin.com/in/carl-chidlow-7816ba37/ Link to the 2024 Policy Forum: https://www.resnet.us/2024-policy-forum/ RESTalk: To the RESNET community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
"I never dreamed about success. I worked for it." - Estée Lauder We welcome Sharla Riead, Lead Instructor at Energy Smart Institute, and Emelie Cuppernell Glitch, VP Programs at Performance Systems Development, for a conversation focused on the recognition of women in the HERS (Home Energy Rating System) rating industry. Sharla shares her extensive background, beginning with founding an energy auditing company in 1979, which evolved into a HERS rating company, and continuing into roles in quality assurance and training within the industry. She emphasizes her company's impact and the shift she has observed toward greater female involvement in the sector. Emelie details her journey in the industry, from her initial interest in science and residential energy to her current role as vice president of Performance Systems Development. She discusses the challenges she faced as a woman in the field, including overcoming assumed biases and the importance of establishing credibility. She reflects on the changing landscape for women in the industry, noting improvements and sharing a personal anecdote illustrating past gender assumptions. The conversation concludes with advice for women entering the industry, highlighting the importance of continuous learning, self-trust, and getting outside one's comfort zone. Both guests underline the evolving nature of the industry, noting an increase in female participation and a growing list of role models, and encourage more women to join. They stress the value of diversity and the need for different perspectives in building science and energy efficiency, illustrating the industry's ongoing transformation and the significant contributions of women. Links to Sharla and Emelie on LinkedIn are below, along with a press release on RESNET's Inaugural Class Recognizing Women Pioneers in the HERS Industry, honored during the 2023 conference held in San Diego, CA. 
https://www.linkedin.com/in/sharla-riead/ https://www.linkedin.com/in/emelie-cuppernell-glitch-a8647b26/ https://www.resnet.us/about/inaugural-class-of-resnet-recognition-of-women-pioneers-in-hers-industry/ RESTalk: To the RESNET community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
The best way to predict your future is to create it. - Abraham Lincoln Data is king, they say. How can the data insights from the RESNET Quality Assurance (QA) app revolutionize how we rate energy-efficient homes? Imagine a world where getting your home energy rating was made easier by a smartphone app. Is that the future the RESNET QA app promises? Our trio of guests, Quality Assurance Designees (QADs) Cassandra Wright from Strand Systems and Leo Jansen from Energy Efficient Homes Midwest, and RESNET Quality Assurance Field Specialist Billy Giblin, help us understand the development and application of the new RESNET QA app, a tool designed to improve the quality and consistency of home energy ratings. Here are some key points:
Background: The QA checklist was created to address concerns about inconsistencies in quality assurance (QA) reviews. The checklist evolved through several versions and became a mandatory provider requirement. The need for better data access and tracking led to the development of the QA app.
The App: Available on the Apple App Store, Google Play, and as a web app. Streamlines the QA process by pulling data from the RESNET Buildings Registry. Saves time and reduces manual entry. Allows providers to control access for their QADs.
Future Plans: Integrate with other energy efficiency programs like Energy Star, Indoor airPLUS, and Zero Energy Ready Homes. Create a forum for QADs to ask questions and share best practices. Develop dashboards for providers and QADs to track their QA and performance. Implement an API for providers who want to use their own tools for QA.
Benefits: Improves transparency and consistency in QA reviews. Provides valuable data for tracking and improving performance. Reduces the workload for QADs and providers.
Challenges: Providers who use a retroactive QA model may need to adjust their process. There is a learning curve for the new app and API. 
Overall, the RESNET QA app is a positive step towards improving the quality and consistency of home energy ratings. Everyone will need time to adjust to the new system, but the potential benefits are significant. Link to RESNET site with info on the QA app: www.resnet.us/about/qa/resnet-qa-app/ RESTalk: To the RESNET community, we hear you and want to engage. Learn more at www.RESNET.us. Or for more info on this topic, contact RESNET at INFO@RESNET.US
Today's episode is all about moisture mitigation in our homes. We all know that excess moisture can lead to mold, but it's also responsible for increased chemical offgassing. This show is for everyone, even if you think you live in a dry climate and don't have anything to worry about. Nikki Krueger is the Director of Marketing & Business Development for Santa Fe freestanding and whole-house ventilating dehumidifiers. She has been involved in the indoor air quality industry for over 20 years. She is a RESNET-certified home energy rater and a member of the ACCA Manual Low Load Homes (LLH) Advisory Committee. She also sits on the building envelope committee for the Spray Polyurethane Foam Association, has completed the ACCA Residential Design for Quality Installation Certification, and is serving on the 2024 National Green Building Standard update committee. She educates homeowners, builders, HVAC contractors, architects, engineers, crawl space contractors, and other industry professionals on the building science of ventilation and moisture control in buildings.
In this episode, we engage in an insightful conversation with Ryan Rice, discussing his journey in the oil and gas industry and the innovative work at Resnet, a software and services startup focused on revolutionizing production operations and reservoir engineering. This episode dives into:
Introduction to Resnet: Exploring the mission and services of Resnet, which aims to unite field and office operations through innovative software solutions and technical services.
Well Tender - Field Service Management: Discussing Resnet's Well Tender field service management and field dispatching solution, designed to optimize production operations by ensuring efficient job allocation and safety.
Gamification in Operations: The unique approach of incorporating gamification into operational processes to enhance engagement and efficiency.
Intelligent Flow Control Services: Delving into Resnet's specialized services in reservoir engineering, particularly in conducting advanced well testing campaigns to understand well communication and influence asset development programs.
Building a Tech-Driven Business: Ryan Rice shares insights on self-funding Resnet, balancing software development with consulting services, and the importance of cash flow in a startup.
Evolution of Resnet's Vision: The journey from the initial concept of production as a service to the current focus on addressing communication and data access challenges in the oil and gas industry.
Influence of Salesforce on Resnet: How the adoption of Salesforce at Rice Energy inspired the development of Resnet's platform, emphasizing the need for efficient and integrated business operations.
Ryan Rice's Background and Career Path: From his early days working in the field at Rice Energy to pursuing petroleum engineering and eventually co-founding Resnet.
The Future of Oil and Gas Technology: Discussing the evolving landscape of the oil and gas industry and the role of technology in driving future developments.
This episode offers a deep dive into the innovative world of oil and gas technology, highlighting Ryan Rice's unique perspective and the transformative work being done at Resnet.
This episode is part of our Skilled Labor Series hosted by MCJ partner Yin Lu. This series is focused on amplifying the voices of folks from the skilled labor workforce, including electricians, farmers, ranchers, HVAC installers, and others who are on the front lines of rewiring our infrastructure. David Holtzclaw is the founder and principal of Transduction Technologies, a small engineering firm based in Omaha, Nebraska, that provides science analysis, testing, and energy consulting services to residential and small commercial clients. In this episode, we talk about weatherization and home energy efficiency. David and his team perform a number of services, including energy evaluations, duct leak testing, ventilation testing, pressure mapping, combustion testing, infrared imaging, and cost-benefit analysis of implementing renewable energy systems as a whole. We discuss how the home energy efficiency market has grown over the past few decades, the top things you can do to improve your home's energy efficiency, and both the tailwinds and headwinds the IRA is bringing to consumers and contractors alike in Nebraska. In this episode, we cover:
[03:11]: Origin of home energy auditing in the 1980s and creation of RESNET
[05:29]: Home Energy Score (HES) for existing homes, Home Energy Rating System (HERS) for new homes
[07:23]: RESNET's relationship with BPI (Building Performance Institute)
[09:04]: Emergence of the first energy code for new construction, the IECC (International Energy Conservation Code)
[11:17]: The impact of high interest rates on the demand for energy audits
[14:47]: David's transition from aerospace and NASA to founding an energy efficiency company
[20:43]: An overview of his customer base
[24:27]: The main culprits of an energy-inefficient home
[29:45]: David's approach to customizing homes during the design process
[32:11]: Insights into mechanical ventilation
[34:30]: How upfront investments like triple-pane windows pay off
[38:50]: Why cheaper heat pumps may be pushed over better models with the IRA
[42:08]: The impact of politics on state energy efficiency funding
[49:22]: Advice and cautions for listeners planning to electrify and weatherize their homes
Get connected: David Holtzclaw LinkedIn; Yin X / LinkedIn; MCJ Podcast / Collective / Instagram
*You can also reach us via email at info@mcjcollective.com, where we encourage you to share your feedback on episodes and suggestions for future topics or guests.
Episode recorded on Aug 17, 2023 (Published on Oct 12, 2023)