Richard Gearhart and Elizabeth Gearhart, co-hosts of the Passage to Profit Show, interview AI and 3D tech leader James Thornton from Tafi and Daz 3D, franchise expert Cliff Nonnenmacher from Franocity, and cybersecurity expert Eric Kanagy from Simplesense.

James Thornton, Co-Founder and CEO of Tafi and Chairman & CEO of Daz 3D, reveals what it really takes to build billion-dollar companies, why most entrepreneurs misunderstand scaling, and why data, not AI models, is becoming the true power behind the future of artificial intelligence. In this inspiring and deeply personal episode, James shares lessons from rebuilding struggling companies, surviving a life-changing stroke in his twenties, creating industry-leading 3D AI technology, and helping shape the next generation of AI-driven business tools. From prompt engineering and AI workflows to resilience, leadership, innovation, and the future of digital humans, this episode delivers powerful insights every entrepreneur, creator, and business leader needs to hear. Read more at: https://www.daz3d.com/

Franchise expert Cliff Nonnenmacher, founder of Franocity, reveals what most people completely misunderstand about franchising, wealth creation, and escaping corporate America. In this eye-opening episode, Cliff explains how the right franchise can dramatically reduce business failure risk, why “freedom within the framework” creates successful entrepreneurs, and the critical financial and personality traits needed before investing. He also breaks down the industries he believes are most resistant to AI disruption (including home services, trades, senior care, biohacking, and youth enrichment) while sharing the biggest mistakes aspiring franchise owners make when chasing passive income and financial freedom. Read more at: https://franocity.com/

Cybersecurity expert and Simplesense founder Eric Kanagy reveals how AI is rapidly changing the future of cyber warfare, infrastructure security, and online safety. From hacked water utilities and nation-state attacks to AI-generated scams and fake voices, this eye-opening conversation explores the growing threats businesses and everyday people face as artificial intelligence becomes more powerful. Eric explains why critical infrastructure is vulnerable, how AI is helping both attackers and defenders, and what entrepreneurs can do now to stay protected in an increasingly dangerous digital world. Read more at: https://simplesense.io/

Whether you're a seasoned entrepreneur, startup founder, inventor, or small business owner, the Passage to Profit Show is a leading podcast for insights on entrepreneurship, innovation, intellectual property, and business strategy. Hosted by Richard Gearhart and Elizabeth Gearhart, the show features industry leaders, investors, and founders who share real-world lessons on scaling companies, protecting ideas, building generational wealth, and navigating today's evolving business landscape. Visit https://passagetoprofitshow.com/ for the latest episodes, expert interviews, and resources designed to help you grow, protect, and profit from your ideas.
Chapters
(00:00:00) - Passive Intelligence: The Future of Business
(00:00:25) - Passage to Profit
(00:02:13) - We Got Our Patent Granted
(00:02:51) - If You Filed Your Return Late, You Can Get a Ref
(00:03:48) - A Few Words on Ted Turner
(00:04:38) - Jimi Hendrix Legacy Lawsuit
(00:05:55) - Mother's Day Plans in New York
(00:07:41) - What Was the One Decision That Changed the Direction of Your Business?
(00:08:41) - How to Build a Wealth of Franchising
(00:10:23) - What Changed the Direction of Your Business?
(00:12:18) - The One Decision That Changed the Direction of Your Business
(00:15:02) - How Hard Do You Have to Work to Create a Billion-D
(00:15:59) - Clifford Robbins on Working Nonstop
(00:19:58) - How Having a Stroke Changed My Perspective on Life
(00:22:43) - The true power of AI is data
(00:25:34) - How to Describe Yourself to the AI
(00:28:32) - Car Shield
(00:29:43) - Better Health Insurance for You
(00:30:43) - How Daz For 3-D Artists Is Taking on AI
(00:40:24) - Best Uses of AI in Business Owners Roundtable
(00:42:38) - ChatGPT: The Future of Image Generation
(00:44:19) - Business Owners Roundtable: Real AI Use Cases
(00:46:02) - Debtor Assistance Hotline
(00:48:27) - The Secret to Intellectual Property
(00:52:00) - Buy a Franchise
(00:54:59) - How to Get Out of Corporate America
(00:56:36) - Do You Need a Franchise to Create Wealth?
(01:00:59) - Should You Buy a Franchise or Start a Business?
(01:03:19) - What to Know Before Becoming a Franchisee
(01:04:10) - Immortal Franchising: The furthest distance from AI
(01:06:20) - Is Cybersecurity More Secure Than Ever?
(01:12:25) - James Poneman: Could AI Prevent Cybersecurity Attacks?
(01:19:02) - Car Shield
(01:20:06) - Memory of the Phone
(01:21:22) - Secret Weapons of the Entrepreneurial Mind
(01:24:05) - How to Be More Helpful to Others
(01:25:12) - Richard Gearhart and Elizabeth Gearhart: Rest Is Not Optional
(01:26:50) - Passive to Profit
Hey y'all, Alex here with your weekly AI news catch-up. It's one of those Thursdays where, no matter how well I prep, the big AI labs are hell-bent on showing up before each other. Alibaba dropped Qwen 3.6 under Apache 2, confirming their commitment to open source, then Anthropic released Claude Opus 4.7 (not quite Mythos), and OpenAI followed with a huge Codex update that includes Computer Use among other things. The highlight of Computer Use is the background usage, more on that below. This is all just from today!

Earlier in the week we had 2 incredible 3D world generators, Lyra 2.0 from Nvidia and HYWorld 2 from Tencent; Windsurf dropped version 2.0 with Devin integration; Google released a Gemini TTS with support for 90+ languages and an incredible emotional range; and Baidu open-sourced Ernie Image, rivaling Nano Banana.

Today on the show we had 3 awesome guests: Theodor from Cognition joined to cover the new Windsurf, Kwindla is back on the show to talk about “the side project that escaped containment”, Gradient-Bang, a multi-agent, voice-based space game, and Trevor from Marimo joined to talk about pairing your agents with a Marimo notebook. Let's dive in!
Mistral has been on an absolute tear. With frequent successful model launches, it is easy to forget that they raised the largest European AI round in history last year. We were long overdue for a Mistral episode, and we were very fortunate to work with Sophia and Howard to catch up with Pavan (Voxtral lead) and Guillaume (Chief Scientist, Co-founder) on the occasion of this week's Voxtral TTS launch.

Mistral can't directly say it, but the benchmarks do imply that this is basically an open-weights ElevenLabs-level TTS model (technically, it is a 4B Ministral-based multilingual low-latency open-weights TTS model that has a 68.4% win rate vs ElevenLabs Flash v2.5). The contributions are not just in the open weights but also in open research: we also spend a decent amount of the pod talking about their architecture, which combines autoregressive generation of semantic speech tokens with flow matching for acoustic tokens (typically only applied in the image generation space, as seen in the Flow Matching NeurIPS workshop from the principal authors that we reference in the pod).

You can catch up on the paper here, and the full episode is live on YouTube!

Timestamps
00:00 Welcome and Guests
00:22 Announcing Voxtral TTS
01:41 Architecture and Codec
02:53 Understanding vs Generation
05:39 Flow Matching for Audio
07:27 Real Time Voice Agents
13:40 Efficiency and Model Strategy
14:53 Voice Agents Vision
17:56 Enterprise Deployment and Privacy
23:39 Fine Tuning and Personalization
25:22 Enterprise Voice Personalization
26:09 Long-Form Speech Models
26:58 Real-Time Encoder Advances
27:45 Scaling Context for TTS
28:53 What Makes Small Models
30:37 Merging Modalities Tradeoffs
33:05 Open Source Mission
35:51 Lean and Formal Proofs
38:40 Reasoning Transfer and Agents
40:25 Next Frontiers in Training
42:20 Hiring and AI for Science
44:19 Forward Deployed Engineering
46:22 Customer Feedback Loop
48:29 Wrap Up and Thanks

Transcript
swyx: Okay, welcome to Latent Space.
We're here in the studio with our guest co-host Vibhu. Welcome.
Vibhu: Thanks. Very excited for this one.
swyx: As well as Guillaume and Pavan from Mistral. Welcome. Excited to be here.
Guillaume: Thank you for having us.
swyx: Pavan, you are leading audio research at Mistral, and Guillaume, you're Chief Scientist.

Announcing Voxtral TTS
swyx: What are we announcing today, where we're coordinating this release with you guys?
Guillaume: Yeah, so we are releasing Voxtral TTS. It's our first audio model that generates speech. It's not our first audio model; we had a couple of releases before. We had one in the summer that was Voxtral, our first audio model, but it was a transcription model, ASR. A few months later, we released an update on top of this, supporting more languages, and also a lot of table-stakes features for our customers: context biasing, precise timestamping, and transcription. We also have a real-time model that can transcribe not just at the end. You don't need to feed your entire audio file; the transcription can also come in real time. And this is a natural extension in audio: speech generation. So we support nine languages, and this is a pretty small model, a 3B model, so very fast, and also state of the art.
It performs at the same level as the best models, but it's much more efficient, and it's also much cheaper, only a fraction of the cost of our competitors. And we are also releasing the weights of this model.
swyx: What's the decision factor?
Guillaume: It's a good question.
swyx: There will be more. Yeah, Pavan, any sort of research notes to add on?

Architecture and Codec
Pavan: It's a novel architecture that we developed in-house. We iterated on several internal architectures and ended up with an autoregressive flow-matching architecture. We also have a new in-house neural audio codec, which converts the audio into 12.5 Hz latent [00:02:00] tokens, semantic and acoustic tokens. And yeah, that's the new part about this model, and we're pretty excited that it came out with such good quality. As Guillaume was mentioning, it's a 3B model. It's based off of the Ministral model that we released just a few months back, and it's mainly meant for TTS, but the text capabilities are also there.
swyx: So there's a lot to cover. I love anything to do with novel encodings and all those things, because I think that obviously creates a lot of efficiency, but also maybe bugs that sometimes happen. You were previously at Gemini, and you worked on post-training for language models, and maybe a lot of people will have less experience with audio models in general compared to pure language. What did you find that you had to revisit from scratch as you joined Mistral and started doing this?

Understanding vs Generation
Pavan: I think there are two buckets, I guess: audio understanding and audio [00:03:00] generation.
In audio understanding, there are the Voxtral models that Guillaume was mentioning that we released earlier: the Voxtral chat model that we released, I think, in July last year, and the follow-up family of transcription-only models that we released in January. That would be one bucket, and generation is another bucket. You can also treat them as a unified set of models, but currently the approaches are a little different between these two.
To your question on how audio is fed to the model: in the understanding model, it's very similar to the Pixtral models that we also released.
swyx: Yes.
Pavan: That's...
swyx: Amazing.
Pavan: That was the first project I worked on after I joined Mistral. It was pretty nice. And Voxtral was very similar in spirit, I guess. We feed audio through an audio encoder, similar to images through a vision encoder, and it produces continuous embeddings, which are fed as tokens to the main transformer decoder model. The model output is just text, so on the output side there is nothing that needs to be done in these kinds of models.
I [00:04:00] guess the interesting part of the generation stuff is that the output now has to produce audio. The approach that we have is this neural audio codec, which converts audio into these latent tokens. There is a lot of existing literature and a lot of models which are based off of this kind of approach, and we made slightly different design decisions around this. But at the end of the day, the neural audio codec converts audio into a 12.5 Hz set of latents, and each latent has a semantic token and a set of acoustic tokens. The idea is that you take these discrete tokens and feed them in on the input side. There are several ways to use them at each frame, but we just sum the embeddings. So it's like having K different vocabularies: combine all of them, because they all correspond to one audio frame on the input side.
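The input-side scheme described here (a 12.5 Hz codec, K tokens per frame, one embedding table per codebook, embeddings summed per frame) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Mistral's code: the codebook count, vocabulary size, embedding width, and the helper names `make_embedding_table` and `embed_frame` are all hypothetical, and the tables are random rather than learned. Only the 12.5 Hz frame rate and the "sum the embeddings" step come from the conversation.

```python
import random

FRAME_RATE_HZ = 12.5                    # codec frame rate quoted in the conversation
FRAME_MS = 1000.0 / FRAME_RATE_HZ       # duration of one codec frame in milliseconds

# Hypothetical sizes, chosen only for illustration.
NUM_CODEBOOKS = 4                       # "K": one semantic plus several acoustic codebooks
VOCAB_SIZE = 16
EMBED_DIM = 8


def make_embedding_table(vocab_size, dim, rng):
    """Random stand-in for a learned table: one vector of length `dim` per token id."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed_frame(token_ids, tables):
    """Sum the K per-codebook embeddings into a single input vector for one frame."""
    if len(token_ids) != len(tables):
        raise ValueError("need exactly one token id per codebook")
    dim = len(tables[0][0])
    frame_vector = [0.0] * dim
    for token_id, table in zip(token_ids, tables):
        for i, value in enumerate(table[token_id]):
            frame_vector[i] += value
    return frame_vector


rng = random.Random(0)
tables = [make_embedding_table(VOCAB_SIZE, EMBED_DIM, rng) for _ in range(NUM_CODEBOOKS)]
frame_tokens = [3, 7, 0, 12]            # one token id per codebook for a single frame
frame_vector = embed_frame(frame_tokens, tables)

print(f"{FRAME_MS:.0f} ms per frame, {len(frame_vector)}-dim input vector")
```

Summing keeps the transformer's sequence length at one position per 80 ms frame regardless of K, which is presumably why it is attractive for streaming; concatenating the K embeddings would be another option but widens the input.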
The output side is the interesting part. I don't know if it's the most popular, but one popular technique is to have a depth transformer, [00:05:00] because you have K tokens at each time step. With text, you just have one token at each time step, so you just predict the token from the vocabulary; you get a probability...
swyx: It's very straightforward for text.
Pavan: Very straightforward.
swyx: Yeah.
Pavan: But if you have K tokens, then the naive thing would be to predict all of them in parallel. That doesn't work, or at least it doesn't work that well, because audio has more entropy. One of the techniques people use is this depth transformer, where you almost have a small transformer (it can be an LSTM or an RNN as well, but people use transformers) and you predict the K tokens in autoregressive fashion. So you have two autoregressive things going on.

Flow Matching for Audio
Pavan: The thing we did differently is that, instead of having this autoregressive K-step prediction, we have a flow-matching model. Instead of modeling this as a discrete token set, we trained the codec to be both discrete and continuous, to have this flexibility. We did try the discrete stuff too, and it works well, but the continuous stuff just works better. So it's a flow [00:06:00] matching head, which takes the latent from the main transformer, and, kind of like in diffusion, it's denoising, but in flow matching it's a velocity estimate. So you go from the noise all the way to the audio latent, which corresponds to 80 milliseconds of audio, and which is then sent through the vocoder to get back the 80-millisecond audio frame.
swyx: Yeah. Is this the first application of flow matching in audio? Because usually I come across this in the image space.
Pavan: Yeah.
Actually, in some sense there are flow-matching models in audio, but this specific combination, I could be wrong, there could be some work, but I haven't seen much work in this, so I think it's novel. Vision is just a way bigger community, so I think they pioneer a lot of this diffusion and flow-matching work, and it's interesting to adopt some of the ideas from there into audio.
swyx: Yeah.
Pavan: Yeah, personally that's the part which I'm excited about. One more meta point: unlike text, and even in vision I think this is true, in the [00:07:00] audio literature there is no winner model yet. There is no "okay, this is the way you do things." I think people are still iterating and figuring out what's the best overall recipe. I'm pretty sure there are models which are also completely end-to-end, like native audio in, native audio out, but it has still not come to a convergence point where this is the right way to do things. That also makes the space pretty exciting to explore.

Real Time Voice Agents
Vibhu: What are some of the ways to look at it? There are ways where you can do diffusion for audio generation, but if you want real-time generation, that's a big thing with the approach I'm assuming that you took. And also, how do you go about evaluating the different axes of what you care about?
Pavan: Good point. You can do just flow matching or diffusion for the whole audio. We didn't even go down that path, because one of the main applications is voice agents and we want real-time streaming. That's not the only use case, but it's one of the primary use cases we want to get to. So we [00:08:00] picked the autoregressive approach for that. And within the autoregressive space, again, you can do chunk by chunk, or you can do... so we picked the...
I, at least, personally prefer the options which are the simplest, so we tried to see: can we just add audio as another head on our regular transformer decoder model? That kind of makes it easier for eventual end-to-end, native modeling of audio and text. And it works pretty well, so I guess we went with that. We tried a little bit with the flow-matching head itself: we had a discrete diffusion kind of approach, which also works well, but the flow matching worked better.
swyx: I was just curious about how you also think about this overall direction of research. When you work with the audio team, do you set some high-level parameters and then let them explore whatever, or how does it work between you guys?
Guillaume: I think the way it works is that we are prioritizing together what the most important features are, because there are many things we can do [00:09:00] in audio. We also decide how we should do things. For instance, ultimately what we want to do is build a full-duplex model, but we are not going to start there directly, which I think is what some other projects are doing, but...
swyx: Just to confirm, full duplex means it can speak while I'm speaking?
Guillaume: Yeah. So ultimately we're going to get there, but we decided to take it step by step. We start with whatever is the most important to our customers, which is transcription, the most popular use case. Then speech generation, and real time just a bit before that. And then we try combining everything all together. But yeah, we thought it was also important to separate things and optimize each capability one by one before we...
swyx: Merge all of that together. In the super omni model.
But...
Guillaume: It's very interesting because, as Pavan said, when you work on some other domains, RL and everything, there are many areas where I think it's not as interesting. For instance, in many places it's essentially just about data, or creating new environments, a lot of kind [00:10:00] of easy things, things where I think the research is maybe not as interesting. Whereas in audio, there are so many ways to actually build this model, so many ways to go about it. In that sense I think it's really interesting. For speech generation we tried multiple approaches. What was interesting is that, even though they were extremely different, [inaudible], but flow matching turned out to be quite a bit more natural. So we are happy with this.
swyx: Is there an intuition for why flow matching just models speech better, in some natural, fundamental latent dimension?
Pavan: I think the main thing is that, even at a particular time step, there is a distribution of things...
swyx: Yes.
Pavan: ...to be predicted, like the way you inflect. You already know the word that you're speaking. In text space, let's say the word maps to a single token, for simplicity; in most cases it does. So you just pick the word. But within audio, even the same word, even with your own voice, could be inflected in so many different ways. And I think [00:11:00] any approach which models this distribution works, and flow matching is one of the techniques. It's not the only one at all, but it's one which works reasonably well. The intuition I have is that there are several different clusters, each corresponding to some specific way you would inflect or pronounce that thing. And you can't predict the mean of it, because that corresponds to some blurred-out speech or something like that.
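The "you can't predict the mean" intuition is easy to check numerically. The sketch below is a toy, not anything from the Voxtral work: it assumes a one-dimensional "intonation" value with two hypothetical clusters centred at -1 and +1, and the function names are invented for illustration. It shows that the average of many valid samples lands between the clusters, far from anything a speaker would actually produce, while each individual sample sits close to one cluster.

```python
import random
import statistics

# Hypothetical setup: two equally likely ways to inflect the same word,
# represented as clusters around -1.0 and +1.0 on a single latent axis.
MODES = (-1.0, 1.0)
SPREAD = 0.1


def sample_intonation(rng):
    """Draw one plausible inflection: pick a cluster, then add small variation."""
    return rng.gauss(rng.choice(MODES), SPREAD)


def distance_to_nearest_mode(x):
    """How far a value is from the closest realistic inflection."""
    return min(abs(x - m) for m in MODES)


rng = random.Random(0)
samples = [sample_intonation(rng) for _ in range(10_000)]

# A regressor trained with squared error converges toward this average.
mean_prediction = statistics.fmean(samples)

print("distance of the mean from any cluster:", distance_to_nearest_mode(mean_prediction))
print("typical distance of a sample from its cluster:",
      statistics.fmean(distance_to_nearest_mode(s) for s in samples))
```

A generative head (a depth transformer, diffusion, or flow matching, as discussed in the conversation) avoids this by sampling one cluster rather than averaging over all of them.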
But you have to pick one, and then sharpen...
swyx: Conditional inference.
Pavan: Yeah, exactly.
swyx: Is that all covered under disfluencies, which is, I think, the normal term of art? Pauses, intonations. By the way, I have to thank Sophia for setting all this up, including some of these really good notes, because...
Pavan: Yeah.
swyx: ...I'm less familiar with audio.
Pavan: I think disfluencies are definitely one such thing. Disfluencies are more like...
swyx: Which is ums and ahs.
Pavan: Yeah, ums. And also repeats...
swyx: Yeah.
Pavan: You use filler words while you're thinking, so you repeat the word.
swyx: Okay. Whereas intonation is different, it's [00:12:00] upspeak and all this. Okay.
Pavan: Yeah. So I think there is a lot of entropy, and you model it as a distribution, with any technique which helps with that. The depth transformer is a conditional way of modeling this, and transformers are actually really good at it, even if it's a mini transformer. So that worked pretty well for us too. It's just that the main consideration when you have a depth transformer is that, if you have K tokens, you need to do K autoregressive steps, right? Even though it's a small thing, it's K steps, which is very latency-heavy. With flow matching, we were able to cut it down significantly, so we are able to do the inference in four steps or 16 steps, and it works pretty well. And there are more techniques to bring it down even further, in the extreme case to one step. We're not doing it yet, but at least the framework lends itself to being more efficient.
swyx: And the image guys have done...
Pavan: Yeah.
swyx: ...incredible work. Yeah.
Pavan: Now you just send a prompt and you get an image.
swyx: Yeah. Surprisingly, I think not enough image model labs use those techniques in production.
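The step-count argument can be made concrete with a toy flow-matching sampler. This is a minimal sketch under loud assumptions, not the Voxtral head: the "velocity network" is replaced by an analytic field for a single fixed target latent (for the straight-line path from noise to a point target, the velocity is `(target - x) / (1 - t)`), and `euler_sample` and `make_point_target_velocity` are hypothetical names. What it illustrates is only the cost structure: sampling takes a fixed number of velocity evaluations chosen at inference time, independent of how many codebook tokens K a frame would otherwise need.

```python
import random


def make_point_target_velocity(target):
    """Analytic stand-in for a trained velocity network (assumption: one fixed target).

    On the straight path x_t = (1 - t) * noise + t * target, the velocity at
    (x, t) is (target - x) / (1 - t).
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


def euler_sample(velocity, noise, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-size Euler steps.

    Returns the final latent and the number of velocity evaluations used.
    """
    x = list(noise)
    dt = 1.0 / num_steps
    evaluations = 0
    for k in range(num_steps):
        t = k * dt                       # t stays strictly below 1, so no division by zero
        v = velocity(x, t)
        evaluations += 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, evaluations


rng = random.Random(0)
target_latent = [0.5, -1.25, 2.0]        # hypothetical audio latent for one frame
noise = [rng.gauss(0.0, 1.0) for _ in target_latent]
velocity = make_point_target_velocity(target_latent)

for steps in (4, 16):                    # illustrative step counts
    latent, evaluations = euler_sample(velocity, noise, steps)
    print(steps, "steps ->", evaluations, "velocity evaluations")
```

With a real learned field the trajectories are curved, so fewer steps trade accuracy for latency; this toy field is straight, which is why any step count reaches the target here.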
I think it's, I feel like it's a lot of research demos, but [00:13:00] nothing I can use on my phone today.Guillaume: The thing, there's a thing that would be interesting here is that since, indeed I've been so much sure that has been done in the vision community compared to radio dys, stomach, I think there are so many long infra Yeah.And there are so many things we can do to actually improve this further. So it's our first version, but we have so many ways to exist, much better and much more efficient, cost efficient, soswyx: yeah.Guillaume: So really it's not a new field at all, of course, but there are still so many things that can be done.Perfect. It'sswyx: nice. I should also mention for those who are newer to flow matching, I think the creator, this guy's name is Alex, he's done I think in Europe's maybe two Europes as ago. There was, there's a very good workshop. There's one hour on like this matching is I would recommend people look that up.That's the other thing, right?Efficiency and Model Strategyswyx: The efficiency wise, like I, I imagine like the reason is open weights the reason you pick 3.6 B backbone it you are 3.4 B you are, try to fit to some kinda hardware constraints. You kinda fits some kinda basic constraints. What are they?Guillaume: Not necessarily, I think something we care about in our model that they're efficient.So we have a [00:14:00] lot of separate model, for instance. So we have this that is very small, very efficient. We also have a small OCR model that is available. Good, highly efficient as well. And I think on a project maybe there, I think companies are going to take is to have a coverage general model that will do a bit of everything.But that is also going to be expensive. On here. What want say is if you care about this specific use case, if you can actually use this model, it just does that. It's extremely good at it. Survey, very efficient. That's why we can actually add. 
We do, but also OCR that are like really good at that.And that would be much more cost effective factors and the general model that will contain a lot of capabilities you don't really need. So yeah. So we're doing like general model, but also like more customized model. This,Open Weights and BenchmarksVibhu: how does it compare to other TTS models? It's, we are going follow open wave.We're just dropping it. I think it's pretty good.Pavan: Yeah, I think it's pretty good. Like it, it's definitely one of the best. For sure. It's probably I would say it's the best open source model, butVibhu: decipher themselves.swyx: Yeah.Voice Agents VisionVibhu: Why now? How does it fit into broader ral vision? How do you see voice agents?How do you see voice? I think every year I've heard, okay, you're a [00:15:00] voice. You're a voice. There's a lot of architectural stuff. There's a lot of end time that see it, your solving, but where do you see voice setting?Guillaume: We had so many customers asking for voice. That's also why we wanted to build it.What's interesting in this domain is that. In a sense, if you take something simple like transcription it doesn't seem like something that should be very hard to do for a model. It's essentially, it's pattern recognition. It's classification on this. Models are very good at classifying, right?Or nonetheless, when you talk to them it's not there yet, right? It's not, you don't talk to them the same way you talk to a person. On something, maybe people don't realize it. It's in English it's still much better than in any user language, even compared to French instance. If you talk to this million in French, when you see people talking to this they'll talk very slow.They'll articulate as much as they can. So it's not natural, right? We're not yet to this. And I think, yeah, maybe the next generation will not know this, but yeah, I think people that. 
But our edge will actually always keep this bias speaking very slowly when they talk to this model. Even if maybe, probably in a couple of years, maybe next year it'll not be necessary anymore.But yeah. But what's interesting is to see that yeah, even for like languages [00:16:00] like yeah, French and Spanish Germans that are not no, no resource on religion. You have a lot of audios there on still it's not as good. And I think a consequence. Because then for this, I suppose just is not as much energy, as much effort that has been put done in some other mod that for some vision or like coding.But but yeah, there's still a lot of progress to be done. I think it's just a question of doing the work and it's clear path I think to get there.Pavan: It's a little fascinating because I worked on Google Assistant I think while back at this point, but it's, I think it's, it like when you take a step back, it's fascinating.It's not that long ago. It was like four years ago or five years ago, and it's now it's completely audio in, audio out and the function calling and the whole thing happens completely end to end. And in a very natural,swyx: yeah,Pavan: natural way and still ways to go. Kim was telling, even despite all the previous, it's not like you're speaking to a person.When you talk to any of these agents, bots, or voice mode kind of situation, it's still like a gap. I think that's the great part and I feel like with even the existing [00:17:00] stack, we should be able to get to this very natural speech conversational abilities soon enough I guess.And we'll also hope. I get thatGuillaume: on this kind of the next step, right? Because when you talk to these agents, like usually people are just writing to them and sometimes they'll this very clear, for instance, you are, you want to write code, but you are, you have a very clear idea of how you want the model to implement what you in mind.But so here you are able to spend a lot of time writing. 
So it's not really efficient on audio is really like a natural interface that is just not there yet, but I think it's just gonna be the place.Vibhu: How's it like building, serving, inferencing, like we see a lot about, it's very easy to take LMS off the shelf, serve them.Fine tuning, deploying. I know you guys have a whole you have Ford, you have a whole stack of customizing, deploying. Is there a lag in getting that. Like distribution channel. Are you helping? There is. So like prompting, lms, you can have them be concise, verbose, all that.They're built on LM backbones, these models. How do you see all that?Enterprise Deployment and PrivacyGuillaume: Yeah, I think this is a lot of what we're doing with our own customers. Very [00:18:00] often they come to us, so it's for different reasons. I think one reason is sometimes they have this lot of privacy concerns.They have this data that it's very sensitive. They don't want data to leave. The companies, they wanted to stay. Inside the company. So we have them deploy model in-house. So either on a, either on premise or on private cloud. So they're not worried that it's given to a third party on the there some leakage.Sometimes they have this kind of many companies have this different, sensitivity of data they have like sometimes channel chat can send it to the cloud has to stay there. So then it creates some kind of heterogeneous workflows where it's annoying. You cannot send some data to the cloud.This one you can, so here, when we actually deploy the model for them, they don't have this consideration. They are like not worried that, this is going to leak. Everything is much easier. So we help them basically do this on the, so it's one of the very proposition. But but the other is very often, when customers use this off the shelf close model, but very sad is that they are not leveraging, these data that have been collecting for four years or something for decades.So much data. 
Sometimes it's trillions of tokens of [00:19:00] data in a very specific domain. Their domain, which is data that you'll not find in the public, on the public internet. So data on which, like close model, we actually not have access to one, which that's going to be really good. So if they're using like closed source models are basically not benefiting from all these insights.All these data they have collected three years, they can always give it into the context that in France, but is never as good as if you actually train the modern analysis. So yes, that's basically what we help them to do. We actually provide them some purchase, basically what we announced at GTC this week.So we provide them with this, it's basically like a platform with a lot of tools to actually help them process data. Trained on that. Yeah, it's actually the same thing that we're using in the science team. So it's actually very better tested infrastructure, like a lot of efficient training cut base.For a quality pre-training like a fine tuning, even doing S-F-T-I-L. So we help them do this using the same tools as what our science team is building is using. So since it's tools that we've been using for two years now, it's really better tested. It's really sophisticated.So it's the same thing. We are giving to them, giving the company the same thing [00:20:00] that what are same still using internally actually build their own ai and it makes a really big difference. I think sometimes customers. And many in general don't realize how much better the model becomes when you fine tune it on your own data.And you can have a, your model is here. You start from there. You have a cross source model, which is sort here, but if you actually fine tune it can actually really go much further than this. And then you have a very big advantage. The model is trained on your entire company knowledge, so it knows everything.You don't have to feed like 10 K tokens of contact at every query. 
So it's much easier. I think using a closed-source model is really sad, because you are not leveraging all this data and you are going to be using the same model as all your competitors, when you could actually be using everything you have been collecting for years, which is really valuable. So yeah, we basically help customers do this. We have a lot of solutions people, forward deployed engineers, who go into the company, look at the problems customers are facing, at what they're struggling to do, and at what we should do to solve it. So we solve the problems together with them. I think our approach is a bit different here [00:21:00] from some other companies and competitors. We don't just release an endpoint and say, "Do some stuff on top of that," and we don't just give a checkpoint. We really work very closely with customers. We look at the issues they have and we help solve them. We really make tailored solutions for the problems the client is facing. One example: sometimes we have customers who really want a model that performs really well on some languages, like Asian languages. If you take some off-the-shelf models, they can speak the language, they can write in it, but it's not amazing. This language would be maybe 0.1% of the mixture, so it has been included during training, but very little. So what we did here is train a new model for them where this language was 50% of the mix, so it's much, much stronger. It knows the dialects. So that's an example of things we can do, and it's really arbitrarily custom. We had some other customers, for instance, who wanted a 3B model that can do audio with very good function calling, something they wanted to put in a car. In particular, they wanted this to be offline, because in a car you don't necessarily have access to the internet. So [00:22:00] yeah.
So here we can actually build the solutions. There is no model out of the box for this. On the internet you have these very general, generalist, strong models, but for things like this customers want very specific solutions. Another reason they sometimes come to us is that they experiment with some closed-source model and get a prototype. They're happy with what they built. It works well. They're happy with the performance. Then they want to go to production, and they realize it's extremely expensive. You cannot push this. So then they come to us and ask whether we can help them build the same thing, but using something much cheaper. And here we can sometimes build something 10x cheaper by just fine-tuning a model, and it will be better, on-prem on their own servers, and also much cheaper as well. So yeah.

swyx: That's the pitch right there. Take all the money.

Vibhu: And outside of that, you do put out open-weight models so people can do this themselves. I feel like not enough people go out of their way.

swyx: They're not going to. They're going to ask them to do it as the experts.

Guillaume: I think initially we didn't anticipate this [00:23:00] at the beginning of the company, because I think our strategy was not exactly the same as what it is today. What we underestimated initially is the complexity of deploying these models and connecting them to everything to be sure they have access to the company knowledge. We were seeing customers struggling with this, and that was three years ago. Now things are much more complicated, because you don't just have text and SFT on simple instruction following. You have reasoning, you have agents, you have tools, you have multimodal, audio, so it's much more complicated than before. And even back then it was hard for customers.
So they really need some support, and this is why we're also providing forward deployed engineers as well.

Fine Tuning and Personalization

swyx: I'm curious, is there also voice fine-tuning that people do?

Pavan: So in Forge we also have a unified framework, and the hope is that it covers the Voxtral speech-to-text that we released earlier this year, and even the Voxtral chat model that we released last year. I think there's a big, rich ecosystem [00:24:00] of people fine-tuning Whisper, and people want the same thing with Voxtral, since it's much stronger than Whisper. And yeah, the platform offers that kind of fine-tuning, which could be any kind of fine-tuning. For instance, sometimes people want to support new languages, tail languages that we hope to cover natively. But if there is a language where you have data and you want to fine-tune, I think this is a good use case. The other use case is where it's the same language, even English, but used in a very domain-specific way.

swyx: Yeah. Terminology, jargon, medical stuff.

Pavan: Exactly. And there are also specific acoustic conditions, like when there's a lot of noise. The model will do decently in most conditions, but you can always make it better. Those are some of the use cases where you can improve it even further. And for text-to-speech, we're just releasing it, so we'll have support for that soon too. I think it's a similar use case.

Voice Personalization

Pavan: It's a little different, the kind of things that you want to extend a [00:25:00] text-to-speech model to, which could be voice personalization or voice adaptation for enterprises. Many enterprises need a very specific kind of tone, a very specific kind of personality for this kind of voice. All of those are good use cases for fine-tuning.

swyx: This one I was going to ask you about: we never talked about voice cloning here.
How important is it, right? Like, I can clone a famous person's voice. Okay. But...

Pavan: The main use case would be enterprise personalization. Enterprises need a lot of customization. You don't want the same voice for all the enterprises. Each enterprise wants something customized and specialized that is representative of both their brand and also their, I guess, safety considerations and use case. The kind of thing that you would deploy as an empathetic assistant in a healthcare domain would be very different from the kind of thing you would deploy in a customer support bot, and that would be different again from more conversational settings. I think those are the [00:26:00] customizations you would expect from enterprises. And that's the main use case, at least from our side.

Vibhu: My basic example is you don't want to call different customer service lines and hear the exact same voice. It's going to be weird.

Long-Form Speech Models

Vibhu: But also on the technical side of this, there are a few things in Voxtral that I thought were pretty interesting. He's a big fan of this paper. He said it's a very good paper, the best ASR paper he's ever read. I've hyped up this voice paper enough. We covered it somewhere. But one big thing: Whisper is known for 30-second processing. You extended this to 40 minutes. There was a lot of good detail in the paper about how this was done, even little details of how the padding works. It's very much needed; you need to have that padding in there. And the synthetic data generation around this. I'm wondering if you can share the same about the new text-to-speech. How do you generate long-form, coherent speech? Any gems? Is there going to be a paper?

Pavan: Yeah. There will be a technical report. Okay. Yeah.
I think it will have a lot of details.

Real-Time Encoder Advances

Pavan: But I think the [00:27:00] summary of it is that some of the considerations in this paper were there because we started with the Whisper encoder as the starting point, and now we have in-house encoders, like the real-time model, for instance, which we released in January. We also released a technical report for that real-time model, which has this dual-stream architecture. It's an interesting architecture. You should check it out. There we have a causal encoder, and I don't think there's any strong multilingual causal encoder out in the community, so we thought it was a good contribution. So that's one nice encoder that other people may want to adapt. And we trained it from scratch. I think our stack is now mature enough that we are able to train super strong encoders. Some of these considerations, like the padding, are a function of the Whisper encoder. Now that we train encoders in-house, the design considerations are different.

Scaling Context for TTS

Pavan: As for the question on text-to-speech, I think that also leans on the original autoregressive decoder backbone. It has almost identical considerations. The long context is not even that long. [00:28:00] The model processes audio at 12.5 hertz, so one second maps to 12.5 tokens, and one minute is 750 tokens. You can get up to 10 minutes in an 8K context window and half an hour in a 32K context window. And 32K context is something we are very comfortable training on. We can extend it even much longer, to 128K. You can naturally see how it can extend to even hour-long generations. We need the data recipe and the whole algorithm to work coherently enough through such long context, but the techniques are in some ways very similar to long-context modeling for text.
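The token arithmetic in this answer can be checked with a minimal Python sketch. The 12.5 tokens per second figure comes from the conversation; the helper names, and reading "8K" and "32K" as 8,192 and 32,768 tokens, are assumptions made for illustration.

```python
import math

TOKENS_PER_SECOND = 12.5  # audio frame rate quoted in the conversation


def audio_tokens(seconds: float, rate_hz: float = TOKENS_PER_SECOND) -> int:
    """Tokens needed to represent `seconds` of audio (hypothetical helper)."""
    return math.ceil(seconds * rate_hz)


def max_minutes(context_tokens: int, rate_hz: float = TOKENS_PER_SECOND) -> float:
    """Longest clip, in minutes, that fits in a context window of this size."""
    return context_tokens / rate_hz / 60


if __name__ == "__main__":
    # Assumed window sizes: "8K" = 8,192 tokens and "32K" = 32,768 tokens.
    for label, window in [("8K", 8_192), ("32K", 32_768)]:
        print(f"{label}: about {max_minutes(window):.1f} minutes of audio")
    print("one minute:", audio_tokens(60), "tokens")
```

Under these assumptions, ten minutes of audio needs 7,500 tokens, which fits in an 8K window, and half an hour needs 22,500 tokens, which fits in 32K. The sketch counts audio tokens only and ignores any text tokens that share the same context.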
And the key difference is that it's just doing flow matching autoregressively instead of text token prediction.

swyx: Okay. I think that was most of the voice questions that we had.

What Makes a Model Small

Vibhu: I have a big question on Mistral Small. So what is small? How do we define [00:29:00] small? I remember the days of Mistral 7B on my laptop. This is not fitting on my laptop. I could run it on a big laptop, but...

Guillaume: It's just a question of terminology. Here, what we did is basically count the active parameters, but it's true, we could have given it another name. We could have called it medium. It's a mixture-of-experts model that we released, and it's a model that combines different models. Before, we had one general model, Mistral, doing instruction following; a separate model that was Devstral, specific to code; and another model for reasoning, Magistral. These were separate artifacts built by different teams at Mistral, and what we're doing is basically merging all of this. Pixtral, for instance, was the first vision model, and it was a separate model. The way we do things internally is that we have one team focus on one capability and build one model. When it's mature enough, we decide to merge it into the [00:30:00] main model. Here it was the first time we basically merged all of this into one. There are some other things we did for the first time at merge time, for instance more capabilities, like function calling, which I think is going to be much, much better in this Mistral Small. But yeah, it's our latest model.

Vibhu: And the key thing is it's very sparse: 6B active, pretty efficient to serve, 256K context.
Yeah.

Merging Capabilities vs Specialists

swyx: I think what's interesting is just this general theory of developing individual capabilities in different teams and then merging them. Where is this going to end up?

Vibhu: We've seen the five things put together in this. What are the next five teams?

swyx: I think OpenAI has actually gone away from the original 4o vision of the omni model. This was what they were selling: all modalities in and all modalities out. But I feel like you might do it.

Guillaume: I think there are some modalities where it doesn't make sense. For instance, for audio: if you want to do transcription, I think it makes no sense to use a very large model. If you just want to transcribe, it'll be very inefficient. You probably just want the [00:31:00] small 3B model. Performance is essentially

swyx: the same. It's going to be incredibly cheaper.

Guillaume: So that's why we want to have a separate model that just does this. I think the question is, if you want to talk to your model by speech and you are asking very complex questions, how do you do this? Do you just cascade things? Do you want to put audio into a very large model? It's an open discussion. I'm not sure we are going in that direction, but it's possible, of course. For us, the next capabilities we want to integrate into these models are going to be things like better reasoning, and capabilities that people don't talk about much but that matter a lot for our customers in different industries. For instance, things around legal, or computer-aided design. These are all things that models out of the box are not good at, because if you don't prioritize them, there is no benchmark on that.

swyx: But you just know how to...

Guillaume: Make this good, and just start to do the work.
Extracting some data, processing it, [00:32:00] and so on. So yeah.

swyx: I think for voice, the key thing over maybe the last year or so, with Veo and Grok Imagine and all these things, is joining voice with video, right? People don't understand spatial audio, because most TTS is just, oh, I'm speaking into a microphone in perfect studio quality. But when you have video, the voice moves around.

Pavan: That's true. The consideration is a little different in the sense that there it's a standalone artifact, where you get the whole thing and you consume it. But in a conversational setting, you need extremely low latency.

swyx: Yeah.

Pavan: Streaming would be one of the primary considerations.

swyx: You can build a giant company just doing that, right? So you don't need to do the video. But on the theme of merging modalities, that is something where I'm like, wow. Everyone up till, let's say, mid last year was just doing these pipelines of, okay, we'll stitch a TTS model with a voice thing and a lip sync [00:33:00] thing and what have you. Nope. Just one giant model. Yeah.

Open Source Mission

Vibhu: I have a two-part question. One is that open source still seems very core to what you do, and I just have to plug your paper from January 2024. This is the Mixtral of Experts one. Very fundamental research on how to do good MoEs. Very good paper for anyone. That's just a side tangent.

swyx: This thing, 8x22B, was like the nuclear bomb for open source. I think it should be 8x7B, though. Yeah. I don't remember this exactly. I don't think it was January, right? It dropped during NeurIPS, and everyone at NeurIPS was talking about it. It was December, I think. Yeah.
The model did as well.

Vibhu: It's just a little update, probably.

swyx: Yeah. No, but you have a point to make.

Vibhu: No, you've got to check that. But I just want to hear more broadly about open source for you, and, since you had asked earlier [00:34:00] about what's next, what the other side teams are working on. You put out Leanstral.

swyx: It's not necessarily a surprise, but this doesn't fit my mental model of Mistral.

Guillaume: Yeah. First, for open source in general, I think it's really something that goes back to the origins of the company. We have been open sourcing since the beginning, and even before this. Before this, Tim and I were at Meta, where we released LLaMA, and what was really nice to see was what it enabled. Before this, for most researchers, like those at universities, it was impossible to work on LLMs. There was no LLM available outside. And if you look at many of the techniques that were developed after it was open sourced, all these post-training approaches, even DPO, direct preference optimization, all of these were done by people who had access to this model, and they would have been impossible to do without it. So it's really helping science move faster, and we really want to contribute to this ecosystem. I think DeepSeek also had a lot of impact. All these papers in the open-source community are really helping the science community as a whole to move faster. So [00:35:00] we want to contribute to this ecosystem. That's why we're releasing very detailed technical reports. For Magistral, our first reasoning model, we shared a lot of results, things that worked and things that did not work as well, which I think is helpful. For the audio model we also shared a lot of details, and the same for the real-time model. So we really want to continue this, to basically belong to this community of people who share science.
I think we really don't want to be living in a world where the smartest models, the best models, are only behind closed doors, only accessible to a few companies that have the power to decide who can use them. I think it's a scary future that we don't want to live in. We really want these models to be accessible to anyone who wants them, and intelligence to be usable and accessible by anyone who can use it. So that's why we are pushing this mission of open-source models. So yeah.

Lean and Formal Proofs

Guillaume: Leanstral, I think, is also one step in this direction. It's a bit different from what we are usually releasing, but we have a small team internally [00:36:00] working on formal proving, formal math. It's a subject we care about in general, and we were working on reasoning. I think we started too early: doing reasoning without LLMs is very hard, especially when you work with formal systems, because the amount of data you have is negligible. There is a very small community of people writing formal proofs. But the reason why we like it is this: if you look at what people are doing with reasoning, the problems that you can use are usually going to be problems where you can verify the output. For instance, all these AIME problems, where the solution is a number between zero and a thousand, so you can verify by comparing it with a reference. Or it's an expression, and you can compare the generated output expression with the reference. But for most math problems, and most reasoning problems, there is no way to easily verify the solution. If the question is "show that f is continuous," you cannot compare with a reference, right? If it's "prove that this is true" or "prove this property," you cannot simply verify the correctness of your proof.
So it's hard to apply RL: there is no verifiable reward here. [00:37:00] What you could provide is, of course, a judge, an LLM judge that will look at your proof. But it's very hard, and you could certainly get some reward hacking happening there. So it's difficult. You could provide a reference proof, but there are also many ways to prove the same thing. So if the model gets a negative reward because it's a different proof, maybe it was still a legitimate proof, just different. So it's not going to work well. What's nice with Lean and with formal proving is that you don't have to worry about this whatsoever.

swyx: Anything that compiles in Lean is functionally the same.

Guillaume: Exactly. It's like a program: if it compiles, it's correct. It's very easy. And you can apply this, and then you can...

swyx: It's just way too small. No human will actually go and do it.

Guillaume: Yeah, that's exactly it. Only a few people can do it. It's a very small community of people doing a PhD on that. So it's super small. And it's sad, because it's actually very useful, not just for math but also for software verification. Software verification today is a tiny market. Very few industries work on this. It's usually going to be companies building airplanes, aeronautics,

swyx: Like...

Guillaume: things [00:38:00] where they absolutely want to be sure, where lives depend on this. But it's very rare that people formally verify the correctness of their software. I think one of the reasons for this is simply that it's just hard to do.

swyx: Are you thinking of TLA+? It's the language that some people use for software verification. But yeah, I think the reason people don't use it more, and why this industry is not as big as it could be, is that it's very hard.
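To make the "if it compiles, it's correct" point concrete, here is a minimal Lean 4 sketch. The theorem names are made up for illustration and are not taken from Leanstral or any Mistral release; `Nat.add_comm` is a lemma from Lean's standard library.

```lean
-- If this file type-checks, every proof below is accepted as correct.
-- No reference proof and no judge model are needed.

-- Holds by the definition of addition on Nat, so `rfl` suffices.
theorem my_add_zero (n : Nat) : n + 0 = n := rfl

-- Reuses a standard library lemma.
theorem my_add_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Swapping the two halves of a conjunction.
theorem my_and_swap (p q : Prop) (h : p ∧ q) : q ∧ p := ⟨h.2, h.1⟩
```

A different but valid proof of any of these statements would be accepted just the same, and a wrong one would fail to compile, which is what gives the clean pass-or-fail reward signal described in this exchange.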
But now, with the coding agents that are there, it's going to be very different.

Guillaume: We're going to see much more of this. So I think this industry is going to be much larger in the future with these models. So here we're also anticipating this a little bit. We wanted to work on that because proving a math theorem and verifying software use essentially the same tools.

swyx: Yeah.

Reasoning Transfer and Agents

swyx: One of my theories is that because the proofs take so long, it's actually just a proxy for long-horizon reasoning, coherence, and planning. Maybe a lot of people will say, okay, it's for people who like math, it's a niche math language, who cares? But if you use this as part of your data mixture for [00:39:00] post-training and reasoning, it might actually spike everything else. And I think that's underexplored; no one has really put out a definitive paper on how this generalizes.

Guillaume: Yeah, absolutely. And

Pavan: I think even

Guillaume: that's what we're seeing already. For instance, you do some reasoning on math and then the model can reason better on other things too, even in the early stages. So there is some transfer, some sort of emergence that happens. And it's also interesting not just as a topic in general: there is a lot of connection with agents. Sometimes the model can see a theorem that it has to prove that is very complex, and it can take the initiative to say, I'm going to suggest three lemmas and prove each lemma in parallel with subagents, and then I'm going to prove the theorem from the three lemmas. So you can do this also. Pretty interesting.
Even if you fail to prove one of the lemmas, maybe you succeed in proving lemma two, so you get some partial reward here. So it's a bit less sparse than if you just get a zero or one for the entire thing. [00:40:00] So it's pretty interesting.

Vibhu: Yeah, it's also an interesting case for specialized models in general, right? The cost chart you show is pretty interesting. Score-wise it's similar, and you are at thirty, seventy, a hundred fifty, three hundred bucks. Smaller.

swyx: I think cost is a bit unfair, right? Because this one is at inference cost. Theirs is always with their margins on top of it. But we don't know anything else, so we've got to figure it out.

Vibhu: Okay.

Next Frontiers in Training

Vibhu: I did want to push on that more. Not on cost, but you mentioned it's a great way to have verifiable long-context reasoning. What are other frontiers that you're working on internally? There's a lot of pushback on pre-training scaling, and a push toward RL, putting more than half of your training budget into RL. Where do you see the frontier of research in that?

Guillaume: You mean the...

Vibhu: Just in foundation model training. One thing that you actually do is fundamental research from the ground up, right? So you probably have a really good look at where you can [00:41:00] forecast this out.

Guillaume: Yeah. For us, we're still working a lot on the pre-training side. I think we are very far from saturation on pre-training. I think our version 4 pre-training will be a big step compared to everything we have done before, so we are pretty excited about this. And on the other side, we now have to think more and more about algorithms that will actually support these very long trajectories. GRPO, for instance, doesn't really work once you are a bit off-policy.
Which was okay initially, because you were solving math problems that can be solved in a few thousand tokens, so the model can finish them pretty quickly. When you do your update, the model is never too far off. But now you are moving toward problems where it sometimes takes hours, like six hours, to get a reward, and by then your model is in a completely different place. So you have to build new infrastructure that supports this, but also new algorithms. In everything we're doing internally now, we're trying to build infra that anticipates what we'll have in six months or a year from now, which is these extremely long scenarios. When we started Mistral, part of what [00:42:00] we wanted was a very nice environment where people can do research with a lot of resources. So it was nice. I think things changed a lot when ChatGPT came out. After that, I think it was never the same again. But yeah, it was nice, and I think we also want to keep part of this.

swyx: Before coming to the end...

Hiring and Team Footprint

swyx: Obviously, I think you guys are doing incredible work. You have a very impressive vision for open source and for voice. What are you hiring for? What are you looking for in people trying to join the company?

Guillaume: Yeah, so we are hiring a lot of people in our science team. We're hiring in all our offices. Our HQ is in Paris, in France. We have a small team in London. We have a team in Palo Alto as well. We opened some offices in Warsaw, in Poland, and one in Zurich. We also have some presence in New York, and soon one in San Francisco. We're also hiring remotely. So we're growing the team, trying to hire very strong people. The team is still a fairly small team, [00:43:00] and I think we want to keep it that way, because we find it quite efficient.
A small team is agile, so yeah.

swyx: Okay.

AI for Science Partnerships

swyx: Let's focus on science and on forward deployed. We are actually strong believers in science. We started our new science pod that focuses specifically on AI for science. What areas do you think are the most promising?

Guillaume: What we're pretty excited about right now, and something we have already started doing and will probably be able to share more about in a couple of months, is that we are exploring AI for science. There are a lot of areas where we think you could get some extremely promising results if you were to apply AI in these domains. There is a lot of low-hanging fruit. You just have to find the domains where AI has not yet been applied, and that's usually hard to do, because the people working in those domains don't necessarily know the capabilities of these models. You just have to pair them with AI researchers, which is actually hard to do. But this matching, we're doing it naturally with our customers. We have some companies we work very closely with. For instance, ASML is one of our partners, so we're doing some research with them, and there are tons of extremely interesting problems, problems in physics, in [00:44:00] materials science, that they're essentially the only ones to work on, because they're doing something no one else is doing. So there are many domains where AI can actually revolutionize things. You just have to think about it and be familiar with what AI can do and how to apply it. So yeah, it's something we're doing more and more with our partners and customers: AI for science.

swyx: Yeah. Okay.

Forward Deployed Skills

swyx: And then for forward deployed, what makes a good forward deployed engineer? What do they need?
Where do people fail?

Guillaume: I think you usually need people who are very familiar with the tech, not necessarily with a lot of research expertise, but who are actually pretty good at using these models, who know how to do fine-tuning, who know how to start an RL pipeline. And it's not easy. It's something the vast majority of companies will not be able to do on their own. So here I think we need people who like to solve problems, who are excited by solving complex, very concrete problems. It's applied science, basically. So I think it's not too different from the skills you need in research, because essentially you are trying to find solutions to problems that [00:45:00] customers have not yet solved. Sometimes it's easy. Sometimes you have to do the work. You have to create synthetic data, find some edge cases. So it depends on the problem. But yeah, I think it also takes a bit of patience, and you have to be creative. Very similar skills to research.

Pavan: The diversity of the work they do always surprises me. It goes all the way across the kind of stuff they encounter in industries. It's just very interesting, I think.

swyx: Any fun success anecdotes?

Guillaume: Yeah, it can be training a small model on the edge that does just one specific thing. It can be training some very large model with some specific languages as well. Making models really good at some tool use, for instance computer-aided design, these kinds of things. Is that pairing with vision as well? Yeah.

Pavan: And defect detection for chips, or identifying things in factories. The diversity could be anything where you can deploy these foundation models. So the work is to make it work in that specific setting, basically whatever it takes to make it add value in that workflow.

Vibhu: Yeah.
[00:46:00] And it goes across the stack, right? Like even just pulling up the website.

swyx: It's so broad. On compute, it is so broad.

Vibhu: We didn't even touch on the fact that you have a coding CLI tool. I think your first tool was agents, Mistral agents. You had the agent builder, you can serve it via API and all that. And I'm guessing forward deployed people

Guillaume: Yeah.

Vibhu: help build that out and stuff.

Customer Feedback Loop

Guillaume: It is also why we're doing many things. I think that's also part of the value proposition: customers are always extremely careful about their data, and they don't like trusting so many partners, trusting one partner for code, giving the data to another third party for audio, and to another one for something else. What they really like with our approach is that we can help them with anything, so they don't have to send their data to so many clouds. So yeah.

swyx: I think there can be many orders of magnitude more FDEs than research scientists, and they don't need your full experience, but they're still super valuable to customers.

Guillaume: In practice, these two teams [00:47:00] are still quite intertwined. First of all, they're using the same tools, the same data pipeline and everything. And it's very helpful for the science team to get feedback from the solutions team, because they can say: look, these customers are trying to do this, and this is not working. It can really be fixed in the next version. Yeah. This is basically a real-world eval. It's a real-world eval, and it's not something you get if, for instance, you're just working in a lab that just ships models. If you don't do this work for customers, you have no idea whether your model is good at this edge case. For instance, you even found this, right?
So yeah, there is a very gap, big gap between the public benchmarks that are very like academic. OnPavan: the rare cases are just very diverse and in the specific concept of a customer, you can fine tune and make it like first evaluate, create a solid eval, benchmark, and then measure in the context of their, the kind of audio.Like for instance, one use case is literally just, there's the word for kids and they have to just say it out. It's a very specific thing. You're just saying one word and then you have to you, you'll grade the kid whether they did it right or not. It's [00:48:00] like R for, but so there're very diverse use cases and the idea is that they, the.Applied scientist engineer will go and make it better. And then from the learnings we incorporate it into the base model itself. So it's it's just better out of the box.Vibhu: Yeah. It's a good full circle system. Like the foundation model evals are all just proxies of what you really, you're never gonna have one that says it, it doesn't make sense for there to be, a one word transcription like that.It's not something you wanna fit on. Perfect.Wrap Up and Thanksswyx: Everyone should go check out everything that Michelle has to offer and try the TTS model, which will link in the show notes. But thank you so much for coming tha thanks. Such a stretch. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Amazon Creative Studio is a free AI-powered tool built directly into your Amazon Ads console that transforms basic product photos into professional lifestyle images, videos, and animations, without hiring an agency or learning design software.

In this complete tutorial, you'll learn how to access Creative Studio, use its three core features (Image Generator, Video Generator, and Creative Agent chat), and master the prompting techniques that separate mediocre outputs from professional results. The tool is currently available across 10 markets including the US, UK, Canada, Germany, France, Italy, Spain, Australia, Brazil, and Mexico.

Testing shows AI-generated creative can drive up to 40% higher click-through rates compared to basic product photos. You can generate finished assets in 5-10 minutes, work that typically takes agencies days or costs $3,000-$10,000 per project.

We cover what's working (speed, cost, integration), what's not (brand consistency, occasional policy rejections), and exactly who should use this tool versus those who should skip it. Whether you're running ads with boring white-background images or testing video for the first time, this tool eliminates the biggest barrier: professional creative quality.

Key Takeaways
• Three core features work best: Start with Image Generation, progress to Video, then experiment with Creative Agent chat for full campaign concepts
• Prompting matters significantly: Include context (who it's for), specifics (concrete details), and mood (emotional tone) for output quality that rivals professional designers
• Best use case: Testing new products and campaigns with AI, then investing in professional production only for proven winners

#AmazonCreativeStudio #AmazonAds #AICreativeStudio #AmazonAdvertising #EcommerceAI
WBD Gives Paramount 7-Day Deadline for Final Acquisition Offer Despite Favoring Netflix Deal, Apple Begins Internal E2EE RCS Testing in iOS 26.4 Beta, and Snapchat Creator Subscription Alpha Launches Feb 23. MP3 Please SUBSCRIBE HERE for free or get DTNS Live ad-free. A special thanks to all our supporters–without you, none of this would be
Continue reading "X Under Investigation by DPC Over Grok’s Alleged Nonconsensual Image Generation – DTH"
Discover the machine learning breakthrough changing everything: rectified flow. Frank breaks down how this revolutionary technique replaces 1,000-step diffusion with just 10, delivering faster and higher-quality image and video generation. Learn why major players are adopting it and how it's democratizing AI for everyday creators. Follow Us Frank: Twitter, Blog, GitHub James: Twitter, Blog, GitHub Merge Conflict: Twitter, Facebook, Website, Chat on Discord Music : Amethyst Seer - Citrine by Adventureface ⭐⭐ Review Us ⭐⭐ Machine transcription available on http://mergeconflict.fm
Hundreds of millions just got an AI glow up and didn't even notice.
The Agents of Change: SEO, Social Media, and Mobile Marketing for Small Business
Description: Remember when stock photography meant scrolling endlessly through the same crying-into-salad images that every other brand was also using? Or spending actual money on "custom" photos that somehow ended up on your competitor's website too? Yeah, those days might finally be over. Lauren deVane—creative director turned AI educator and your friendly neighborhood "AI auntie"—joins the show to explain how AI image generation has evolved from six-fingered curiosities into legitimate marketing tools. And if you haven't checked out what these tools can do in the past six months, you're in for a surprise. Lauren breaks down exactly how small business marketers can create original, on-brand imagery without the stock photo budget or the production headaches. https://www.theagentsofchange.com/613 Need help with your branding, website, or digital marketing? Reach out to me (Rich Brooks!) today at https://www.takeflyte.com/contact
Fedor presents a simple guide to "AI" and how the term has evolved in gaming, discussing where and why Roblox is investing in many different types of AI models: from basic machine learning to LLMs; safety nets to generative AI tools.

Join the GEEIQ Integration Network for free today (Ad): https://geeiq.com/developers/?afmc=lastlevel

Chapters:
(00:00) Intro
(04:59) Safety & Moderation
(14:50) The Algorithm
(24:39) GEEIQ Integration Network for Roblox Creators (ad)
(25:28) Generative AI & Code Assist
(31:09) Material & Image Generation
(38:53) Cube AI
(44:54) Real-Time Dreaming
(50:32) Will AI Replace Creators?
(58:45) Outro

Episode 13

Sources:
- Roblox Creator Roadmap: https://create.roblox.com/roadmap
- Roblox AI-powered moderation: https://corp.roblox.com/newsroom/2025/07/roblox-ai-moderation-massive-scale
- Roblox Sentinel, open-sourced risk detection model: https://corp.roblox.com/newsroom/2025/08/open-sourcing-roblox-sentinel-preemptive-risk-detection
- Roblox Cube, 3D and 4D generative model: https://corp.roblox.com/newsroom/2025/03/introducing-roblox-cube
- Roblox's real-time dreaming: https://x.com/Roblox/status/2010766064708169738/video/1

Hosts:
- Adam (BanTech): https://lastlevel.co.uk/adam
- Fedor (LoadingL0n3ly): https://x.com/LoadingL0n3ly

----------------------------
Watch or listen wherever you get your podcasts.
Visit https://lastlevel.co.uk/podcast for more.
Join the Discord: https://discord.lastlevel.co.uk
Beyond The Blox is produced by Seb Jensen for Last Level Studios.
Spotify Raises Premium Prices for Third Time Since 2023, Google’s “Glic” Brings Agentic Gemini AI to Chrome for Android, and Cerebras Secures $10 Billion Deal with OpenAI for 750MW of Computing Power. MP3 Please SUBSCRIBE HERE for free or get DTNS Live ad-free. A special thanks to all our supporters–without you, none of this would
Continue reading "X Bans Revealing Image Edits, Paywalls Grok AI Image Generation – DTH"
Plus, Meta signs deals with three nuclear companies for 6-plus GW of power. Learn more about your ad choices. Visit podcastchoices.com/adchoices
AP's Lisa Dwyer reports on the growing focus on explicit AI generated images on Grok.
For the latest, our Work and Technology Correspondent, Brian O'Donovan.
In a blog post published Friday morning, Ben Horowitz writes that "as the American leader in Venture Capital, the fate of new technology in the United States rests partly on our shoulders." It's the kind of statement certain to cause agita at rival firms. Also, Elon Musk's AI company has restricted Grok's controversial AI image-generation feature to only paying subscribers on X, after the tool invited heated criticism from across the world for letting users generate sexualized images of women and children.
Episode 508 of Reversim Platform (רברס עם פלטפורמה), recorded on December 30, 2025, a little before the year ended and a little after a sort of winter really began. Ori and Ran sip tea and host Misha Feinstein from Bria AI to talk about how to make images the way you actually intended (and also a bit about winter).
AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning
In this episode, we unpack OpenAI's continued “code red” push with its latest image generation model and what sets it apart. We explore how the model works, why OpenAI is accelerating its efforts, and what it could mean for creators and AI art.

Get the top 40+ AI Models for $20 at AI Box: https://aibox.ai
AI Chat YouTube Channel: https://www.youtube.com/@JaedenSchafer
Join my AI Hustle Community: https://www.skool.com/aihustle

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Plus, there's a new dedicated sidebar with presets and prompt suggestions.
While most AI systems struggle to properly represent diverse skin tones and features, these visionary founders have built a solution that puts Black representation at the forefront. Walter Gainer II @Itsthegreatwalt speaks with Karen Okonkwo (Tonl) and Abron Maldonado (Create Labs) to learn how AI is empowering our community and how their collaboration is solving the bias problem in image generation.

Discover how their groundbreaking partnership is creating AI that actually sees us, respects our image, and provides opportunities rather than threats to Black creators.

This isn't just about better AI images; it's about who controls our digital representation and whether we'll be participants or just subjects in the AI revolution.

CONNECT WITH OUR GUESTS:
Karen Okonkwo: https://www.instagram.com/karenokonkwo/
Tonl: https://tonl.co/
Abron Maldonado: https://www.instagram.com/abron_ai/
Create Labs: https://createlabs.io/

FOLLOW the Working While Black Show for more conversations about career growth, entrepreneurship, technology, and more.

CHAPTERS:
(03:45) - AI's Racial Bias
(09:08) - Collaboration Decision
(13:25) - The Pushback On DEI
(16:25) - Does AI Hurt Black Creatives and Photographers?
(20:25) - Future of AI Predictions
In this episode, Seb and Preston explore Tesla's FSD 14.2 advancements and their implications for AI-driven autonomy. They also tackle the ethical, societal, and infrastructural challenges of rapid AI development—from brain-inspired computing to nuclear energy's role in supporting AGI. IN THIS EPISODE YOU'LL LEARN: 00:00:00 - Intro 00:01:44 - How Tesla's FSD 14.2 dramatically improved its autonomous driving performance 00:13:42 - The ethical dilemmas and liability concerns around AI decision-making 00:20:27 - Tesla's sensor-only approach versus LiDAR-heavy systems like Waymo 00:27:31- The potential of biologically-inspired artificial neurons 00:30:32 - How brain-computer interfaces could revolutionize AI and prosthetics 00:32:28 - The societal risks of tech-enhanced human capabilities 00:36:26 - How AI image generation tools like Google's Nano Banana Pro are evolving 00:49:37 - Why AI's energy demands are influencing nuclear power policy 01:00:06 - The risks of AI-induced content homogenization and “AI slop” 01:07:22 - Why some are turning to manual trades to escape AI disruption Disclaimer: Slight discrepancies in the timestamps may occur due to podcast platform differences. BOOKS AND RESOURCES Related book: Lifespan: Why We Age―and Why We Don't Have To. Seb's website: Seb Bunney - The Qi of Self Sovereignty. Seb's book: The Hidden Cost of Money. X Account: Seb Bunney. Related books mentioned in the podcast. Ad-free episodes on our Premium Feed. NEW TO THE SHOW? Join the exclusive TIP Mastermind Community to engage in meaningful stock investing discussions with Stig, Clay, Kyle, and the other community members. Follow our official social media accounts: X (Twitter) | LinkedIn | | Instagram | Facebook | TikTok. Check out our Bitcoin Fundamentals Starter Packs. Browse through all our episodes (complete with transcripts) here. Try our tool for picking stock winners and managing our portfolios: TIP Finance Tool. 
Enjoy exclusive perks from our favorite Apps and Services. Get smarter about valuing businesses in just a few minutes each week through our newsletter, The Intrinsic Value Newsletter. Learn how to better start, manage, and grow your business with the best business podcasts. SPONSORS Support our free podcast by supporting our sponsors: Simple Mining Human Rights Foundation Unchained HardBlock Linkedin Talent Solutions Kubera Vanta reMarkable Onramp Public.com Netsuite Shopify Abundant Mines Horizon Learn more about your ad choices. Visit megaphone.fm/adchoices Support our show by becoming a premium member! https://theinvestorspodcastnetwork.supportingcast.fm
Co-hosts Mark Thompson and Steve Little examine Google's groundbreaking Gemini 3 release, which delivers state-of-the-art multimodal reasoning and sets a new benchmark for AI capabilities. They also explore ChatGPT's upgrade to version 5.1, with improved instruction following and better handling of longer conversations.

The hosts discuss Canva's new Creative Operating System, which now generates AI-powered designs directly within the platform.

This week's Tip of the Week demonstrates how Gemini 3 can use the context you provide to greatly improve the accuracy of your handwritten transcription.

In RapidFire, they cover NotebookLM's new deep research mode, Nano Banana's integration into Photoshop, Anthropic's privacy policy changes regarding training data, and how Claude's new usage monitoring feature can reduce your stress level.

Timestamps:

In the News:
04:17 Google Gemini 3's Multimodal AI Reaches New Heights
13:47 ChatGPT 5.1 upgrade is now better at following your Instructions
22:39 Canva Creative Operating System: AI-Powered Design Generation

Tip of the Week:
26:21 Adding Reasoning to Your Transcriptions Improves Accuracy

RapidFire:
36:40 NotebookLM Becomes a Fully Featured Research Tool
43:40 Nano Banana is Now Available in Photoshop
47:20 Anthropic Announces Claude Chats Will Be Used for Training Data
54:13 View Your Claude Usage in Settings

Resource Links:

Intro to Family History AI by the Family History AI Show Academy
https://tixoom.app/fhaishow

Google Gemini 3
Introducing Gemini 3: A New Era of Intelligence
https://blog.google/products/gemini/gemini-3/

ChatGPT 5.1
A smarter, more conversational
https://openai.com/index/gpt-5-1/
GPT-5.1 New Features Explained
https://scalevise.com/resources/gpt-5-1-new-features/

Canva Creative Operating System
Introducing Canva's Creative Operating System
https://www.canva.com/newsroom/news/creative-operating-system/

NotebookLM Deep Research
NotebookLM adds Deep Research and support for more source types
https://blog.google/technology/google-labs/notebooklm-deep-research-file-types/

Nano Banana in Photoshop
Create with unlimited generations using Google Gemini 3 (Nano Banana Pro) in Adobe Firefly
https://blog.adobe.com/en/publish/2025/11/20/google-gemini-3-nano-banana-pro-firefly-photoshop

Anthropic Privacy Policy Update
Anthropic Will Use Claude Chats for Training Data: How to Opt Out
https://www.wired.com/story/anthropic-using-claude-chats-for-training-how-to-opt-out/
Updates to Consumer Terms and Privacy Policy
https://www.anthropic.com/news/updates-to-our-consumer-terms

Claude Usage Monitoring
Usage Limit Best Practices
https://support.claude.com/en/articles/9797557-usage-limit-best-practices

Tags: Artificial Intelligence, Genealogy, Family History, Google Gemini, ChatGPT, Canva, NotebookLM, Nano Banana, Anthropic Claude, Photo Restoration
- In a support document spotted by 9to5Google, Google notes free users can currently generate two images daily, down from three per day previously. The company wrote: "Image generation and editing is in high demand. Limits may change frequently and will reset daily."
- The agency's director, John Squires, said in a notice obtained by Reuters that the USPTO deems genAI to be "analogous" to other tools that inventors might use in their process, including lab equipment, software and research databases. Squires wrote: "AI systems, including generative AI and other computational models, are instruments used by human inventors. They may provide services and generate ideas, but they remain tools used by the human inventor who conceived the claimed invention."
- Alibaba's Quark AI glasses are now available for purchase in China. The company has released three variants of the flagship S1 model and three of the more affordable G1 model. They both connect to Alibaba's newly launched App, powered by the company's own AI tech, for AI assistance through voice commands and touch controls.
Google is upgrading its image generation model with new editing chops, higher resolutions, more accurate text rendering, and the ability to search the web. Also, the fire that broke out in Oswego, NY is the second major fire, and the third overall, in the last few months at the Novelis plant.
The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
Joelle Pineau is the Chief Scientist at Cohere, where she leads research on advancing large language models and practical AI systems. Before joining Cohere, she was VP of AI Research at Meta, where she founded and led Meta AI's Montreal lab. A professor at McGill University, Joelle is renowned for her pioneering work in reinforcement learning, robotics, and responsible AI development. AGENDA: 00:00 Introduction to AI Scaling Laws 03:00 How Meta Shaped How I Think About AI Research 04:36 Challenges in Reinforcement Learning 10:00 Is It Possible to be Capital Efficient in AI 15:52 AI in Enterprise: Efficiency and Adoption 22:15 Security Concerns with AI Agents 28:34 Can Zuck Win By Buying the Galacticos of AI 32:15 The Rising Cost of Data 35:28 Synthetic Data and Model Degradation 37:22 Why AI Coding is Akin to Image Generation in 2015 48:46 If Joelle Was a VC Where Would She Invest? 52:17 Quickfire: Lessons from Zuck, Biggest Mindset Shift
This episode features the core team behind Sora, OpenAI's groundbreaking video generation platform that became the #1 app in the App Store. Bill Peebles (research lead), Rohan Sahai (product lead), and Thomas Dimson (engineering/product lead with Instagram background) discuss the unexpected viral success of Sora's launch, the product journey that led to the breakthrough "cameo" feature (putting yourself in AI-generated videos), and their philosophy of building a creator-first social network that prioritizes human creativity over passive consumption. They reveal the technical milestones in video generation, their small team size (under 50 people total at launch), navigation of content moderation challenges, early monetization strategy, and their ambitious vision for video models as world simulators that could eventually contribute to scientific breakthroughs by 2028. The conversation captures both the tactical product decisions and strategic philosophy that made Sora a cultural phenomenon.

(0:00) Intro
(1:35) Unexpected Success of ChatGPT and Sora
(3:55) Sora as an Independent App
(5:38) Sora Prototypes and Evolution
(8:07) User Creativity and Surprising Use Cases
(14:46) Celebrity Engagement and Rights Management
(17:58) Competition and Future of AI Video Models
(25:42) Empowering Creators
(31:21) The Evolution of Image Generation
(33:36) How Do Models Need to Improve?
(42:10) Monetization of Sora
(45:54) Global Reach and Cultural Impact
(48:38) Moderation and Safety Challenges
(50:09) Integration with Other OpenAI Products
(52:07) How do Models Learn Physics?
(55:16) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq'd by VMWare)
@jordan_segall - Partner at Redpoint
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive deep into the technical challenges of deploying these models, which are powerful but computationally expensive due to their iterative sampling process. Hung details his team's work on SwiftBrush and SwiftEdit, which enable high-quality text-to-image generation and editing in a single inference step. He explains their novel distillation framework, where a multi-step teacher model guides the training of an efficient, single-step student model. We explore the architecture and training, including the use of a secondary 'coach' network that aligns the student's denoising function with the teacher's, allowing the model to bypass the iterative process entirely. Finally, we discuss how these efficiency breakthroughs pave the way for personalized on-device agents and the challenges of running reasoning models with techniques like inference-time scaling under a fixed compute budget. The complete show notes for this episode can be found at https://twimlai.com/go/753.
Have you ever run a Facebook ads test that completely defied logic? That's exactly what happened to us, and it led to one of the most eye-opening conversations we've had about Meta's new creative testing feature. In this episode, we explore the Meta AI tools for creative diversification.Lauren shares her real campaign results from a live test using Meta's latest Advantage+ creative tools. We also get into the details of how Meta's system distributes spend, why it doesn't always follow its own rules, and what you can do to avoid wasting money while still taking advantage of Meta's cutting-edge tools like AI-powered image generation and editing. If you're running ads for clients or scaling your own campaigns, you'll want to hear this one before you hit “publish” on your next creative test. We'll tell you what works, what breaks, and how to stay ahead of Meta's fast-moving automation. In this episode:- How Meta's in-campaign creative testing works- Step-by-step demo of Meta's creative testing interface- Using AI-generated ad images inside Ads Manager- Editing AI-generated images with custom prompts- Meta's ad enhancements: translate text, music, and site links- The “Flex Media” feature and its risks for client brandsResources Mentioned In The Episode:Previous Episodes on Meta's Andromeda Update: https://perpetualtraffic.com/?s=meta Watch the Episode on YouTube: www.perpetualtraffic.com/youtube Apply to Work With Tier 11: https://www.tiereleven.com/apply-now Watch Previous Live Ad Lab Sessions: tiereleven.com/youtube Creative Diversification Playbook, Practitioners' Guide From Meta: https://perpetualtraffic.com/wp-content/uploads/2025/10/Creative-Diversification-Playbook-Practitioner-Guidance.pdf Listen to this episode on your favorite podcast channel:Follow and listen on Apple: https://podcasts.apple.com/us/podcast/perpetual-traffic/id1022441491 Follow and listen on Spotify:https://open.spotify.com/show/59lhtIWHw1XXsRmT5HBAuK Subscribe and watch on YouTube: 
perpetualtraffic.com/youtube We appreciate your support!Visit our website: https://perpetualtraffic.com/ Follow us on X: https://x.com/perpetualtraf Connect with Ralph Burns: LinkedIn - https://www.linkedin.com/in/ralphburns Instagram -
In today's episode, we'll explore the features and functionality of Google Gemini's image generation tool, Nano Banana. Visit AVID Open Access to learn more.
The Healthtech Marketing Podcast presented by HIMSS and healthlaunchpad
This may be my favorite ever episode. And it's a doozy! I sat down with Nick Panayi, CMO of Inovalon, to explore one of the most comprehensive AI adoption stories I've encountered in healthcare technology marketing. Nick leads a 95-person marketing team at this 2,000-employee data and solutions company, and what they've accomplished in just two years is remarkable. According to our recent survey of the HealthTech Marketing community, only about 7% of companies have an AI strategy roadmap, let alone reach this level of AI integration. This makes Nick's insights particularly valuable for the rest of us on this journey.

What impressed me most was how they've made AI accessible across their entire marketing organization. They've created a unified platform called Amaru AI that serves as a no-code environment where marketers can build workflows without technical expertise. By integrating AI tools directly into their existing Monday.com workflow system, they've met their team where they already work rather than forcing adoption of new platforms. Nick also shared fascinating use cases, including AI-powered voiceovers with ElevenLabs, conversational AI prospecting with Synthflow that makes outbound calls, and using Google's NotebookLM to transform dense white papers into engaging podcast content.
Key Topics Covered:
(00:00:00) Introduction and Nick's Background
(00:03:00) Building an AI-First Marketing Team
(00:07:00) Overcoming Barriers - Legal and Team Concerns
(00:10:00) The AI Roadmap and Tiger Team Approach
(00:12:00) Scaling AI Adoption Across the Organization
(00:14:00) Content Creation with AI Scoring
(00:18:00) Automated Website Tagging and Personalization
(00:22:00) Agentic AI for Paid Search Advertising
(00:30:00) Amaru AI Platform Architecture
(00:35:00) AI Integration in Monday.com Workflows
(00:36:00) Image Generation and Remix Tools
(00:37:00) Tega AI for Prospecting and Lead Generation
(00:40:00) Synthflow - Conversational AI Calling
(00:45:00) ROI and Measurement Approaches
(00:49:00) AI Voiceovers and Video Production
(00:51:00) AI Podcasting with Google NotebookLM
(00:53:00) Advice for Agencies and Consultants

In this post, I lay out a strategic roadmap that emulates what Nick and team have achieved at Inovalon.

If you are interested in discussing this or any other topic, let's have a chat. Reach out to me directly to schedule a no-obligation discussion. This isn't a sales call, but rather an opportunity to talk through your questions and challenges.

Follow me on LinkedIn.

Subscribe to The HealthTech Marketing Show on Spotify or watch us on YouTube for more insights into marketing, AI, ABM, buyer journeys, and beyond!

Thank you to our presenting sponsors, HIMSS, a leader in advancing health equity, digital innovation, and data-driven care through technology, policy, and community collaboration. And also HealthcareNOW, 24/7 expert shows, interviews, and podcasts, powering healthcare leaders with innovation, policy, and strategy insights.
Fill out this short listener survey to help us improve the show: https://forms.gle/bbcRiPTRwKoG2tJx8

This week on Unsupervised Learning, Jacob sits down with Nicole Brichtova and Oliver Wang, the Google researchers behind "Nano Banana" - the breakthrough AI image model that achieved unprecedented character consistency and took over social media.

The conversation covers how their model fits into creative workflows, why we're still in the early innings of image AI development despite impressive current capabilities, and how image and video generation are converging toward unified models. They also share honest perspectives on current limitations, safety approaches, and why the expectation of going from prompt to production-ready content is fundamentally overhyped.

(0:00) Intro
(1:42) Early Nano Banana Use Cases and Character Consistency
(3:05) Popular Features and User Requests
(3:54) Future Frontiers in Image Models
(5:26) Personalization and Aesthetic Models
(7:39) Model Success and User Engagement
(10:59) Product Design for Different Users
(19:30) Advanced Use Cases and Future Workflows
(23:14) Editing Workflows and Chatbots
(25:14) Google's Image Model Applications
(27:12) Milestones in Image Generation
(29:30) MidJourney's Success
(30:54) Future of Image Models
(33:55) Image Models vs. Video Models
(36:35) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq'd by VMWare)
@jordan_segall - Partner at Redpoint
In this Marketing Over Coffee: Learn about Nano Banana, the AI Workshop, Pumpkin Spice and more!
Back to school and the sauna is closed
Nano Banana is the new best at image generation and available in Gemini
Pumpkin Spice is Back!
Join the workshop with Chris October 31 in London!
10:28 […]
The post Image Generation, Branding, Editing Tools, and Billy Joel appeared first on Marketing Over Coffee Marketing Podcast.
AI Applied: Covering AI News, Interviews and Tools - ChatGPT, Midjourney, Runway, Poe, Anthropic
In this episode, Conor Grennan and Jaeden Schafer explore the rapidly evolving landscape of AI image generation, focusing on new models like Google Gemini and partnerships like that of Meta and Midjourney. They discuss the competitive nature of the market, the importance of distribution, and the innovative features of various image generation tools, including the speed and quality of outputs. The conversation highlights the potential applications of these technologies and the ongoing improvements in AI capabilities.

AI Applied YouTube Channel: https://www.youtube.com/@AI-Applied-Podcast
Try AI Box: https://aibox.ai
Conor's AI Course: https://www.ai-mindset.ai/courses
Conor's AI Newsletter: https://www.ai-mindset.ai/
Jaeden's AI Hustle Community: https://www.skool.com/aihustle
YouTube Video: https://youtu.be/AOHt9bSkJcE

Chapters
00:00 Introduction to AI Image Generation
03:53 Competitive Landscape of Image Generators
09:42 User Experience and Comparisons of AI Models
12:38 Future of AI Image Generation and Closing Thoughts
Discover how to snap a photo of your product and instantly generate professional Amazon listing images and A+ content with AI. The future of e-commerce is here! ► Instagram: instagram.com/serioussellerspodcast ► Free Amazon Seller Chrome Extension: https://h10.me/extension ► Sign Up For Helium 10: https://h10.me/signup (Use SSP10 To Save 10% For Life) ► Learn How To Sell on Amazon: https://h10.me/ft ► Watch The Podcasts On YouTube: youtube.com/@Helium10/videos Join us for an exciting exploration of the future of AI image generation as we introduce you to a revolutionary era where ChatGPT-5 transforms how Amazon sellers create professional-quality images for their product listings. In this episode of the Serious Sellers Podcast by Helium 10, we're joined by AI expert Andrew Bell to discuss the latest advancements in AI technology. Learn how ChatGPT-5 combines the best features of previous versions to enhance image generation and listing creation, providing tools that could potentially replace traditional photography. Whether you're new to selling on Amazon or a seasoned veteran, discover how these innovations can optimize your listings with AI-generated visuals, avoiding costly photo shoots and improving your product presentation. Listen in as we explore the intricacies of maximizing image generation quality and prompts. Uncover the secrets behind ensuring your AI-generated images meet the highest standards by specifying "maximum quality" in prompts. We'll discuss strategies for using uploaded images as reference points and emphasize focusing on environment, mood, and composition. Andrew shares insights on a custom GPT model designed to generate precise prompts from product images, using examples like a water bottle to demonstrate its versatility. Additionally, learn about the use of Sora for generating multiple images simultaneously, enhancing productivity and creativity in your design process. 
Finally, we dive into the practical applications of custom GPT tools within ChatGPT for e-commerce optimization. Discover how to navigate the ChatGPT interface to access features like Sora for creating lifestyle images, and how the $20 monthly ChatGPT plan can generate multiple product images quickly and efficiently. We'll guide you through using Helium 10's Audience tool to test and identify the most effective product images for increasing click-through and conversion rates. This episode is packed with insights that promise to enhance your understanding of AI tools, offering cost-effective strategies to boost your e-commerce success.
For episode 584 of the BlockHash Podcast, host Brandon Zemp is joined by Péter W. Szabó, Founder of Tengr.ai, an image generation AI, which is now developing sophisticated AI systems that serve as professional partners for business and creative applications with a privacy-by-design approach. ⏳ Timestamps: (0:00) Introduction (1:08) Who is Péter W. Szabó? (10:25) What makes Tengr AI unique? (14:18) Tengr AI use-cases (17:14) Capital raise plans (19:38) Pricing tiers (27:20) Future of Gen AI (34:42) Tengr AI roadmap (36:00) AI events & conferences (37:19) Website, socials & community
Debating AI Intelligence and Corporate Utilization on Project Synapse In this episode of Project Synapse, hosts Jim Love, Marcel Gagne, and John Pinard delve into the ongoing debate about AI intelligence and the advancements in AI technology. They discuss the reliability and accuracy of AI models like GPT-5 and Grok, questioning whether these models are truly intelligent or merely sophisticated pattern matchers. The conversation also touches on the future of AI in both corporate settings and personal use, the ethical implications of AI development, and the different AI tools available for varying needs. The hosts explore the potential impact of AI on industries, user experiences, and the dynamics of technological dominance, particularly focusing on the impact of AI-driven changes in corporations like Apple and Google. 00:00 Introduction and AI Debate 00:23 Project Synapse Hosts and Apple Discussion 01:28 Apple's AI and Self-Driving Car Ventures 03:27 Grok's Unhinged Personality and AI Voice Modes 05:11 GPT-5 Rollout and User Reactions 12:08 AI Consistency and User Expectations 14:41 OpenAI's Market Position and User Experience 29:01 Grok's Capabilities and Image Generation 37:40 AI Personalities and Technological Advances 38:50 The Evolution of AI Tools 41:01 Super Intelligence and the Singularity 43:18 Debating AI's Future and Capabilities 46:52 Human vs. AI: Intelligence and Learning 52:59 Transhumanism and AI Integration 01:06:01 Corporate AI vs. Consumer AI 01:13:11 The Future of AI and Global Competition 01:14:29 Conclusion and Final Thoughts
Curious how AI actually turns text into images? In this episode, Addy and Joey break down the inner workings of AI image generation and explore ComfyUI. We'll explain the core concepts of latent space, diffusion models, and how noise becomes a recognizable image. From basic text-to-image workflows to advanced techniques with LoRAs and image-to-image transformations, discover when to use ComfyUI versus web-based tools like Runway or ChatGPT for your creative projects.
---
The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.
Co-hosts Mark Thompson and Steve Little examine the controversial rise of AI image "restoration" and discuss how entirely new images are being generated, rather than the original photos being restored. This is raising concerns about the preservation of authentic family photos. They discuss Mark's reconsideration of canceling his Perplexity subscription after rediscovering its unique strengths for supporting research. The hosts analyze recent court rulings that permit AI training on legally acquired content, plus Disney's ongoing case against Midjourney. This week's Tip of the Week explores how project workspaces in ChatGPT and Claude can greatly simplify your genealogical research. In RapidFire, the hosts cover Meta's aggressive AI hiring spree, the proliferation of AI tools in everyday software, including a new genealogy transcription tool from Dan Maloney, and the importance of reading AI news critically. Timestamps: In the News: 06:50 The Pros and Cons of "Restoring" Family Photos with AI 23:58 Mark is Cancelling Perplexity... Maybe 32:33 AI Copyright Cases Are Starting to Work Their Way Through the Courts Tip of the Week: 40:09 How Project Workspaces Help Genealogists Stay Organized RapidFire: 48:51 Meta Goes on a Hiring Spree 56:09 AI Is Everywhere! 01:06:00 Reading AI News Responsibly Resource Links: OpenAI: Introducing 4o Image Generation https://openai.com/index/introducing-4o-image-generation/ Perplexity https://www.perplexity.ai/ How does Perplexity work?
https://www.perplexity.ai/help-center/en/articles/10352895-how-does-perplexity-work Anthropic wins key US ruling on AI training in authors' copyright lawsuit https://www.reuters.com/legal/litigation/anthropic-wins-key-ruling-ai-authors-copyright-lawsuit-2025-06-24/ Meta wins AI copyright lawsuit as US judge rules against authors https://www.theguardian.com/technology/2025/jun/26/meta-wins-ai-copyright-lawsuit-as-us-judge-rules-against-authors Disney, Universal sue image creator Midjourney for copyright infringement https://www.reuters.com/business/media-telecom/disney-universal-sue-image-creator-midjourney-copyright-infringement-2025-06-11/ Disney and Universal Sue A.I. Firm for Copyright Infringement https://www.nytimes.com/2025/06/11/business/media/disney-universal-midjourney-ai.html Projects in ChatGPT https://help.openai.com/en/articles/10169521-projects-in-chatgpt Meta shares hit all-time high as Mark Zuckerberg goes on AI hiring blitz https://www.cnbc.com/2025/06/30/meta-hits-all-time-mark-zuckerberg-ai-blitz.html Here's What Mark Zuckerberg Is Offering Top AI Talent https://www.wired.com/story/mark-zuckerberg-meta-offer-top-ai-talent-300-million/ Genealogy Assistant AI Handwritten Text Recognition Tool https://www.genea.ca/htr-tool/ Borland Genetics https://borlandgenetics.com/ Illusion of Thinking https://machinelearning.apple.com/research/illusion-of-thinking Simon Willison: Seven replies to the viral Apple reasoning paper -- and why they fall short https://simonwillison.net/2025/Jun/15/viral-apple-reasoning-paper/ MIT: Your Brain on ChatGPT https://www.media.mit.edu/projects/your-brain-on-chatgpt/overview/ MIT researchers say using ChatGPT can rot your brain.
The truth is a little more complicated https://theconversation.com/mit-researchers-say-using-chatgpt-can-rot-your-brain-the-truth-is-a-little-more-complicated-259450 Guiding Principles for Responsible AI in Genealogy https://craigen.org/ Tags: Artificial Intelligence, Genealogy, Family History, AI Tools, Image Generation, AI Ethics, Perplexity, ChatGPT, Claude, Meta, Copyright Law, AI Training, Photo Restoration, Project Management, AI Development, Research Tools, Responsible AI Use, GRIP, AI News Analysis, Vibe Coding, Coalition for Responsible AI in Genealogy, AI Hiring, Dan Maloney, Handwritten Text Recognition
The 2025 generative AI image market is a trade-off between aesthetic quality, instruction-following, and user control. This episode analyzes the key platforms, comparing Midjourney's artistic output against the superior text generation and prompt adherence of GPT-4o and Imagen 4, the commercial safety of Adobe Firefly, and the total customization of Stable Diffusion.

Links
Notes and resources at ocdevel.com/mlg/mla-25
Try a walking desk - stay healthy & sharp while you learn & code
Build the future of multi-agent software with AGNTCY.

The State of the Market
The market is split by three core philosophies:
The "Artist" (Midjourney): Prioritizes aesthetic excellence and cinematic output, sacrificing precise user control and instruction following.
The "Collaborator" (GPT-4o, Imagen 4): Extensions of LLMs that excel at conversational co-creation, complex instruction following, and integration into productivity workflows.
The "Sovereign Toolkit" (Stable Diffusion): An open-source engine offering users unparalleled control, customization, and privacy in exchange for technical engagement.

Table 1: 2025 Generative AI Image Tool At-a-Glance Comparison
Tool | Parent Company | Access Method(s) | Pricing | Core Strength | Best For
Midjourney v7 | Midjourney, Inc. | Web App, Discord | Subscription | Artistic Aesthetics & Photorealism | Fine Art, Concept Design, Stylized Visuals
GPT-4o | OpenAI | ChatGPT, API | Freemium/Sub | Conversational Control & Instruction Following | Marketing Materials, UI/UX Mockups, Logos
Google Imagen 4 | Google | Gemini, Workspace, Vertex AI | Freemium/Sub | Ecosystem Integration & Speed | Business Presentations, Educational Content
Stable Diffusion 3 | Stability AI | Local Install, Web UIs, API | Open Source | Ultimate Customization & Control | Developers, Power Users, Bespoke Workflows
Adobe Firefly | Adobe | Creative Cloud Apps, Web App | Subscription | Commercial Safety & Workflow Integration | Professional Designers, Agencies, Enterprise

Core Platforms
Midjourney v7: Premium choice for artistic quality. Features: Web UI with Draft Mode, user personalization, emerging video/3D. Weaknesses: Poor text generation, poor prompt adherence, public images on cheap plans, no API/bans automation.
OpenAI GPT-4o: An intelligent co-creator for controlled generation. Features: Conversational refinement, superior text rendering, understands uploaded image context. Weaknesses: Slower than competitors, generates one image at a time, strict content filters.
Google Imagen 4: Pragmatic tool focused on speed and ecosystem integration. Features: High-quality photorealism, fast generation, strong text rendering, multilingual. Weaknesses: Less artistic flair; value is dependent on Google ecosystem investment.
Stable Diffusion 3: Open-source engine for maximum user control. Features: MMDiT architecture improves prompt/text handling, scalable models, vast ecosystem (LoRAs/ControlNet). Weaknesses: Steep learning curve, quality is user-dependent.
Adobe Firefly: Focused on commercial safety and professional workflow integration. Features: Trained on Adobe Stock for legal indemnity, Generative Fill/Expand tools. Weaknesses: Creative range limited by training data, requires Adobe subscription/credits.

Tools and Concepts
In-painting: Modifying a masked area inside an image.
Out-painting: Extending an image beyond its original borders.
LoRA (Low-Rank Adaptation): A small file that applies a fine-tuned style, character, or concept to a base model.
ControlNet: Uses a reference image (e.g., pose, sketch) to enforce the composition, structure, or pose of the output.
A1111 vs. ComfyUI: Two main UIs for Stable Diffusion. A1111 is a beginner-friendly tabbed interface; ComfyUI is a node-based interface for complex, efficient, and automated workflows.

Workflows
"Best of Both Worlds": Generate aesthetic base images in Midjourney, then composite, edit, and add text with precision in Photoshop/Firefly.
Single-Ecosystem: Work entirely within Adobe Creative Cloud or Google Workspace for seamless integration, commercial safety (Adobe), and convenience (Google).
"Build Your Own Factory": Use ComfyUI to build automated, multi-step pipelines for consistent character generation, advanced upscaling, and video.

Decision Framework
Choose by Goal:
Fine Art/Concept Art: Midjourney.
Logos/Ads with Text: GPT-4o, Google Imagen 4, or specialist Ideogram.
Consistent Character in Specific Pose: Stable Diffusion with a Character LoRA and ControlNet (OpenPose).
Editing/Expanding an Existing Photo: Adobe Photoshop with Firefly.
Exclusion Rules:
If you need legible text, exclude Midjourney.
If you need absolute privacy or zero cost (post-hardware), Stable Diffusion is the only option.
If you need guaranteed commercial legal safety, use Adobe Firefly.
If you need an API for a product, use OpenAI or Google; automating Midjourney is a bannable offense.
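The exclusion rules in the decision framework above can be read as a series of filters over the five tools. The following is a minimal Python sketch of that reading, not something from the episode: the `Needs` class, its field names, and the `shortlist` function are hypothetical names chosen for illustration, and the rules are encoded exactly as the notes state them (for example, the API rule keeps only the OpenAI and Google tools, as the notes recommend).

```python
from dataclasses import dataclass

# Tool names as listed in the episode's comparison table (versions dropped).
TOOLS = ["Midjourney", "GPT-4o", "Google Imagen 4", "Stable Diffusion", "Adobe Firefly"]


@dataclass(frozen=True)
class Needs:
    """Hypothetical requirement flags, one per exclusion rule in the notes."""
    legible_text: bool = False      # text inside the image must be readable
    private_or_free: bool = False   # absolute privacy or zero cost (post-hardware)
    legal_safety: bool = False      # guaranteed commercial legal safety
    api_access: bool = False        # an API to build a product on


def shortlist(needs: Needs) -> list[str]:
    """Apply each exclusion rule in turn and return the tools that remain."""
    candidates = list(TOOLS)
    if needs.legible_text:
        # "If you need legible text, exclude Midjourney."
        candidates = [t for t in candidates if t != "Midjourney"]
    if needs.private_or_free:
        # "Stable Diffusion is the only option."
        candidates = [t for t in candidates if t == "Stable Diffusion"]
    if needs.legal_safety:
        # "Use Adobe Firefly."
        candidates = [t for t in candidates if t == "Adobe Firefly"]
    if needs.api_access:
        # "Use OpenAI or Google."
        candidates = [t for t in candidates if t in ("GPT-4o", "Google Imagen 4")]
    return candidates


if __name__ == "__main__":
    print(shortlist(Needs(legible_text=True, api_access=True)))
```

An empty result means the stated requirements conflict under these rules (for instance, absolute privacy together with guaranteed legal safety), which is a cue to relax one requirement or combine tools as in the workflows described in the notes.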
OpenAI's new o3 model feels almost criminal to use. Google is legit giving away its Gemini AI for free to millions. And Microsoft legit released an AI agent that can use a computer. Sheesh. Week after week, the pace of AI innovation is getting harder and harder to keep up with. Don't worry. We do that for you.
Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Thoughts on this? Join the convo.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn
Topics Covered in This Episode: OpenAI's $3B Windsurf Acquisition Deal; Google Launches Gemini 2.5 Flash Model; Google Veo 2 Video Tool Release; Microsoft AI Computer Use Agent Launch; US Ban Consideration on Chinese DeepSeek; Free Gemini Advanced to US Students; Anthropic's Claude Adds Google Workspace; OpenAI Testing New Social Media Platform; GPT 4.1 API Model with 1M Context; OpenAI's O3 and O4 Mini Released
Timestamps: 00:00 Intro 03:43 OpenAI Eyes $3B Windsurf Acquisition 06:54 Google Launches Gemini 2.5 Flash 11:36 Google Unveils Veo 2 for Videos 15:49 "AI Market Tensions: US vs China" 20:10 Microsoft Unveils AI Automation Tool 21:09 Microsoft AI Enhances Business Automation 28:16 Claude's New Tool: Pricey Research Integration 29:31 OpenAI Teams Lacks Gmail Integration 33:10 OpenAI Testing Social Media Platform 39:18 GPT-4.1's Competitive Edge in Coding 43:19 AI Model Versions Overview 45:49 Agentic AI: Workflow Evolution 47:45 "O Four Mini Model Overview" 52:20 Tech Giants Unveil AI Tools
Keywords: Microsoft, Autonomous AI Agent, Apps, Websites, Thinking Models, OpenAI, Large Language Model Modes, Google, Gemini AI, Tens of Millions, Claude, Anthropic, AI News, AI World, Grow Companies, Grow Careers, Generative AI, Acquisition, Windsurf, $3 Billion, Code Generation Market, AnySphere, Cursor, Annualized Recurring Revenue, AI Coding Startups, Codium, Competitive AI Space, Google Next
Conference, Gemini 2.5 Flash, AI Model, Computational Reasoning, Pricing, Output Tokens, Reasoning Budget, Complex Problem Solving, Performance Benchmarks, Competitors, Claude 3.7, DeepSeek, OpenAI o4, AI Studio, AI-powered Video, Vios AI, v o two, Text-to-Video, Synth ID, Digital Watermark, WISC anime, White House Restrictions, NVIDIA AI Chips, Intellectual Property Rights, Trump Administration, Silicon Valley, DeepSeek Ban, Innovations in AI, Copilot Studio, Microsoft 365 Copilot, Automation, API Restrictions, AI Agents, Influencer Recommendations, Social Media Network, Sam Altman, ChatGPT, Image Generation, Grok AI Integration, Soci
Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info) Ready for ROI on GenAI? Go to youreverydayai.com/partner
OpenAI's o3 and o4-mini are here, and they're multimodal, cheaper, and scary good. These models can see, code, plan, and use tools all on their own. Yeah. It's a big deal. We break down everything from tool use to image reasoning to why o3 might be the start of something actually autonomous. Plus, our favorite cursed (and adorable) 4o Image Generation prompts, ChatGPT as a social network, the old (Monday) news about GPT-4.1 including free Windsurf coding for a week! Also, Kling 2.0 and Veo 2 drop new AI video models, Google DeepMind is using AI to talk to dolphins, NVIDIA's new chip restrictions and Eric Schmidt says the computers… don't have to listen to us anymore. Uh-oh. THE COMPUTERS HAVE EYES. AND THEY MIGHT NOT NEED US. STILL A GOOD SHOW. Join the discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/ // Show Links // O3 + o4-MINI ARE HERE LIVE STREAM: https://www.youtube.com/live/sq8GBPUb3rk?si=qQMFAvm8UmvyGaWv OpenAI Blog Post: https://openai.com/index/introducing-o3-and-o4-mini/ “Thinking With Images” https://openai.com/index/thinking-with-images/ Codex CLI https://x.com/OpenAIDevs/status/1912556874211422572 Professor & Biomedical Scientist Reaction to o3 https://x.com/DeryaTR_/status/1912558350794961168 Linda McMahon's A1 vs AI https://www.usatoday.com/story/news/politics/2025/04/12/linda-mcmahon-a1-instead-of-ai/83059797007/ GPT-4.1 in the API https://openai.com/index/gpt-4-1/ GPT-4.1 Reduces The Need to Read Unnecessary Files https://www.reddit.com/r/singularity/comments/1jz600b/one_of_the_most_important_bits_of_the_stream_if/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button OpenAI Might Acquire Windsurf for 3 Billion Dollars
https://www.cnbc.com/2025/04/16/openai-in-talks-to-pay-about-3-billion-to-acquire-startup-windsurf.html ChatGPT: The Social Network https://x.com/kyliebytes/status/1912171286039793932 New ChatGPT Image Library https://chatgpt.com/library 4o Image Gen Prompts We Love Little Golden Books https://x.com/AIForHumansShow/status/1912321209297191151 Make your pets people https://x.com/gavinpurcell/status/1911243562928447721 Barbie https://x.com/AIForHumansShow/status/1910514568595726414 Coachella Port-a-potty https://x.com/AIForHumansShow/status/1911604534713192938 Ex-Google CEO Says The Computers Are Improving Fast https://www.reddit.com/r/artificial/comments/1jzw6bd/eric_schmidt_says_the_computers_are_now/ Kling 2.0 https://x.com/Kling_ai/status/1912040247023788459 Rotisserie Chicken Knight Prompt in Kling 2.0: https://x.com/AIForHumansShow/status/1912170034761531817 Kling example that didn't work that well: https://x.com/AIForHumansShow/status/1912298707955097842 Veo 2 Launched in AI Studio https://aistudio.google.com/generate-video https://blog.google/products/gemini/video-generation/ James Cameron on “Humans as a Model” https://x.com/dreamingtulpa/status/1910676179918397526 Nvidia Restricting More Chip Sales To China https://www.nytimes.com/2025/04/15/technology/nvidia-h20-chip-china-restrictions.html $500 Billion for US Chip Manufacturing https://www.cnbc.com/2025/04/14/nvidia-to-mass-produce-ai-supercomputers-in-texas.html Dolphin Gemma: AI That Will Understand Dolphins https://x.com/GoogleDeepMind/status/1911767367534735832 Jason Zada's Very Cool Veo 2 Movie https://x.com/jasonzada/status/1911812014059733041 Robot Fire Extinguisher https://x.com/CyberRobooo/status/1911665518765027788
Kevin Weil is the chief product officer at OpenAI, where he oversees the development of ChatGPT, enterprise products, and the OpenAI API. Prior to OpenAI, Kevin was head of product at Twitter, Instagram, and Planet, and was instrumental in the development of the Libra (later Novi) cryptocurrency project at Facebook. In this episode, you'll learn: 1. How OpenAI structures its product teams and maintains agility while developing cutting-edge AI 2. The power of model ensembles: using multiple specialized models together like a company of humans with different skills 3. Why writing effective evals (AI evaluation tests) is becoming a critical skill for product managers 4. The surprisingly enduring value of chat as an interface for AI, despite predictions of its obsolescence 5. How “vibe coding” is changing how companies operate 6. What OpenAI looks for when hiring product managers (hint: high agency and comfort with ambiguity) 7. “Model maximalism” and why today's AI is the worst you'll ever use again 8.
Practical prompting techniques that improve AI interactions, including example-based prompting—Brought to you by:• Eppo—Run reliable, impactful experiments• Persona—A global leader in digital identity verification• OneSchema—Import CSV data 10x faster—Where to find Kevin Weil:• X: https://x.com/kevinweil• LinkedIn: https://www.linkedin.com/in/kevinweil/—Where to find Lenny:• Newsletter: https://www.lennysnewsletter.com• X: https://twitter.com/lennysan• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/—In this episode, we cover:(00:00) Kevin's background(04:06) OpenAI's new image model(06:52) The role of chief product officer at OpenAI(10:18) His recruitment story and joining OpenAI(17:20) The importance of evals in AI(24:59) Shipping quickly and consistently(28:34) Product reviews and iterative deployment(39:35) Chat as an interface for AI(43:59) Collaboration between researchers and product teams(46:41) Hiring product managers at OpenAI(48:45) Embracing ambiguity in product management(51:41) The role of AI in product teams(53:21) Vibe coding and AI prototyping(55:55) The future of product teams and fine-tuned models(01:04:36) AI in education(01:06:42) Optimism and concerns about AI's future(01:16:37) Reflections on the Libra project(01:20:37) Lightning round and final thoughts—Referenced:• OpenAI: https://openai.com/• The AI-Generated Studio Ghibli Trend, Explained: https://www.forbes.com/sites/danidiplacido/2025/03/27/the-ai-generated-studio-ghibli-trend-explained/• Introducing 4o Image Generation: https://openai.com/index/introducing-4o-image-generation/• Waymo: https://waymo.com/• X: https://x.com• Facebook: https://www.facebook.com/• Instagram: https://www.instagram.com/• Planet: https://www.planet.com/• Sam Altman on X: https://x.com/sama• A conversation with OpenAI's CPO Kevin Weil, Anthropic's CPO Mike Krieger, and Sarah Guo: https://www.youtube.com/watch?v=IxkvVZua28k• OpenAI evals: https://github.com/openai/evals• Deep Research: 
https://openai.com/index/introducing-deep-research/• Ev Williams on X: https://x.com/ev• OpenAI API: https://platform.openai.com/docs/overview• Dwight Eisenhower quote: https://www.brainyquote.com/quotes/dwight_d_eisenhower_164720• Inside Bolt: From near-death to ~$40m ARR in 5 months—one of the fastest-growing products in history | Eric Simons (founder & CEO of StackBlitz): https://www.lennysnewsletter.com/p/inside-bolt-eric-simons• StackBlitz: https://stackblitz.com/• Claude 3.5 Sonnet: https://www.anthropic.com/news/claude-3-5-sonnet• Anthropic: https://www.anthropic.com/• Four-minute mile: https://en.wikipedia.org/wiki/Four-minute_mile• Chad: https://chatgpt.com/g/g-3F100ZiIe-chad-open-a-i• Dario Amodei on LinkedIn: https://www.linkedin.com/in/dario-amodei-3934934/• Figma: https://www.figma.com/• Julia Villagra on LinkedIn: https://www.linkedin.com/in/juliavillagra/• Andrej Karpathy on X: https://x.com/karpathy• Silicon Valley CEO says ‘vibe coding' lets 10 engineers do the work of 100—here's how to use it: https://fortune.com/2025/03/26/silicon-valley-ceo-says-vibe-coding-lets-10-engineers-do-the-work-of-100-heres-how-to-use-it/• Cursor: https://www.cursor.com/• Windsurf: https://codeium.com/windsurf• GitHub Copilot: https://github.com/features/copilot• Patrick Srail on X: https://x.com/patricksrail• Khan Academy: https://www.khanacademy.org/• CK-12 Education: https://www.ck12.org/• Sora: https://openai.com/sora/• Sam Altman's post on X about creative writing: https://x.com/sama/status/1899535387435086115• Diem (formerly known as Libra): https://en.wikipedia.org/wiki/Diem_(digital_currency)• Novi: https://about.fb.com/news/2020/05/welcome-to-novi/• David Marcus on LinkedIn: https://www.linkedin.com/in/dmarcus/• Peter Zeihan on X: https://x.com/PeterZeihan• The Wheel of Time on Prime Video: https://www.amazon.com/Wheel-Time-Season-1/dp/B09F59CZ7R• Top Gun: Maverick on Prime Video: https://www.amazon.com/Top-Gun-Maverick-Joseph-Kosinski/dp/B0DM2LYL8G• Thinking 
like a gardener not a builder, organizing teams like slime mold, the adjacent possible, and other unconventional product advice | Alex Komoroske (Stripe, Google): https://www.lennysnewsletter.com/p/unconventional-product-advice-alex-komoroske• MySQL: https://www.mysql.com/—Recommended books:• Co-Intelligence: Living and Working with AI: https://www.amazon.com/Co-Intelligence-Living-Working-Ethan-Mollick/dp/059371671X• The Accidental Superpower: Ten Years On: https://www.amazon.com/Accidental-Superpower-Ten-Years/dp/1538767341• Cable Cowboy: https://www.amazon.com/Cable-Cowboy-Malone-Modern-Business/dp/047170637X—Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.—Lenny may be an investor in the companies discussed. Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe
Google's AI efforts & Gemini Pro 2.5 take a major step forward with updates to Deep Research, new Agent2Agent protocol (A2A) & more. Sadly, OpenAI teases o3 and o4 but delays GPT-5. Plus, Meta's new Llama 4 models are out but have issues, Midjourney v7's debut, John Carmack's smackdown of an AI video game engine hater, Gavin's deep dive into OpenAI 4o Image Generation formats & the weirdest robot horse concept you've ever seen. WE'RE DEEP RESEARCHING OUR ENTIRE LIVES RIGHT NOW Join the discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/ // Show Links // Google Cloud 25 Live Stream “A New Way To Cloud!” https://youtu.be/Md4Fs-Zc3tg Google Cloud Blog Post https://blog.google/products/google-cloud/next-2025/ Upgraded Deep Research Outperforms OpenAI Deep Research https://x.com/GeminiApp/status/1909721519724339226 Google's Deep Research Vs OpenAI Deep Research https://x.com/testingcatalog/status/1909727195402027183 New Ironwood TPUs https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/ Gavin's Experiences with Google Gemini Deep Research: Baltro Test: https://x.com/AIForHumansShow/status/1909813850817675424 KP Biography: https://g.co/gemini/share/7b7bdb2c400e Agent2Agent Protocol https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/ Google Paying Some AI Staff To Do Nothing Rather Than Work For Rivals https://x.com/TechCrunch/status/1909368948862181584 Solar Glow Meditations on AI http://tiktok.com/@solarglowmeditations/video/7491038509214518559?_t=ZT-8vNNgF7QpyM&_r=1 o4-mini & o3 coming before GPT-5 in shift from Sam Altman https://x.com/sama/status/1908167621624856998 OpenAI Strategic Deployment Team (new role to prep for AGI)
https://x.com/aleks_madry/status/1909686225658695897 AI 2027 Paper https://ai-2027.com/ Llama 4 is here… but how good is it? https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Controversy Around Benchmarks: https://gizmodo.com/meta-cheated-on-ai-benchmarks-and-its-a-glimpse-into-a-new-golden-age-2000586433 Deep dive on issues from The Information https://www.theinformation.com/articles/llama-4s-rocky-debut?rc=c3oojq&shared=3bbd9f72303888e2 Midjourney v7 Is Here and it's… just ok? https://www.midjourney.com/updates/v7-alpha John Carmack Defends AI Video Games https://x.com/ID_AA_Carmack/status/1909311174845329874 Tim Sweeney Weighs In https://x.com/TimSweeneyEpic/status/1909314230391902611 New Test-time-training = 1 Min AI Video From a Single Prompt https://x.com/karansdalal/status/1909312851795411093 Kawasaki's Robot Horse Concept https://futurism.com/the-byte/kawasaki-rideable-horse-robot VIDEO: https://youtu.be/vQDhzbTz-9k?si=2aWMtZVLnMONEjBe Engine AI + iShowSpeed https://x.com/engineairobot/status/1908570512906740037 Gemini 2.5 Pro Plays Pokemon https://x.com/kiranvodrahalli/status/1909699142265557208 Prompt-To-Anything Minecraft Looking Game https://x.com/NicolasZu/status/1908882267453239323 An Image That Will Never Go Viral https://www.reddit.com/r/ChatGPT/comments/1jth5yf/asked_for_an_image_that_will_never_go_viral/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button How Toothpaste Is Made https://www.reddit.com/r/aivideo/comments/1jujzh2/how_toothpaste_is_made/ 90s Video Game 4o Image Gen Prompt https://x.com/AIForHumansShow/status/1908985288116101553 1980s Japanese Posters https://x.com/AIForHumansShow/status/1909824824677192140 Buff Superbad https://x.com/AIForHumansShow/status/1909402225488937065
Ep. 316 Why spend $25K on marketing strategies when you can get the same value for free? Kipp dives into how Google Gemini 2.5 can revolutionize your marketing strategy without costing a dime. Learn more on using deep research to write effective prompts, creating comprehensive marketing campaigns, and utilizing AI models to build out detailed marketing assets with ease. Mentions Curious how we used AI to skyrocket our conversion by 82%? Find out here: https://clickhubspot.com/vcr Want Kipp's prompt? Get it here: https://clickhubspot.com/swv Gemini 2.5 https://gemini.google.com/ OpenAI 4o Image Generation https://openai.com/index/introducing-4o-image-generation/ ChatGPT https://chatgpt.com/ Check out this episode on YouTube: https://www.youtube.com/watch?v=49uoV2k6btA&t=4s We're creating our next round of content and want to ensure it tackles the challenges you're facing at work or in your business. To understand your biggest challenges we've put together a survey and we'd love to hear from you! https://bit.ly/matg-research Resource [Free] Steal our favorite AI Prompts featured on the show! Grab them here: https://clickhubspot.com/aip We're on Social Media! Follow us for everyday marketing wisdom straight to your feed YouTube: https://www.youtube.com/channel/UCGtXqPiNV8YC0GMUzY-EUFg Twitter: https://twitter.com/matgpod TikTok: https://www.tiktok.com/@matgpod Join our community https://landing.connect.com/matg Thank you for tuning into Marketing Against The Grain! Don't forget to hit subscribe and follow us on Apple Podcasts (so you never miss an episode)! https://podcasts.apple.com/us/podcast/marketing-against-the-grain/id1616700934 If you love this show, please leave us a 5-Star Review https://link.chtbl.com/h9_sjBKH and share your favorite episodes with friends. We really appreciate your support. 
Host Links: Kipp Bodnar, https://twitter.com/kippbodnar Kieran Flanagan, https://twitter.com/searchbrat ‘Marketing Against The Grain' is a HubSpot Original Podcast // Brought to you by Hubspot Media // Produced by Darren Clarke.
OpenAI's servers melted down as it opened up ChatGPT 4o Image Generation to everyone… and then nerfed it hard. Thankfully, their new raise of $40 BILLION will help. We dive deeper into all the cool ways people are using it (and aren't able to use it any more) along with news about OpenAI Academy & how GPT-4.5 beat the Turing Test. Then Runway Gen-4 is here and it's VERY good, OmniHuman-1 does AI lip sync right, Meta's new Hypernova AR glasses are revealed, Anthropic takes on the black box of AI & we take another deep dive into AI Big Booty Bears (not really). IT'S A WEEK OF NERFING BUT IT'S ALL FINE. REALLY GOOD SHOW.

Join the discord: https://discord.gg/muD2TYgC8f
Join our Patreon: https://www.patreon.com/AIForHumansShow
AI For Humans Newsletter: https://aiforhumans.beehiiv.com/
Follow us for more on X @AIForHumansShow
Join our TikTok @aiforhumansshow
To book us for speaking, please visit our website: https://www.aiforhumans.show/

// Show Links //

We (Well, really 4o Image Gen) Broke ChatGPT https://x.com/sama/status/1906210479695126886 https://x.com/sama/status/1907098207467032632
One million new users in an hour https://x.com/sama/status/1906771292390666325
300 Billion Dollar Valuation Secured https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html?unlocked_article_code=1.8U4.YiBx.kTmo-VHPXyQl&smid=url-share
It Def Got Nerfed Somewhat https://x.com/blizaine/status/1905701063866581368
Our experience https://x.com/AIForHumansShow/status/1906492795420086275
OAI Working on Refusals Etc https://x.com/joannejang/status/1907174171790197204
Finding Our Way Around Nerfing: https://x.com/AIForHumansShow/status/1906594620147916974
Cocktail Peanut Step-by-step Adding Things To Cup https://x.com/cocktailpeanut/status/1906983829035974890
Three Layer Parallax Background For Vibe Coded Games https://x.com/majidmanzarpour/status/1906896279646425263
GPT 4.5 (with persona) passes the Turing Test https://x.com/camrobjones/status/1907086860322480233
New Open Weight Model from OpenAI on the way https://x.com/sama/status/1906793591944646898
Depressing new voice for April Fool's Day (Monday) https://x.com/OpenAI/status/1907124258867982338
OpenAI Academy https://academy.openai.com
Runway Gen-4 https://x.com/runwayml/status/1906718935778545964
Knight From 4o Image Gen https://x.com/AIForHumansShow/status/1906808191838765175
The Alpha Returns https://x.com/AIForHumansShow/status/1906882011408863458
Higgsfield AI https://x.com/higgsfield_ai/status/1906748655702265901
Omni-Human 1 Out In the Wild https://x.com/TomLikesRobots/status/1907163287596519442 https://dreamina.capcut.com/
Meta's new Movie Grade Talking Synthesis https://x.com/_akhaliq/status/1906935462075236621 https://congwei1230.github.io/MoCha/
Meta's New Hypernova Glasses https://www.theverge.com/news/641153/meta-hypernova-ray-ban-smart-glasses-price
Anthropic Makes Progress on AI Not Being a “Black Box” https://x.com/AnthropicAI/status/1905303835892990278
Alexa+ Out, Soooooorta https://www.theverge.com/news/639697/amazon-alexa-plus-launch-early-access-missing-features
Celebrity Mortal Kombat 2025 Edition https://x.com/n_reruns/status/1906725609587593286
What if Humans & AI Unite? https://youtu.be/vp7xoPeWzEw?si=6yYRUzCvBJSGcNYz
Fantastic AI Film Done In Three Days Plus Walkthrough https://x.com/iaveras/status/1906362437487534296
Emoji Drop - Fun Vibe Coded Music Toy https://x.com/alexanderchen/status/1907052205988851795
Gavin's 4o Image Gen Video https://x.com/AIForHumansShow/status/1905390375604551939
Join me as I chat with Jacob Posel where he explains his viral method for creating professional ads using ChatGPT 4o and reference images. Jacob demonstrates that effective ad creation with AI relies more on providing quality inspiration images than complex prompting. The episode includes four example ads and two live demonstrations, showcasing how marketers can experiment with different audiences and styles at scale to improve conversion rates.

Timestamps:
• 00:00 - Intro
• 02:01 - Best practices and the importance of inspiration images
• 05:52 - Using Reference Images vs. No Reference
• 09:07 - Exploring Sora for Prompt Inspiration
• 13:33 - Another example of reference image + product image
• 16:20 - More Ad Examples (Ridge Wallet)
• 18:14 - More Ad Examples (Replit)
• 19:12 - Where to find Ad inspiration
• 20:00 - Live Ad Creation for LCA
• 33:37 - Ad Concept Development for LCA
• 37:37 - Finalizing the Ad and Next Steps

Key Points:
• Jacob Posel demonstrates how to create high-quality ads in minutes using ChatGPT, focusing on using reference images rather than complex prompts
• The process involves providing inspiration ads and product images to ChatGPT, which can then generate customized, conversion-ready advertisements
• Using well-known brands requires less specific prompting as the AI already understands them, while lesser-known products need more reference images
• The speakers demonstrate live ad creation for Late Checkout Agency (LCA), showing the iterative process and prompt refinement techniques

1) The Secret to Great AI-Generated Ads
Forget complex prompting - the REAL magic is in your reference images!
Jacob revealed: "The important thing in creating good ads are the inspiration images and product images that you provide."
ChatGPT understands intent better than other AI image tools.

2) The Simple Framework
Here's the basic formula Jacob uses:
• Find an inspiration ad you like
• Add your product image
• Give simple instructions
• Let ChatGPT work its magic
The model is SMART enough to understand what you want without elaborate prompts.

3) When to Use Reference Images vs. When Not To
KEY INSIGHT: ChatGPT already knows popular brands!
"It knows Nike, Adidas, Ridge wallets really well," Jacob explains.
For well-known brands: Simple prompts work fine
For newer/smaller brands: ALWAYS provide reference images
This saves you tons of time!

4) Pro Tip: Where to Find Ad Inspiration
Jacob's go-to sources for ad inspiration:
• Creative OS
• Foreplay
• Icon
These tools pull from Meta's ad library and make it searchable.
Finding the right reference ad is HALF the battle!

5) The Hidden Goldmine Nobody's Talking About
ALPHA ALERT: Check out Sora's explore page!
"This is a piece of alpha that people do not know about right now," Jacob revealed.
It shows you the EXACT prompts people used for successful images.
Perfect for learning how to prompt effectively!

6) Crucial Prompting Tips
If your first attempt isn't perfect:
• DON'T continue in the same chat
• Start a fresh conversation
• Make adjustments to your prompt there
"You generally get through a cycle of worse and worse rather than improving" if you keep going in the same chat.

7) Real-World Example: The LCA Ad
We created a luxury-style ad for Greg's innovation agency in real-time!
Process:
1. Screenshot website elements
2. Find vintage Rolex ad as inspiration
3. Simple prompt explaining the brand vibe
4. Iterate in a NEW chat for improvements
Took less than 10 minutes!

8) The Future of Advertising
This technology enables:
• Personalized ads at scale
• Rapid response to trending moments
• Testing different audiences/demographics
• Experimenting with creative directions
As Jacob puts it: "The ad is the targeting" - and now you can do it at SCALE.

Notable Quotes:
"Marketers have this saying nowadays that the ad is the targeting and this is just a way that you can actually do it at scale." - Jacob Posel
"The important thing I found in creating good ads are actually the inspiration images and the product images that you provide." - Jacob Posel

LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/
BoringAds - ads agency that will build you profitable ad campaigns http://boringads.com/
BoringMarketing - SEO agency and tools to get your organic customers http://boringmarketing.com/
Startup Empire - a membership for builders who want to build cash-flowing businesses https://www.startupempire.co

FIND ME ON SOCIAL
X/Twitter: https://twitter.com/gregisenberg
Instagram: https://instagram.com/gregisenberg/
LinkedIn: https://www.linkedin.com/in/gisenberg/

FIND JACOB ON SOCIAL
X/Twitter: https://x.com/jacob_posel
The AI Breakdown: Daily Artificial Intelligence News and Discussions
New aesthetic filters on movies and websites. A new era of books. An absolute bulldozer to the ad industry process. NLW builds off of comments from Balaji Srinivasan to explore ten areas that are being transformed by the new ChatGPT image generation model.
Source: https://x.com/balajis/status/1904987087361004029
Brought to you by:
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Vanta - Simplify compliance - https://vanta.com/nlw
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
This week on the AI Rollup, we're diving into hyper-realistic AI-generated podcast hosts, showcasing Hedera Studio's cutting-edge character tech. Google drops Gemini 2.0 Flash, making AI image generation even more powerful with improved consistency and text integration. Meanwhile, China is pushing hard: Baidu's latest model beats GPT-4.5 at half the cost, challenging OpenAI's dominance. Plus, Sesame open-sources its voice model, and the crypto AI market sees brutal drawdowns. Despite the turbulence, teams like Virtuals and Bittensor keep building, while Pluralis raises $7.6M to decentralize AI model training. Where does AI Crypto go from here? Let's find out.
Sun, 02 Feb 2025 22:00:00 GMT
http://relay.fm/mpu/782
Mac Power Users #782: Apple Intelligence Review
David Sparks and Stephen Hackett
Stephen and David have been using Apple's AI features since they showed up in betas last year. Today, they share their findings and talk about what's worth your time, and what's not, when it comes to Apple Intelligence.
This episode of Mac Power Users is sponsored by:
Squarespace: Save 10% off your first purchase of a website or domain using code MPU.
Indeed: Join more than 3.5 million businesses worldwide using Indeed to hire great talent fast.
Links and Show Notes:
Sign up for the MPU email newsletter and join the MPU forums.
More Power Users: Ad-free episodes with regular bonus segments
Submit Feedback
Announcing the 2025 Productivity Field Guide - MacSparky
Productivity Field Guide 2025 (Standard Edition) | MacSparky Field
Productivity Field Guide 2025 (Plus Edition) | MacSparky Field Guides
Mac Power Users #781: The 2025 Productivity Field Guide - Relay
A New Look for 2025 and Beyond - Relay
Apple Intelligence - Apple
Introducing Apple Intelligence for iPhone, iPad, and Mac - Apple
How to get Apple Intelligence - Apple Support
Privacy - Features - Apple
Apple Intelligence and privacy on iPhone - Apple Support
ChatGPT Apple Intelligence Extension & Privacy - Apple
Cleft Notes - Capture and Share Notes With Cleft's AI Scribe
Use Writing Tools with Apple Intelligence on iPhone - Apple Support
Apple Intelligence: Examples of Content Generated with Writing Tools - 512 Pixels
Grammarly: Free AI Writing Assistance
Co-Intelligence: Living and Working with AI - Amazon Books
Apple Intelligence: Examples of Image Generation - 512 Pixels
Create Genmoji with Apple Intelligence on iPhone - Apple Support
Create original images with Image Playground on iPhone - Apple Support
Use Image Wand with Apple Intelligence on iPhone - Apple Support
Use Apple Intelligence in Photos on iPhone - Apple Support
Get webpage summaries with Apple Intelligence on iPhone - Apple Support
Use Apple Intelligence in Mail on iPhone - Apple Support
Use Apple Intelligence in Messages on iPhone - Apple Support
Get a summary of a phone call or audio recording on iPhone with Apple Intelligence - Apple Support
Summarize notifications and reduce interruptions with Apple Intelligence on iPhone - Apple Support
Apple Intelligence botched a notification summary about Luigi Mangione, and the BBC isn't happy - 9to5Mac
iOS 18.3 makes 5 changes to Apple Intelligence notification summaries - 9to5Mac
Use ChatGPT with Apple Intelligence on iPhone - Apple Support
Using ChatGPT on Your iPhone