Podcasts about gemini pro

  • 83PODCASTS
  • 131EPISODES
  • 50mAVG DURATION
  • 1EPISODE EVERY OTHER WEEK
  • Aug 18, 2025LATEST

POPULARITY

20172018201920202021202220232024


Best podcasts about gemini pro

Latest podcast episodes about gemini pro

Using the Whole Whale Podcast
“10 blue links” era is over, Create AI-Resistant Content | Avinash Kaushik

Using the Whole Whale Podcast

Play Episode Listen Later Aug 18, 2025 54:26


Nonprofits, your “10 blue links” era is over. In this episode, Avinash Kaushik (Human-Made Machine; Occam's Razor) breaks down Answer Engine Optimization—why LLMs now decide who gets seen, why third-party chatter outweighs your own site, and what to do about it. We get tactical: build AI-resistant content (genuine novelty + depth), go multimodal (text, video, audio), and stamp everything with real attribution so bots can't regurgitate you into sludge. We also cover measurement that isn't delusional—group your AEO referrals, expect fewer visits but higher intent, and stop worshiping last-click and vanity metrics. Avinash updates the 10/90 rule for the AI age (invest in people, plus “synthetic interns”), and torpedoes linear funnels in favor of See-Think-Do-Care anchored in intent. If you want a blunt, practical playbook for staying visible—and actually converting—when answers beat searches, this is it. About Avinash Avinash Kaushik is a leading voice in marketing analytics—the author of Web Analytics: An Hour a Day and Web Analytics 2.0, publisher of the Marketing Analytics Intersect newsletter, and longtime writer of the Occam's Razor blog. He leads strategy at Human Made Machine, advises Tapestry on brand strategy/marketing transformation, and previously served as Google's Digital Marketing Evangelist. Uniquely, he donates 100% of his book royalties and paid newsletter revenue to charity (civil rights, early childhood education, UN OCHA; previously Smile Train and Doctors Without Borders). He also co-founded Market Motive. Resource Links Avinash Kaushik — Occam's Razor (site/home) Occam's Razor by Avinash Kaushik Marketing Analytics Intersect (newsletter sign-up) Occam's Razor by Avinash Kaushik AEO series starter: “AI Age Marketing: Bye SEO, Hello AEO!” Occam's Razor by Avinash Kaushik See-Think-Do-Care (framework explainer) Occam's Razor by Avinash Kaushik Books: Web Analytics: An Hour a Day | Web Analytics 2.0 (author pages) Occam's Razor by Avinash Kaushik+1 Human Made Machine (creative pre-testing) — Home | About | Products humanmademachine.com+2humanmademachine.com+2 Tapestry (Coach, Kate Spade) (company site) Tapestry Tools mentioned (AEO measurement): Trakkr (AI visibility / prompts / sentiment) Trakkr Evertune (AI Brand Index & monitoring) evertune.ai GA4 how-tos (for your AEO channel + attribution): Custom Channel Groups (create an “AEO” channel) Google Help Attribution Paths report (multi-touch view) Google Help Nonprofit vetting (Avinash's donation diligence): Charity Navigator (ratings) Charity Navigator Google for Nonprofits — Gemini & NotebookLM (AI access) Announcement / overview | Workspace AI for nonprofits blog.googleGoogle Help Example NGO Avinash supports: EMERGENCY (Italy) EMERGENCY Transcript Avinash Kaushik: [00:00:00] So traffic's gonna go down. So if you're a business, you're a nonprofit, how. Do you deal with the fact that you're gonna lose a lot of traffic that you get from a search engine? Today, when all of humanity moves to the answer Engine W world, only about two or 3% of the people are doing it. It's growing very rapidly. Um, and so the art of answer engine optimization is making sure that we are building for these LMS and not getting stuck with only solving for Google with the old SEO techniques. Some of them still work, but you need to learn a lot of new stuff because on average, organic traffic will drop between 16 to 64% negative and paid search traffic will drop between five to 30% negative. And that is a huge challenge. And the reason you should start with AEO now ​ George Weiner: [00:01:00] This week's guest, Avinash Kaushik is an absolute hero of mine because of his amazing, uh, work in the field of web analytics. And also, more importantly, I'd say education. Avinash Kaushik, , digital marketing evangelist at Google for Google Analytics. He spent 16 years there. He basically is. In the room where it happened, when the underlying ability to understand what's going on on our websites was was created. More importantly, I think for me, you know, he joined us on episode 45 back in 2016, and he still is, I believe, on the cutting edge of what's about to happen with AEO and the death of SEO. I wanna unpack that 'cause we kind of fly through terms [00:02:00] before we get into this podcast interview AEO. Answer engine optimization. It's this world of saying, alright, how do we create content that can't just be, , regurgitated by bots, , wholesale taken. And it's a big shift from SEO search engine optimization. This classic work of creating content for Google to give us 10 blue links for people to click on that behavior is changing. And when. We go through a period of change. I always wanna look at primary sources. The people that, , are likely to know the most and do the most. And he operates in the for-profit world. But make no mistake, he cares deeply about nonprofits. His expertise, , has frankly been tested, proven and reproven. So I pay attention when he says things like, SEO is going away, and AEO is here to stay. So I give you Avan Kashic. I'm beyond excited that he has come back. He was on our 45th episode and now we are well over our 450th episode. So, , who knows what'll happen next time we talk to him. [00:03:00] This week on the podcast, we have Avinash Kaushik. He is currently the chief strategy officer at Human Made Machine, but actually returning guest after many, many years, and I know him because he basically introduced me to Google Analytics, wrote the literal book on it, and also helped, by the way. No big deal. Literally birth Google Analytics for everyone. During his time at Google, I could spend the entire podcast talking about, uh, the amazing amounts that you have contributed to, uh, marketing and analytics. But I'd rather just real quick, uh, how are you doing and how would you describe your, uh, your role right now? Avinash Kaushik: Oh, thank you. So it's very excited to be back. Um, look forward to the discussion today. I do, I do several things concurrently, of course. I, I, I am an author and I write this weekly newsletter on marketing and analytics. Um, I am the Chief Strategy Officer at Human Made Machine, a company [00:04:00] that obsesses about helping brands win before they spend by doing creative pretesting. And then I also do, uh, uh, consulting at Tapestry, which owns Coach and Kate Spades. And my work focuses on brand strategy and marketing transformation globally. George Weiner: , Amazing. And of course, Occam's Razor. The, the, yes, the blog, which is incredible. I happen to be a, uh, a subscriber. You know, I often think of you in the nonprofit landscape, even though you operate, um, across many different brands, because personally, you also actually donate all of your proceeds from your books, from your blog, from your subscription. You are donating all of that, um, because that's just who you are and what you do. So I also look at you as like team nonprofit, though. Avinash Kaushik: You're very kind. No, no, I, I, yeah. All the proceeds from both of my books and now my newsletter, premium newsletter. It's about $200,000 a year, uh, donated to nonprofits, and a hundred [00:05:00] percent of the revenue is donated nonprofit, uh, nonprofits. And, and for me, it, it's been ai. Then I have to figure out. Which ones, and so I research nonprofits and I look up their cha charity navigators, and I follow up with the people and I check in on the works while, while don't work at a nonprofit, but as a customer of nonprofits, if you will. I, I keep sort of very close tabs on the amazing work that these charities do around the world. So feel very close to the people that you work with very closely. George Weiner: So recently I got an all caps subject line from you. Well, not from you talking about this new acronym that was coming to destroy the world, I think is what you, no, AEO. Can you help us understand what answer engine optimization is? Avinash Kaushik: Yes, of course. Of course. We all are very excited about ai. Obviously you, you, you would've to live in. Some backwaters not to be excited about it. And we know [00:06:00] that, um, at the very edge, lots of people are using large language models, chat, GPT, Claude, Gemini, et cetera, et cetera, in the world. And, and increasingly over the last year, what you have begun to notice is that instead of using a traditional search engine like Google or using the old Google interface with the 10 blue links, et cetera. People are beginning to use these lms. They just go to chat, GPT to get the answer that they want. And the one big difference in this, this behavior is I actually have on September 8th, I have a keynote here in New York and I have to be in Shanghai the next day. That is physically impossible because it, it just, the time it takes to travel. But that's my thing. So today, if I wanted to figure out what is the fastest way. On September 8th, I can leave New York and get to Shanghai. I would go to Google flights. I would put in the destinations. It will come back with a crap load of data. Then I poke and prod and sort and filter, and I have to figure out which flight is right for that. For this need I have. [00:07:00] So that is the old search engine world. I'm doing all the work, hunting and pecking, drilling down, visiting websites, et cetera, et cetera. Instead, actually what I did is I went to charge GBT 'cause I, I have a plus I, I'm a paying member of charge GBT and I said to charge GBTI have to do a keynote between four and five o'clock on September 8th in New York and I have to be in Shanghai as fast as I possibly can be After my keynote, can you find me the best flight? And I just typed in those two sentences. He came back and said, this Korean airline website flight is the best one for you. You will not get to your destination on time until, unless you take a private jet flight for $300,000. There is your best option. They're gonna get to Shanghai on, uh, September 10th at 10 o'clock in the morning if you follow these steps. And so what happened there? I didn't have to hunt and pack and dig and go to 15 websites to find the answer I wanted. The engine found the [00:08:00] answer I wanted at the end and did all the work for me that you are seeing from searching, clicking, clicking, clicking, clicking, clicking to just having somebody get you. The final answer is what I call the, the, the underlying change in consumer behavior that makes answer engine so exciting. Obviously, it creates a challenge for us because what happened between those two things, George is. I didn't have to visit many websites. So traffic is going down, obviously, and these interfaces at the moment don't have paid search links for now. They will come, they will come, but they don't at the moment. So traffic's gonna go down. So if you're a business, you're a nonprofit, how. Do you deal with the fact that you're gonna lose a lot of traffic that you get from a search engine? Today, when all of humanity moves to the answer Engine W world, only about two or 3% of the people are doing it. It's growing very rapidly. Um, and so the art of answer engine optimization [00:09:00] is making sure that we are building for these LMS and not getting stuck with only solving for Google with the old SEO techniques. Some of them still work, but you need to learn a lot of new stuff because on average, organic traffic will drop between 16 to 64% negative and paid search traffic will drop between five to 30% negative. And that is a huge challenge. And the reason you should start with AEO now George Weiner: that you know. Is a window large enough to drive a metaphorical data bus through? And I think talk to your data doctor results may vary. You are absolutely right. We have been seeing this with our nonprofit clients, with our own traffic that yes, basically staying even is the new growth. Yeah. But I want to sort of talk about the secondary implications of an AI that has ripped and gripped [00:10:00] my website's content. Then added whatever, whatever other flavors of my brand and information out there, and is then advising somebody or talking about my brand. Can you maybe unwrap that a little bit more? What are the secondary impacts of frankly, uh, an AI answering what is the best international aid organization I should donate to? Yes. As you just said, you do Avinash Kaushik: exactly. No, no, no. This such a, such a wonderful question. It gets to the crux. What used to influence Google, by the way, Google also has an answer engine called Gemini. So I just, when I say Google, I'm referring to the current Google that most people use with four paid links and 10 SEO links. So when I say Google, I'm referring to that one. But Google also has an answer engine. I, I don't want anybody saying Google does is not getting into the answer engine business. It is. So Google is very much influenced by content George that you create. I call it one P content, [00:11:00] first party content. Your website, your mobile app, your YouTube channel, your Facebook page, your, your, your, your, and it sprinkles on some amount of third party content. Some websites might have reviews about you like Yelp, some websites might have PR releases about you light some third party content. Between search engine and engines. Answer Engines seem to overvalue third party content. My for one p content, my website, my mobile app, my YouTube channel. My, my, my, everything actually is going down in influence while on Google it's pretty high. So as here you do SEO, you're, you're good, good ranking traffic. But these LLMs are using many, many, many, literally tens of thousands more sources. To understand who you are, who you are as a nonprofit, and it's [00:12:00] using everybody's videos, everybody's Reddit posts, everybody's Facebook things, and tens of thousands of more people who write blogs and all kinds of stuff in order to understand who you are as a nonprofit, what services you offer, how good you are, where you're falling short, all those negative reviews or positive reviews, it's all creepy influence. Has gone through the roof, P has come down, which is why it has become very, very important for us to build a new content strategy to figure out how we can influence these LMS about who we are. Because the scary thing is at this early stage in answer engines, someone else is telling the LLMs who you are instead of you. A more, and that's, it feels a little scary. It feels as scary as a as as a brand. It feels very scary as I'm a chief strategy officer, human made machine. It feels scary for HMM. It feels scary for coach. [00:13:00] It's scary for everybody, uh, which is why you really urgently need to get a handle on your content strategy. George Weiner: Yeah, I mean, what you just described, if it doesn't give you like anxiety, just stop right now. Just replay what we just did. And that is the second order effects. And you know, one of my concerns, you mentioned it early on, is that sort of traditional SEO, we've been playing the 10 Blue Link game for so long, and I'm worried that. Because of the changes right now, roughly what 20% of a, uh, search is AI overview, that number's not gonna go down. You're mentioning third party stuff. All of Instagram back to 2020, just quietly got tossed into the soup of your AI brand footprint, as we call it. Talk to me about. There's a nonprofit listening to this right now, and then probably if they're smart, other organizations, what is coming in the next year? They're sitting down to write the same style of, you know, [00:14:00] ai, SEO, optimized content, right? They have their content calendar. If you could have like that, I'm sitting, you're sitting in the room with them. What are you telling that classic content strategy team right now that's about to embark on 2026? Avinash Kaushik: Yes. So actually I, I published this newsletter just last night, and this is like the, the fourth in my AEO series, uh, newsletter, talks about how to create your content portfolio strategy. Because in the past we were like, we've got a product pages, you know, the equivalent of our, our product pages. We've got some, some, uh, charitable stories on our website and uh, so on and so forth. And that's good. That's basic. You need to do the basics. The interesting thing is you need to do so much more both on first party. So for example, one of the first things to appreciate is LMS or answer engines are far more influenced by multimodal content. So what does that mean? Text plus [00:15:00] video plus audio. Video and audio were also helpful in Google. And remember when I say Google, I'm referring to the old linky linking Google, not Gemini. But now video has ton more influence. So if you're creating a content strategy for next year, you should say many. Actually, lemme do one at a time. Text. You have to figure out more types of things. Authoritative Q and as. Very educational deep content around your charity's efforts. Lots of text. Third. Any seasonality, trends and patterns that happen in your charity that make a difference? I support a school in, in Nepal and, and during the winter they have very different kind of needs than they do during the summer. And so I bumped into this because I was searching about something seasonality related. This particular school for Tibetan children popped up in Nepal, and it's that content they wrote around winter and winter struggles and coats and all this stuff. I'm like. [00:16:00] It popped up in the answer engine and I'm like, okay. I research a bit more. They have good stories about it, and I'm supporting them q and a. Very, very important. Testimonials. Very, very important interviews. Very, very important. Super, super duper important with both the givers and the recipients, supporters of your nonprofit, but also the recipient recipients of very few nonprofits actually interview the people who support them. George Weiner: Like, why not like donors or be like, Hey, why did you support us? What was the, were the two things that moved you from Aware to care? Avinash Kaushik: Like for, for the i I Support Emergency, which is a Italian nonprofit like Ms. Frontiers and I would go on their website and speak a fiercely about why I absolutely love the work they do. Content, yeah. So first is text, then video. You gotta figure out how to use video a lot more. And most nonprofits are not agile in being able to use video. And the third [00:17:00] thing that I think will be a little bit of a struggle is to figure out how to use audio. 'cause audio also plays a very influential role. So for as you are planning your uh, uh, content calendar for the next year. Have the word multimodal. I'm sorry, it's profoundly unsexy, but put multimodal at the top, underneath it, say text, then say video, then audio, and start to fill those holes in. And if those people need ideas and example of how to use audio, they should just call you George. You are the king of podcasting and you can absolutely give them better advice than I could around how nonprofits could use audio. But the one big thing you have to think about is multimodality for next year George Weiner: that you know, is incredibly powerful. Underlying that, there's this nuance that I really want to make sure that we understand, which is the fact that the type of content is uniquely different. It's not like there's a hunger organization listening right now. It's not 10 facts about hunger during the winter. [00:18:00] Uh, days of being able to be an information resource that would then bring people in and then bring them down your, you know, your path. It's game over. If not now, soon. Absolutely. So how you are creating things that AI can't create and that's why you, according to whom, is what I like to think about. Like, you're gonna say something, you're gonna write something according to whom? Is it the CEO? Is it the stakeholder? Is it the donor? And if you can put a attribution there, suddenly the AI can't just lift and shift it. It has to take that as a block and be like, no, it was attributed here. This is the organization. Is that about right? Or like first, first party data, right? Avinash Kaushik: I'll, I'll add one more, one more. Uh, I'll give a proper definition. So, the fir i I made 11 recommendations last night in the newsletter. The very first one is focus on creating AI resistant content. So what, what does that mean? AI resistant means, uh, any one of us from nonprofits could [00:19:00] open chat, GPT type in a few queries and chat. GD PT can write our next nonprofit newsletter. It could write the next page for our donation. It could create the damn page for our donation, right? Remember, AI can create way more content than you can, but if you can use AI to create content, 67 million other nonprofits are doing the same thing. So what you have to do is figure out how to build AI resistant content, and my definition is very simple. George, what is AI resistance? It's content of genuine novelty. So to tie back to your recommendation, your CEO of a nonprofit that you just recommended, the attribution to George. Your CEO has a unique voice, a unique experience. The AI hasn't learned what makes your CEO your frontline staff solving problems. You are a person who went and gave a speech at the United Nations on behalf of your nonprofit. Whatever you are [00:20:00] doing is very special, and what you have to figure out is how to get out of the AI slop. You have to get out of all the things that AI can automatically type. Figure out if your content meets this very simple, standard, genuine novelty and depth 'cause it's the one thing AI isn't good at. That's how you rank higher. And not only will will it, will it rank you, but to make another point you made, George, it's gonna just lift, blanc it out there and attribute credit to you. Boom. But if you're not genuine, novelty and depth. Thousand other nonprofits are using AI to generate text and video. Could George Weiner: you just, could you just quit whatever you're doing and start a school instead? I seriously can't say it enough that your point about AI slop is terrifying me because I see it. We've built an AI tool and the subtle lesson here is that think about how quickly this AI was able to output that newsletter. Generic old school blog post and if this tool can do it, which [00:21:00] by the way is built on your local data set, we have the rag, which doesn't pause for a second and realize if this AI can make it, some other AI is going to be able to reproduce it. So how are you bringing the human back into this? And it's a style of writing and a style of strategic thinking that please just start a school and like help every single college kid leaving that just GPT their way through a degree. Didn't freaking get, Avinash Kaushik: so it's very, very important to make sure. Content is of genuine novelty and depth because it cannot be replicated by the ai. And by the way, this, by the way, George, it sounds really high, but honestly to, to use your point, if you're a CEO of a nonprofit, you are in it for something that speaks to you. You're in it. Because ai, I mean nonprofit is not your path to becoming the next Bill Gates, you're doing it because you just have this hair. Whoa, spoiler alert. No, I'm sorry. [00:22:00] Maybe, maybe that is. I, I didn't, I didn't mean any negative emotion there, but No, I love it. It's all, it's like a, it's like a sense of passion you are bringing. There's something that speaks to you. Just put that on paper, put that on video, put that on audio, because that is what makes you unique. And the collection of those stories of genuine depth and novelty will make your nonprofit unique and stand out when people are looking for answers. George Weiner: So I have to point to the next elephant in the room here, which is measurement. Yes. Yes. Right now, somebody is talking about human made machine. Someone's talking about whole whale. Someone's talking about your nonprofit having a discussion in an answer engine somewhere. Yes. And I have no idea. How do I go about understanding measurement in this new game? Avinash Kaushik: I have. I have two recommendations. For nonprofits, I would recommend a tool called Tracker ai, TRA, KKR [00:23:00] ai, and it has a free version, that's why I'm recommending it. Some of the many of these tools are paid tools, but with Tracker, do ai. It allows you to identify your website, URL, et cetera, et cetera, and it'll give you some really wonderful and fantastic, helpful report It. Tracker helps you understand prompt tracking, which is what are other people writing about you when they're seeking? You? Think of this, George, as your old webmaster tools. What keywords are people using to search? Except you can get the prompts that people are using to get a more robust understanding. It also monitors your brand's visibility. How often are you showing up and how often is your competitor showing up, et cetera, et cetera. And then he does that across multiple search engines. So you can say, oh, I'm actually pretty strong in OpenAI for some reason, and I'm not that strong in Gemini. Or, you know what, I have like the highest rating in cloud, but I don't have it in OpenAI. And this begins to help you understand where your current content strategy is working and where it is not [00:24:00] working. So that's your brand visibility. And the third thing that you get from Tracker is active sentiment tracking. This is the scary part because remember, you and I were both worried about what other people saying about us. So this, this are very helpful that we can go out and see what it is. What is the sentiment around our nonprofit that is coming across in, um, in these lms? So Tracker ai, it have a free and a paid version. So I would, I would recommend using it for these three purposes. If, if you have funding to invest in a tool. Then there's a tool called Ever Tool, E-V-E-R-T-U-N-E Ever. Tune is a paid tool. It's extremely sophisticated and robust, and they do brand monitoring, site audit, content strategy, consumer preference report, ai, brand index, just the. Step and breadth of metrics that they provide is quite extensive, but, but it is a paid tool. It does cost money. It's not actually crazy expensive, but uh, I know I have worked with them before, so full disclosure [00:25:00] and having evaluated lots of different tools, I have sort of settled on those two. If it's a enterprise type client I'm working with, then I'll use Evert Tune if I am working with a nonprofit or some of my personal stuff. I'll use Tracker AI because it's good enough for a person that is, uh, smaller in size and revenue, et cetera. So those two tools, so we have new metrics coming, uh, from these tools. They help us understand the kind of things we use webmaster tools for in the past. Then your other thing you will want to track very, very closely is using Google Analytics or some other tool on your website. You are able to currently track your, uh, organic traffic and if you're taking advantage of paid ads, uh, through a grant program on Google, which, uh, provides free paid search credits to nonprofits. Then you're tracking your page search traffic to continue to track that track trends, patterns over time. But now you will begin to see in your referrals report, in your referrals report, you're gonna begin to seeing open [00:26:00] ai. You're gonna begin to see these new answer engines. And while you don't know the keywords that are sending this traffic and so on and so forth, it is important to keep track of the traffic because of two important reasons. One, one, you want to know how to highly prioritize. AEO. That's one reason. But the other reason I found George is syn is so freaking hard to rank in an answer engine. When people do come to my websites from Answer engine, the businesses I work with that is very high intent person, they tend to be very, very valuable because they gave the answer engine a very complex question to answer the answers. Engine said you. The right answer for it. So when I show up, I'm ready to buy, I'm ready to donate. I'm ready to do the action that I was looking for. So the percent of people who are coming from answer engines to your nonprofit carry significantly higher intention, and coming from Google, who also carry [00:27:00] intent. But this man, you stood out in an answer engine, you're a gift from God. Person coming thinks you're very important and is likely to engage in some sort of business with you. So I, even if it's like a hundred people, I care a lot about those a hundred people, even if it's not 10,000 at the moment. Does that make sense George? George Weiner: It does, and I think, I'm glad you pointed to, you know, the, the good old Google Analytics. I'm like, it has to be a way, and I, I think. I gave maximum effort to this problem inside of Google Analytics, and I'm still frustrated that search console is not showing me, and it's just blending it all together into one big soup. But. I want you to poke a hole in this thinking or say yes or no. You can create an AI channel, an AEO channel cluster together, and we have a guide on that cluster together. All of those types of referral traffic, as you mentioned, right from there. I actually know thanks to CloudFlare, the ratios of the amount of scrapes versus the actual clicks sent [00:28:00] for roughly 20, 30% of. Traffic globally. So is it fair to say I could assume like a 2% clickthrough or a 1% clickthrough, or even worse in some cases based on that referral and then reverse engineer, basically divide those clicks by the clickthrough rate and essentially get a rough share of voice metric on that platform? Yeah. Avinash Kaushik: So, so for, um, kind of, kind of at the moment, the problem is that unlike Google giving us some decent amount of data through webmaster tools. None of these LLMs are giving us any data. As a business owner, none of them are giving us any data. So we're relying on third parties like Tracker. We're relying on third parties like Evert Tune. You understand? How often are we showing up so we could get a damn click through, right? Right. We don't quite have that for now. So the AI Brand Index in Evert Tune comes the closest. Giving you some information we could use in the, so your thinking is absolutely right. Your recommendation is ly, right? Even if you can just get the number of clicks, even if you're tracking them very [00:29:00] carefully, it's very important. Please do exactly what you said. Make the channel, it's really important. But don't, don't read too much into the click-through rate bits, because we're missing the. We're missing a very important piece of information. Now remember when Google first came out, we didn't have tons of data. Um, and that's okay. These LLMs Pro probably will realize over time if they get into the advertising business that it's nice to give data out to other people, and so we might get more data. Until then, we are relying on these third parties that are hacking these tools to find us some data. So we can use it to understand, uh, some of the things we readily understand about keywords and things today related to Google. So we, we sadly don't have as much visibility today as we would like to have. George Weiner: Yeah. We really don't. Alright. I have, have a segment that I just invented. Just for you called Avanade's War Corner. And in Avanade's War Corner, I noticed that you go to war on various concepts, which I love because it brings energy and attention to [00:30:00] frankly data and finding answers in there. So if you'll humor me in our war corner, I wanna to go through some, some classic, classic avan. Um, all right, so can you talk to me a little bit about vanity metrics, because I think they are in play. Every day. Avinash Kaushik: Absolutely. No, no, no. Across the board, I think in whatever we do. So, so actually I'll, I'll, I'll do three. You know, so there's vanity metrics, activity metrics and outcome metrics. So basically everything goes into these three buckets essentially. So vanity metrics are, are the ones that are very easy to find, but them moving up and down has nothing to do with the number of donations you're gonna get as a nonprofit. They're just there to ease our ego. So, for example. Let's say we are a nonprofit and we run some display ads, so measure the number of impressions that were delivered for our display ad. That's a vanity metric. It doesn't tell you anything. You could have billions of impressions. You could have 10 impressions, doesn't matter, but it is easily [00:31:00] available. The count is easily available, so we report it. Now, what matters? What matters are, did anybody engage with the ad? What were the percent of people who hovered on the ad? What were the number of people who clicked on the ad activity metrics? Activity metrics are a little more useful than vanity metrics, but what does it matter for you as a non nonprofit? The number of donations you received in the last 24 hours. That's an outcome metric. Vanity activity outcome. Focus on activity to diagnose how well our campaigns or efforts are doing in marketing. Focus on outcomes to understand if we're gonna stay in business or not. Sorry, dramatic. The vanity metrics. Chasing is just like good for ego. Number of likes is a very famous one. The number of followers on a social paia, a very famous one. Number of emails sent is another favorite one. There's like a whole host of vanity metrics that are very easy to get. I cannot emphasize this enough, but when you unpack and or do meta-analysis of [00:32:00] relationship between vanity metrics and outcomes, there's a relationship between them. So we always advise people that. Start by looking at activity metrics to help you understand the user's behavior, and then move to understanding outcome metrics because they are the reason you'll thrive. You will get more donations or you will figure out what are the things that drive more donations. Otherwise, what you end up doing is saying. If I post provocative stuff on Facebook, I get more likes. Is that what you really wanna be doing? But if your nonprofit says, get me more likes, pretty soon, there's like a naked person on Facebook that gets a lot of likes, but it's corrupting. Yeah. So I would go with cute George Weiner: cat, I would say, you know, you, you get the generic cute cat. But yeah, same idea. The Internet's built on cats Avinash Kaushik: and yes, so, so that's why I, I actively recommend people stay away from vanity metrics. George Weiner: Yeah. Next up in War Corner, the last click [00:33:00] fallacy, right? The overweighting of this last moment of purchase, or as you'd maybe say in the do column of the See, think, do care. Avinash Kaushik: Yes. George Weiner: Yes. Avinash Kaushik: So when the, when the, when we all started to get Google Analytics, we got Adobe Analytics web trends, remember them, we all wanted to know like what drove the conversion. Mm-hmm. I got this donation for a hundred dollars. I got a donation for a hundred thousand dollars. What drove the conversion. And so what lo logically people would just say is, oh, where did this person come from? And I say, oh, the person came from Google. Google drove this conversion. Yeah, his last click analysis just before the conversion. Where did the person come from? Let's give them credit. But the reality is it turns out that if you look at consumer behavior, you look at days to donation, visits to donation. Those are two metrics available in Google. It turns out that people visit multiple times before [00:34:00] they make a donation. They may have come through email, their interest might have been triggered through your email. Then they suddenly remembered, oh yeah, yeah, I wanted to go to the nonprofit and donate something. This is Google, you. And then Google helps them find you and they come through. Now, who do you give credit Email or the Google, right? And what if you came 5, 7, 8, 10 times? So the last click fallacy is that it doesn't allow you to see the full consumer journey. It gives credit to whoever was the last person who sent you this, who introduced this person to your website. And so very soon we move to looking at what we call MTI, Multi-Touch Attribution, which is a free solution built into Google. So you just go to your multichannel funnel reports and it will help you understand that. One, uh, 150 people came from email. Then they came from Google. Then there was a gap of nine days, and they came back from Facebook and then they [00:35:00] converted. And what is happening is you're beginning to understand the consumer journey. If you understand the consumer journey better, we can come with better marketing. Otherwise, you would've said, oh, close shop. We don't need as many marketing people. We'll just buy ads on Google. We'll just do SEO. We're done. Oh, now you realize there's a more complex behavior happening in the consumer. They need to solve for email. You solve for Google, you need to solve Facebook. In my hypothetical example, so I, I'm very actively recommend people look at the built-in free MTA reports inside the Google nalytics. Understand the path flow that is happening to drive donations and then undertake activities that are showing up more often in the path, and do fewer of those things that are showing up less in the path. George Weiner: Bring these up because they have been waiting on my mind in the land of AEO. And by the way, we're not done with war. The war corner segment. There's more war there's, but there's more, more than time. But with both of these metrics where AEO, if I'm putting these glasses back on, comes [00:36:00] into play, is. Look, we're saying goodbye to frankly, what was probably somewhat of a vanity metric with regard to organic traffic coming in on that 10 facts about cube cats. You know, like, was that really how we were like hanging our hat at night, being like. Job done. I think there's very much that in play. And then I'm a little concerned that we just told everyone to go create an AEO channel on their Google Analytics and they're gonna come in here. Avinash told me that those people are buyers. They're immediately gonna come and buy, and why aren't they converting? What is going on here? Can you actually maybe couch that last click with the AI channel inbound? Like should I expect that to be like 10 x the amount of conversions? Avinash Kaushik: All we can say is it's, it's going to be people with high intention. And so with the businesses that I'm working with, what we are finding is that the conversion rates are higher. Mm. This game is too early to establish any kind of sense of if anybody has standards for AEO, they're smoking crack. Like the [00:37:00] game is simply too early. So what we I'm noticing is that in some cases, if the average conversion rate is two point half percent, the AEO traffic is converting at three, three point half. In two or three cases, it's converting at six, seven and a half. But there is not enough stability in the data. All of this is new. There's not enough stability in the data to say, Hey, definitely you can expect it to be double or 10% more or 50% more. We, we have no idea this early stage of the game, but, but George, if we were doing this again in a year, year and a half, I think we'll have a lot more data and we'll be able to come up with some kind of standards for, for now, what's important to understand is, first thing is you're not gonna rank in an answer engine. You just won't. If you do rank in an answer engine, you fought really hard for it. The person decided, oh my God, I really like this. Just just think of the user behavior and say, this person is really high intent because somehow [00:38:00] you showed up and somehow they found you and came to you. Chances are they're caring. Very high intent. George Weiner: Yeah. They just left a conversation with a super intelligent like entity to come to your freaking 2001 website, HTML CSS rendered silliness. Avinash Kaushik: Whatever it is, it could be the iffiest thing in the world, but they, they found me and they came to you and they decided that in the answer engine, they like you as the answer the most. And, and it took that to get there. And so all, all, all is I'm finding in the data is that they carry higher intent and that that higher intent converts into higher conversion rates, higher donations, as to is it gonna be five 10 x higher? It's unclear at the moment, but remember, the other reason you should care about it is. Every single day. As more people move away from Google search engines to answer engines, you're losing a ton of traffic. If somebody new showing up, treat them with, respect them with love. Treat them with [00:39:00] care because they're very precious. Just lost a hundred. Check the landing George Weiner: pages. 'cause you may be surprised where your front door is when complexity is bringing them to you, and it's not where you spent all of your design effort on the homepage. Spoiler. That's exactly Avinash Kaushik: right. No. Exactly. In fact, uh, the doping deeper into your websites is becoming even more prevalent with answer engines. Mm-hmm. Um, uh, than it used to be with search engines. The search always tried to get you the, the top things. There's still a lot of diversity. Your homepage likely is still only 30% of your traffic. Everybody else is landing on other homepage or as you call them, landing pages. So it's really, really important to look beyond your homepage. I mean, it was true yesterday. It's even truer today. George Weiner: Yeah, my hunch and what I'm starting to see in our data is that it is also much higher on the assisted conversion like it is. Yes. Yes, it is. Like if you have come to us from there, we are going to be seeing you again. That's right. That's right. More likely than others. It over indexes consistently for us there. Avinash Kaushik: [00:40:00] Yes. Again, it ties back to the person has higher intent, so if they didn't convert in that lab first session, their higher intent is gonna bring them back to you. So you are absolutely right about the data that you're seeing. George Weiner: Um, alright. War corner, the 10 90 rule. Can you unpack this and then maybe apply it to somebody who thinks that their like AI strategy is done? 'cause they spend $20 or $200 a month on some tool and then like, call it a day. 'cause they did ai. Avinash Kaushik: Yes, yes. No, it's, it's good. I, I developed it in context of analytics. When I was at my, uh, job at Intuit, I used to, I was at Intuit, senior director for research and analytics. And one of the things I found is people would consistently spend lots of money on tools in that time, web analytics tools, research tools, et cetera. And, uh, so they're spending a contract of a few hundred thousand dollars or hundreds of thousands of dollars, and then they give it to a fresh graduate to find insights. [00:41:00] I was like, wait, wait, wait. So you took this $300,000 thing and gave it to somebody. You're paying $45,000 a year. Who is young in their career, young in their career, and expecting them to make you tons of money using this tool? It's not the tool, it's the human. And so that's why I developed the the 10 90 rule, which is that if you have a, if you have a hundred dollars to invest in making smarter decisions, invest $10 in the tool, $90 in the human. We all have access to so much data, so much complexity. The world is changing so fast that it is the human that is going to figure out how to make sense of these insights rather than the tool magically spewing and understanding your business enough to tell you exactly what to do. So that, that's sort of where the 10 90 rule came from. Now, sort of we are in this, in this, um, this is very good for nonprofits by the way. So we're in this era. Where On the 90 side? No. So the 10, look, don't spend insane money on tools that is just silly. So don't do that. Now the 90, let's talk about the [00:42:00] 90. Up until two years ago, I had to spell all of the 90 on what I now call organic humans. You George Weiner: glasses wearing humans, huh? Avinash Kaushik: The development of LLM means that every single nonprofit in the world has access to roughly a third year bachelor's degree student. Like a really smart intern. For free. For free. In fact, in some instances, for some nonprofits, let's say I I just reading about this nonprofit that is cleaning up plastics in the ocean for this particular nonprofit, they have access to a p HT level environmentalist using the latest Chad GP PT 4.5, like PhD level. So the little caveat I'm beginning to put in the 10 90 rule is on the 90. You give the 90 to the human and for free. Get the human, a very smart Bachelor's student by using LLMs in some instances. Get [00:43:00] for free a very smart TH using the LLMs. So the LLMs have now to be incorporated into your research, into your analysis, into building a next dashboard, into building a next website, into building your next mobile game into whatever the hell you're doing for free. You can get that so you have your organic human. Less the synthetic human for free. Both of those are in the 90 and, and for nonprofit, so, so in my work at at Coach and Kate Spade. I have access now to a couple of interns who do free work for me, well for 20 minor $20 a month because I have to pay for the plus version of G bt. So the intern costs $20 a month, but I have access to this syn synthetic human who can do a whole lot of work for me for $20 a month in my case, but it could also do it for free for you. Don't forget synthetic humans. You no longer have to rely only on the organic humans to do the 90 part. You would be stunned. Upload [00:44:00] your latest, actually take last year's worth of donations, where they came from and all this data from you. Have a spreadsheet lying around. Dump it into chat. GPT, I'll ask it to analyze it. Help you find where most donations came from, and visualize trends to present to board of directors. It will blow your mind how good it is at do it with Gemini. I'm not biased, I'm just seeing chat. GPD 'cause everybody knows it so much Better try it with mistrial a, a small LLM from France. So I, I wanna emphasize that what has changed over the last year is the ability for us to compliment our organic humans with these synthetic entities. Sometimes I say synthetic humans, but you get the point. George Weiner: Yeah. I think, you know, definitely dump that spreadsheet in. Pull out the PII real quick, just, you know, make me feel better as, you know, the, the person who's gonna be promoting this to everybody, but also, you know, sort of. With that. I want to make it clear too, that like actually inside of Gemini, like Google for nonprofits has opened up access to Gemini for free is not a per user, per whatever. You have that [00:45:00] you have notebook, LLM, and these. Are sitting in their backyards for free every day and it's like a user to lose it. 'cause you have a certain amount of intelligence tokens a day. Can you, I just like wanna climb like the tallest tree out here and just start yelling from a high building about this. Make the case of why a nonprofit should be leveraging this free like PhD student that is sitting with their hands underneath their butts, doing nothing for them right now. Avinash Kaushik: No, it is such a shame. By the way, I cannot add to your recommendation in using your Gemini Pro account if it's free, on top of, uh, all the benefits you can get. Gemini Pro also comes with restrictions around their ability to use your data. They won't, uh, their ability to put your data anywhere. Gemini free versus Gemini Pro is a very protected environment. Enterprise version. So more, more security, more privacy, et cetera. That's a great benefit. And by the way, as you said, George, they can get it for free. So, um, the, the, the, the posture you should adopt is what big companies are doing, [00:46:00] which is anytime there is a job to be done, the first question you, you should ask is, can I make the, can an AI do the job? You don't say, oh, let me send it to George. Let me email Simon, let me email Sarah. No, no, no. The first thing that should hit your head is. I do the job because most of the time for, again, remember, third year bachelor's degree, student type, type experience and intelligence, um, AI can do it better than any human. So your instincts to be, let me outsource that kind of work so I can free up George's cycles for the harder problems that the AI cannot solve. And by the way, you can do many things. For example, you got a grant and now Meta allows you to run X number of ads for free. Your first thing, single it. What kind of ad should I create? Go type in your nonprofit, tell it the kind of things you're doing. Tell it. Tell it the donations you want, tell it the size, donation, want. Let it create the first 10 ads for you for free. And then you pick the one you like. And even if you have an internal [00:47:00] designer who makes ads, they'll start with ideas rather than from scratch. It's just one small example. Or you wanna figure out. You know, my email program is stuck. I'm not getting yield rates for donations. The thing I want click the button that called that is called deep research or thinking in the LL. Click one of those two buttons and then say, I'm really struggling. I'm at wits end. I've tried all these things. Write all the detail. Write all the detail about what you've tried and now working. Can you please give me three new ideas that have worked for nonprofits who are working in water conservation? Hmm. This would've taken a human like a few days to do. You'll have an answer in under 90 seconds. I just give two simple use cases where we can use these synthetic entities to send us, do the work for us. So the default posture in nonprofits should be, look, we're resource scrapped anyway. Why not use a free bachelor's degree student, or in some case a free PhD student to do the job, or at least get us started on a job. So just spending 10 [00:48:00] hours on it. We only spend the last two hours. The entity entity does the first date, and that is super attractive. I use it every single day in, in one of my browsers. I have three traps open permanently. I've got Claude, I've got Mistrial, I've got Charge GPT. They are doing jobs for me all day long. Like all day long. They're working for me. $20 each. George Weiner: Yeah, it's an, it, it, it's truly, it's an embarrassment of riches, but also getting back to the, uh, the 10 90 is, it's still sitting there. If you haven't brought that capacity building to the person on how to prompt how to play that game of linguistic tennis with these tools, right. They're still just a hammer on a. Avinash Kaushik: That's exactly right. That's exactly right. Or, or in your case, you, you have access to Gemini for nonprofits. It's a fantastic tool. It's like a really nice card that could take you different places you insist on cycling everywhere. It's, it's okay cycle once in a while for health reasons. Otherwise, just take the car, it's free. George Weiner: Ha, you've [00:49:00] been so generous with your time. Uh, I do have one more quick war. If you, if you have, have a minute, uh, your war on funnels, and maybe this is not. Fully fair. And I am like, I hear you yelling at me every time I'm showing our marketing funnel. And I'm like, yeah, but I also have have a circle over here. Can you, can you unpack your war on funnels and maybe bring us through, see, think, do, care and in the land of ai? Avinash Kaushik: Yeah. Okay. So the marketing funnel is very old. It's been around for a very long time, and once I, I sort of started working at Google, access to lots more consumer research, lots more consumer behavior. Like 20 years ago, I began to understand that there's no such thing as funnel. So what does the funnel say? The funnel says there's a group of people running around the world, they're not aware of your brand. Find them, scream at them, spray and pray advertising at them, make them aware, and then somehow magically find the exact same people again and shut them down the fricking funnel and make them consider your product.[00:50:00] And now that they're considering, find them again, exactly the same people, and then shove them one more time. Move their purchase index and then drag them to your website. The thing is this linearity that there's no evidence in the universe that this linearity exists. For example, uh, I'm going on a, I like long bike rides, um, and I just got thirsty. I picked up the first brand. I could see a water. No awareness, no consideration, no purchase in debt. I just need water. A lot of people will buy your brand because you happen to be the cheapest. I don't give a crap about anything else, right? So, um, uh, uh, the other thing to understand is, uh, one of the brands I adore and have lots of is the brand. Patagonia. I love Patagonia. I, I don't use the word love for I think any other brand. I love Patagonia, right? For Patagonia. I'm always in the awareness stage because I always want these incredible stories that brand ambassadors tell about how they're helping the environment. [00:51:00] I have more Patagonia products than I should have. I'm already customer. I'm always open to new considerations of Patagonia products, new innovations they're bringing, and then once in a while, I'm always in need to buy a Patagonia product. I'm evaluating them. So this idea that the human is in one of these stages and your job is to shove them down, the funnel is just fatally flawed, no evidence for it. Instead, what you want to do is what is Ash's intent at the moment? He would like environmental stories about how we're improving planet earth. Patagonia will say, I wanna make him aware of my environmental stories, but if they only thought of marketing and selling, they wouldn't put me in the awareness because I'm already a customer who buys lots of stuff from already, right? Or sometimes I'm like, oh, I'm, I'm heading over to London next week. Um, I need a thing, jacket. So yeah, consideration show up even though I'm your customer. So this seating do care is a framework that [00:52:00] says, rather than shoving people down things that don't exist and wasting your money, your marketing should be able to discern any human's intent and then be able to respond with a piece of content. Sometimes that piece of content in an is an ad. Sometimes it's a webpage, sometimes it's an email. Sometimes it's a video. Sometimes it's a podcast. This idea of understanding intent is the bedrock on which seat do care is built about, and it creates fully customer-centric marketing. It is harder to do because intent is harder to infer, but if you wanna build a competitive advantage for yourself. Intent is the magic. George Weiner: Well, I think that's a, a great point to, to end on. And again, so generous with, uh, you know, all the work you do and also supporting nonprofits in the many ways that you do. And I'm, uh, always, always watching and seeing what I'm missing when, um, when a new, uh, AKA's Razor and Newsletter come out. So any final sign off [00:53:00] here on how do people find you? How do people help you? Let's hear it. Avinash Kaushik: You can just Google or answer Engine Me. It's, I'm not hard. I hard to find, but if you're a nonprofit, you can sign up for my newsletter, TMAI marketing analytics newsletter. Um, there's a free one and a paid one, so you can just sign up for the free one. It's a newsletter that comes out every five weeks. It's completely free, no strings or anything. And that way I'll be happy to share my stories around better marketing and analytics using the free newsletter for you so you can sign up for that. George Weiner: Brilliant. Well, thank you so much, Avan. And maybe, maybe we'll have to take you up on that offer to talk sometime next year and see, uh, if maybe we're, we're all just sort of, uh, hanging out with synthetic humans nonstop. Thank you so much. It was fun, George. [00:54:00]

K12 Tech Talk
Episode 226 - Back to School with Gemini... or Not?

K12 Tech Talk

Play Episode Listen Later Aug 8, 2025 47:18


In this episode, we talk about AI (and other things)! We discuss the growing concerns surrounding school surveillance tools, examining case studies where innocent student jokes have led to harsh outcomes due to automated threat detection. Shifting gears, we look at the availability of AI tools like Gemini Pro and ChatGPT for college students, including student discount programs. Also, we analyze a recent Common Sense Media report on AI teacher assistants, discussing their moderate risk rating, potential for invisible influence, concerns about novice teachers taking content as fact, and the risks associated with using AI for high-stakes circumstances like IEP creation or grading.    Much of the episode is spent unpacking Jay's K12TechPro survey regarding Gemini and NotebookLM in classrooms, revealing current district policies on AI use for staff and students, and the presence (or absence) of board policies.   Referenced URLs: https://www.washingtonpost.com/business/2025/08/07/ai-school-surveillance-gaggle-goguardian-bark/473cb556-737e-11f0-84e0-485bb531abeb_story.html https://apnews.com/article/ai-school-surveillance-gaggle-goguardian-bark-8c531cde8f9aee0b1ef06cfce109724a https://gemini.google/students/ https://help.openai.com/en/articles/10986084-student-discounts-for-chatgpt-terms-of-service https://www.commonsensemedia.org/sites/default/files/featured-content/files/csm-ai-risk-assessment-ai-teacher-assistants-final.pdf 00:00:00-Intro 00:14:21-Gemini & ChatGPT Student Promo 00:16:00-Surveilling Students 00:19:15-Common Sense Media - Bias in AI 00:25:43-Are you enabling AI for students? -------------------- PowerGistics Lightspeed (Check out Signal!) Fortinet -------------------- Join the K12TechPro Community (exclusively for K12 Tech professionals) Buy some swag (shirts, hoodies...)!!! Email us at k12techtalk@gmail.com OR our "professional" email addy is info@k12techtalkpodcast.com Call us at 314-329-0363 X @k12techtalkpod Facebook Visit our LinkedIn Music by Colt Ball Disclaimer: The views and work done by Josh, Chris, and Mark are solely their own and do not reflect the opinions or positions of sponsors or any respective employers or organizations associated with the guys. K12 Tech Talk itself does not endorse or validate the ideas, views, or statements expressed by Josh, Chris, and Mark's individual views and opinions are not representative of K12 Tech Talk. Furthermore, any references or mention of products, services, organizations, or individuals on K12 Tech Talk should not be considered as endorsements related to any employer or organization associated with the guys.

Cyber Security with Bob G
Big Tech's Big Secret - Why Security Breaches Need More Spotlight Than New Product Releases

Cyber Security with Bob G

Play Episode Listen Later Aug 7, 2025 6:29


Video - https://youtu.be/fDZaTJPIWzsForget the fancy new phones. We're talking about something way more important: the secret side of tech. A recent Google security incident has everyone asking questions, but the answers aren't in the headlines about the latest gadget. Dive into the real story of what happened, why it matters to you, and what this tells us about our digital security. Don't miss this crucial conversation about who is really responsible for keeping your information safe online.I used Gemini Pro, ScreenPal, and Pictory.ai to put this information together.If you're interested in trying Pictory.ai please use the following link. https://pictory.ai?ref=t015o

Cyber Security with Bob G
Not Just Another Pixel - Pixel 10 Means Business

Cyber Security with Bob G

Play Episode Listen Later Jul 16, 2025 10:53


Video - https://youtu.be/ScZBdpWjexYGoogle's Pixel 10 lineup is coming in hot—and it's not playing around. With a next-gen chip, sleek new designs, and a foldable phone that might just outshine the competition, Pixel 10 is Google's boldest leap yet. Curious how it stacks up against the big names? You're going to want to see this.I used Gemini Pro, ChatGPT-4o, ScreenPal, and Pictory.ai to put this information together.If you're interested in trying Pictory.ai please use the following link. https://pictory.ai?ref=t015o

Karachi Wala Developer
Your AI Just Got Local: Gemini CLI is Automating Your Workflow

Karachi Wala Developer

Play Episode Listen Later Jul 6, 2025 10:32


Ever wished your AI could actually do things directly from your terminal, beyond just chatting? Google's new Gemini CLI is here, and it's a game-changer! Unlike previous closed-source tools, this open-source power tool brings the intelligence of Gemini Pro right to your command line. We're talking direct file system access, automated project setups, and a powerful "reason and react" loop that lets AI analyze, plan, and execute tasks on your machine. Perfect for DevOps, developers, and anyone ready to automate their workflow. Is your terminal ready for its new brain?

airhacks.fm podcast with adam bien
Building AI-Native Code Platform With Java for Java

airhacks.fm podcast with adam bien

Play Episode Listen Later Jul 3, 2025 61:56


An airhacks.fm conversation with Jonathan Ellis (@spyced) about: brokk AI tool for code generation named after Norse god of the forge, AI as complement to experienced programmers' skillsets, age and productivity in programming, transition from JVector to working on Cassandra codebase, challenges with AI in large codebases with extensive context, building tools for historical Java codebases, comparison of productivity between younger and older programmers, brute force coding vs experienced approach, reading code quickly as a senior skill, AI generating nested if-else statements vs better structures, context sculpting in Brokk, open source nature of Brokk, no black boxes philosophy, surfacing AI context to users, automatic context pulling with manual override options, importing dependencies and decompiling JARs for context, syntax tree based summarization, Maven and Gradle dependency handling, unique Java-specific features, multiple AI model support simultaneously, Claude vs Gemini Pro performance differences, Git history as context source, capturing commits and diffs for regression analysis, migration analysis between commits, AI code review and technical debt cleanup, style for code style guidelines, using modern Java features like var and Streams, Error Prone and NullAway integration for code quality, comparison with Cursor's primitive features, branching conversation history, 80% time in Brokk vs 20% in IntelliJ workflow, sketching package structures for AI guidance, data structures guiding algorithms, Git browser by file and commit, unified diff as context, reflection moving away from due to tooling opacity, Jackson serialization refactoring with DTOs, enterprise features like session sync and sharing, unified API key management, rate limit advantages, parallel file processing with upgrade agent, LiteLLM integration for custom models, pricing model based on credits not requests, $20/month subscription with credits, free tier models like Grok 3 Mini and DeepSeek V3, architect mode for autonomous code generation, code button for smaller problems with compile-test loop, ask button for planning complex implementations, senior vs junior programmer AI effectiveness, self-editing capability achieved early in development, no vector search usage despite JVector background Jonathan Ellis on twitter: @spyced

Hoje no TecMundo Podcast
Meta mostrando TODA SUA GALERIA para IA; Amazon bloqueia pirataria no Fire TV Stick; IA aumenta PIB

Hoje no TecMundo Podcast

Play Episode Listen Later Jun 30, 2025 13:25


As notícias de hoje incluem a Meta, dona do Facebook, Instagram e WhatsApp, pedindo permissão para acessar a galeria do seu celular e processar imagens usando IA, a Amazon começando a agir contra apps de pirataria de streaming que sempre rodaram nos Fire TV Stick. Tem um estudo de uma grande empresa privada indicando que a IA pode contribuir com R$ 2,1 trilhões ao PIB do Brasil desde que as companhias tomem os devidos cuidados. A Google finalmente liberando a programação de ações agendadas que você pode configurar para o Gemini fazer para você depois. Falando nisso, agora é a última chance para quem é estudante conseguir 1 ano e três meses do Gemini Pro totalmente de graça, o que eu também vou explicar no programa.

AI For Humans
Big AI Vs Humans: OpenAI's Office, Google's Free AI Agent and more AI News

AI For Humans

Play Episode Listen Later Jun 26, 2025 55:27


OpenAI, Google & Anthropic are all eating different parts of the business & creative worlds but where does that leave us? For only 25 cents, you too can sponsor a human in a world of AGI. In the big news this week, OpenAI's takes on Microsoft Office, Google's cutting the cost of AI coding with their new Google CLI (Command Line Interface) and dropped an on-device robotics platform. Oh, and Anthropic just won a massive lawsuit around AI training and fair use. Plus, Tesla's rocky rollout of their Robotaxis, Eleven Labs' new MCP-centric 11ai voice agent, Runway's Game Worlds, the best hacker in the world in now an AI bot AND Gavin defends AI slop. US HUMANS AIN'T GOING AWAY. UNLESS THE AI GIVES US ENDLESS TREATS.  #ai #ainews #openai Join the discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/ // Show Links //   OpenAI Developing Microsoft Office / Google Workplace Competitor https://www.theinformation.com/articles/openai-quietly-designed-rival-google-workspace-microsoft-office?rc=c3oojq OpenAI io / trademark drama:  https://www.theguardian.com/technology/2025/jun/23/openai-jony-ive-io-amid-trademark-iyo Sam's receipts from Jason Rugolo (founder of iYo the headphone company) https://x.com/sama/status/1937606794362388674 Google's OpenSource Comand Line Interface for Gemini is Free? https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/ 1000 free Gemini Pro 2.5 requests per day https://x.com/OfficialLoganK/status/1937881962070364271 Anthropic's Big AI Legal Win  https://www.reuters.com/legal/litigation/anthropic-wins-key-ruling-ai-authors-copyright-lawsuit-2025-06-24/ More detail: https://x.com/AndrewCurran_/status/1937512454835306974 Gemini's On Device Robotics https://deepmind.google/discover/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/ AlphaGenome: an AI model to help scientists better understand our DNA https://x.com/GoogleDeepMind/status/1937873589170237738 Tesla Robotaxi Roll-out https://www.cnbc.com/2025/06/23/tesla-robotaxi-incidents-caught-on-camera-in-austin-get-nhtsa-concern.html Kinda Scary Looking: https://x.com/binarybits/status/1936951664721719383 Random slamming of brakes: https://x.com/JustonBrazda/status/1937518919062856107 Mira Murati's Thinking Machines Raises $2B Seed Round https://thinkingmachines.ai/ https://www.theinformation.com/articles/ex-openai-cto-muratis-startup-plans-compete-openai-others?rc=c3oojq&shared=2c64512f9a1ab832 Eleven Labs 11ai Voice Assistant https://x.com/elevenlabsio/status/1937200086515097939 Voice Design for V3 JUST RELEASED: https://x.com/elevenlabsio/status/1937912222128238967 Runway's Game Worlds  https://x.com/c_valenzuelab/status/1937665391855120525 Example: https://x.com/aDimensionDoor/status/1937651875408675060 AI Dungeon https://aidungeon.com/ The Best Hacker in the US in now an autonomous AI bot https://www.pcmag.com/news/this-ai-is-outranking-humans-as-a-top-software-bug-hunter https://x.com/Xbow/status/1937512662859981116 Simple & Good AI Work Flow From AI Warper https://x.com/AIWarper/status/1936899718678008211 RealTime Natural Language Photo Editing https://x.com/zeke/status/1937267796146290952 Bunker J Squirrel https://www.tiktok.com/t/ZTjc3hb38/ Bigfoot Sermons https://www.tiktok.com/t/ZTjcEq17Y/ John Oliver's Episode about AI Slop https://youtu.be/TWpg1RmzAbc?si=LAdktGWlIVVDqAjR Jabba Kisses Han https://www.reddit.com/r/CursedAI/comments/1ljjdw3/what_the_hell_am_i_looking_at/  

The top AI news from the past week, every ThursdAI

Hey folks, Alex here, welcome back to ThursdAI! And folks, after the last week was the calm before the storm, "The storm came, y'all" – that's an understatement. This wasn't just a storm; it was an AI hurricane, a category 5 of announcements that left us all reeling (in the best way possible!). From being on the ground at Google I/O to live-watching Anthropic drop Claude 4 during our show, it's been an absolute whirlwind.This week was so packed, it felt like AI Christmas, with tech giants and open-source heroes alike showering us with gifts. We saw OpenAI play their classic pre-and-post-Google I/O chess game, Microsoft make some serious open-source moves, Google unleash an avalanche of updates, and Anthropic crash the party with Claude 4 Opus and Sonnet live stream in the middle of ThursdAI!So buckle up, because we're about to try and unpack this glorious chaos. As always, we're here to help you collectively know, learn, and stay up to date, so you don't have to. Let's dive in! (TL;DR and links in the end) Open Source LLMs Kicking Things OffEven with the titans battling, the open-source community dropped some serious heat this week. It wasn't the main headline grabber, but the releases were significant!Gemma 3n: Tiny But Mighty MatryoshkaFirst up, Google's Gemma 3n. This isn't just another small model; it's a "Nano-plus" preview, a 4-billion parameter MatFormer (Matryoshka Transformer – how cool is that name?) model designed for mobile-first multimodal applications. The really slick part? It has a nested 2-billion parameter sub-model that can run entirely on phones or Chromebooks.Yam was particularly excited about this one, pointing out the innovative "model inside another model" design. The idea is you can use half the model, not depth-wise, but throughout the layers, for a smaller footprint without sacrificing too much. It accepts interleaved text, image, audio, and video, supports ASR and speech translation, and even ships with RAG and function-calling libraries for edge apps. With a 128K token window and responsible AI features baked in, Gemma 3n is looking like a powerful tool for on-device AI. Google claims it beats prior 4B mobile models on MMLU-Lite and MMMU-Mini. It's an early preview in Google AI Studio, but it definitely flies on mobile devices.Mistral & AllHands Unleash Devstral 24BThen we got a collaboration from Mistral and AllHands: Devstral, a 24-billion parameter, state-of-the-art open model focused on code. We've been waiting for Mistral to drop some open-source goodness, and this one didn't disappoint.Nisten was super hyped, noting it beats o3-Mini on SWE-bench verified – a tough benchmark! He called it "the first proper vibe coder that you can run on a 3090," which is a big deal for coders who want local power and privacy. This is a fantastic development for the open-source coding community.The Pre-I/O Tremors: OpenAI & Microsoft Set the StageAs we predicted, OpenAI couldn't resist dropping some news right before Google I/O.OpenAI's Codex Returns as an AgentOpenAI launched Codex – yes, that Codex, but reborn as an asynchronous coding agent. This isn't just a CLI tool anymore; it connects to GitHub, does pull requests, fixes bugs, and navigates your codebase. It's powered by a new coding model fine-tuned for large codebases and was SOTA on SWE Agent when it dropped. Funnily, the model is also called Codex, this time, Codex-1. And this gives us a perfect opportunity to talk about the emerging categories I'm seeing among Code Generator agents and tools:* IDE-based (Cursor, Windsurf): Live pair programming in your editor* Vibe coding (Lovable, Bolt, v0): "Build me a UI" style tools for non-coders* CLI tools (Claude Code, Codex-cli): Terminal-based assistants* Async agents (Claude Code, Jules, Codex, GitHub Copilot agent, Devin): Work on your repos while you sleep, open pull requests for you to review, asyncCodex (this new one) falls into category number 4, and with today's release, Cursor seems to also strive to get to category number 4 with background processing. Microsoft BUILD: Open Source Copilot and Copilot Agent ModeThen came Microsoft Build, their huge developer conference, with a flurry of announcements.The biggest one for me? GitHub Copilot's front-end code is now open source! The VS Code editor part was already open, but the Copilot integration itself wasn't. This is a massive move, likely a direct answer to the insane valuations of VS Code clones like Cursor. Now, you can theoretically clone GitHub Copilot with VS Code and swing for the fences.GitHub Copilot also launched as an asynchronous coding assistant, very similar in function to OpenAI's Codex, allowing it to be assigned tasks and create/update PRs. This puts Copilot right into category 4 of code assistants, and with the native Github Integration, they may actually have a leg up in this race!And if that wasn't enough, Microsoft is adding MCP (Model Context Protocol) support directly into the Windows OS. The implications of having the world's biggest operating system natively support this agentic protocol are huge.Google I/O: An "Ultra" Event Indeed!Then came Tuesday, and Google I/O. I was there in the thick of it, and folks, it was an absolute barrage. Google is shipping. The theme could have been "Ultra" for many reasons, as we'll see.First off, the scale: Google reported a 49x increase in AI usage since last year's I/O, jumping from 9 trillion tokens processed to a mind-boggling 480 trillion tokens. That's a testament to their generous free tiers and the explosion of AI adoption.Gemini 2.5 Pro & Flash: #1 and #2 LLMs on ArenaGemini 2.5 Flash got an update and is now #2 on the LMArena leaderboard (with Gemini 2.5 Pro still holding #1). Both Pro and Flash gained some serious new capabilities:* Deep Think mode: This enhanced reasoning mode is pushing Gemini's scores to new heights, hitting 84% on MMMU and topping LiveCodeBench. It's about giving the model more "time" to work through complex problems.* Native Audio I/O: We're talking real-time TTS in 24 languages with two voices, and affective dialogue capabilities. This is the advanced voice mode we've been waiting for, now built-in.* Project Mariner: Computer-use actions are being exposed via the Gemini API & Vertex AI for RPA partners. This started as a Chrome extension to control your browser and now seems to be a cloud-based API, allowing Gemini to use the web, not just browse it. This feels like Google teaching its AI to interact with the JavaScript-heavy web, much like they taught their crawlers years ago.* Thought Summaries: Okay, here's one update I'm not a fan of. They've switched from raw thinking traces to "thought summaries" in the API. We want the actual traces! That's how we learn and debug.* Thinking Budgets: Previously a Flash-only feature, token ceilings for controlling latency/cost now extend to Pro.* Flash Upgrade: 20-30% fewer tokens, better reasoning/multimodal scores, and GA in early June.Gemini Diffusion: Speed Demon for Code and MathThis one got Yam Peleg incredibly excited. Gemini Diffusion is a new approach, different from transformers, for super-speed editing of code and math tasks. We saw demos hitting 2000 tokens per second! While there might be limitations at longer contexts, its speed and infilling capabilities are seriously impressive for a research preview. This is the first diffusion model for text we've seen from the frontier labs, and it looks sick. Funny note, they had to slow down the demo video to actually show the diffusion process, because at 2000t/s - apps appear as though out of thin air!The "Ultra" Tier and Jules, Google's Coding AgentRemember the "Ultra event" jokes? Well, Google announced a Gemini Ultra tier for $250/month. This tops OpenAI's Pro plan and includes DeepThink access, a generous amount of VEO3 generation, YouTube Premium, and a whopping 30TB of storage. It feels geared towards creators and developers.And speaking of developers, Google launched Jules (jules.google)! This is their asynchronous coding assistant (Category 4!). Like Codex and GitHub Copilot Agent, it connects to your GitHub, opens PRs, fixes bugs, and more. The big differentiator? It's currently free, which might make it the default for many. Another powerful agent joins the fray!AI Mode in Search: GA and EnhancedAI Mode in Google Search, which we've discussed on the show before with Robby Stein, is now in General Availability in the US. This is Google's answer to Perplexity and chat-based search.But they didn't stop there:* Personalization: AI Mode can now connect to your Gmail and Docs (if you opt-in) for more personalized results.* Deep Search: While AI Mode is fast, Deep Search offers more comprehensive research capabilities, digging through hundreds of sources, similar to other "deep research" tools. This will eventually be integrated, allowing you to escalate an AI Mode query for a deeper dive.* Project Mariner Integration: AI Mode will be able to click into websites, check availability for tickets, etc., bridging the gap to an "agentic web."I've had a chat with Robby during I/O and you can listen to that interview at the end of the podcast.Veo3: The Undisputed Star of Google I/OFor me, and many others I spoke to, Veo3 was the highlight. This is Google's flagship video generation model, and it's on another level. (the video above, including sounds is completely one shot generated from VEO3, no processing or editing)* Realism and Physics: The visual quality and understanding of physics are astounding.* Natively Multimodal: This is huge. Veo3 generates native audio, including coherent speech, conversations, and sound effects, all synced perfectly. It can even generate text within videos.* Coherent Characters: Characters remain consistent across scenes and have situational awareness, who speaks when, where characters look.* Image Upload & Reference Ability: While image upload was closed for the demo, it has reference capabilities.* Flow: An editor for video creation using Veo3 and Imagen4 which also launched, allowing for stiching and continuous creation.I got access and created videos where Veo3 generated a comedian telling jokes (and the jokes were decent!), characters speaking with specific accents (Indian, Russian – and they nailed it!), and lip-syncing that was flawless. The situational awareness, the laugh tracks kicking in at the right moment... it's beyond just video generation. This feels like a world simulator. It blew through the uncanny valley for me. More on Veo3 later, because it deserves its own spotlight.Imagen4, Virtual Try-On, and XR Glasses* Imagen4: Google's image generation model also got an upgrade, with extra textual ability.* Virtual Try-On: In Google Shopping, you can now virtually try on clothes. I tried it; it's pretty cool and models different body types well.* XR AI Glasses from Google: Perhaps the coolest, but most futuristic, announcement. AI-powered glasses with an actual screen, memory, and Gemini built-in. You can talk to it, it remembers things for you, and interacts with your environment. This is agentic AI in a very tangible form.Big Company LLMs + APIs: The Beat Goes OnThe news didn't stop with Google.OpenAI (acqui)Hires Jony Ive, Launches "IO" for HardwareThe day after I/O, Sam Altman confirmed that Jony Ive, the legendary designer behind Apple's iconic products, is joining OpenAI. He and his company, LoveFrom, have jointly created a new company called "IO" (yes, IO, just like the conference) which is joining OpenAI in a stock deal reportedly worth $6.5 billion. They're working on a hardware device, unannounced for now, but expected next year. This is a massive statement of intent from OpenAI in the hardware space.Legendary iPhone analyst Ming-Chi Kuo shed some light on the possible device, it won't have a screen, as Jony wants to "wean people off screens"... funny right? They are targeting 2027 for mass production, which is really interesting as 2027 is when most big companies expect AGI to be here. "The current prototype is slightly larger than AI Pin, with a form factor comparable to iPod Shuffle, with one intended use cases is to wear it around your neck, with microphones and cameras for environmental detection" LMArena Raises $100M Seed from a16zThis one raised some eyebrows. LMArena, the go-to place for vibe-checking LLMs, raised a $100 million seed round from Andreessen Horowitz. That's a huge number for a seed, reminiscent of Stability AI's early funding. It also brings up questions about how a VC-backed startup maintains impartiality as a model evaluation platform. Interesting times ahead for leaderboards, how they intent to make 100x that amount to return to investors. Very curious.

Lenny's Podcast: Product | Growth | Career
How Palantir built the ultimate founder factory | Nabeel S. Qureshi (founder, writer, ex-Palantir)

Lenny's Podcast: Product | Growth | Career

Play Episode Listen Later May 11, 2025 97:29


Nabeel Qureshi is an entrepreneur, writer, researcher, and visiting scholar of AI policy at the Mercatus Center (alongside Tyler Cowen). Previously, he spent nearly eight years at Palantir, working as a forward-deployed engineer. His work at Palantir ranged from accelerating the Covid-19 response to applying AI to drug discovery to optimizing aircraft manufacturing at Airbus. Nabeel was also a founding employee and VP of business development at GoCardless, a leading European fintech unicorn.What you'll learn:• Why almost a third of all Palantir's PMs go on to start companies• How the “forward-deployed engineer” model works and why it creates exceptional product leaders• How Palantir transformed from a “sparkling Accenture” into a $200 billion data/software platform company with more than 80% margins• The unconventional hiring approach that screens for independent-minded, intellectually curious, and highly competitive people• Why the company intentionally avoids traditional titles and career ladders—and what they do instead• Why they built an ontology-first data platform that LLMs love• How Palantir's controversial “bat signal” recruiting strategy filtered for specific talent types• The moral case for working at a company like Palantir—Brought to you by:• WorkOS—Modern identity platform for B2B SaaS, free up to 1 million MAUs• Attio—The powerful, flexible CRM for fast-growing startups• OneSchema—Import CSV data 10x faster—Where to find Nabeel S. Qureshi:• X: https://x.com/nabeelqu• LinkedIn: https://www.linkedin.com/in/nabeelqu/• Website: https://nabeelqu.co/—Where to find Lenny:• Newsletter: https://www.lennysnewsletter.com• X: https://twitter.com/lennysan• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/—In this episode, we cover:(00:00) Introduction to Nabeel S. Qureshi(05:10) Palantir's unique culture and hiring(13:29) What Palantir looks for in people(16:14) Why they don't have titles(19:11) Forward-deployed engineers at Palantir(25:23) Key principles of Palantir's success(30:00) Gotham and Foundry(36:58) The ontology concept(38:02) Life as a forward-deployed engineer(41:36) Balancing custom solutions and product vision(46:36) Advice on how to implement forward-deployed engineers(50:41) The current state of forward-deployed engineers at Palantir(53:15) The power of ingesting, cleaning and analyzing data(59:25) Hiring for mission-driven startups(01:05:30) What makes Palantir PMs different(01:10:00) The moral question of Palantir(01:16:03) Advice for new startups(01:21:12) AI corner(01:24:00) Contrarian corner(01:25:42) Lightning round and final thoughts—Referenced:• Reflections on Palantir: https://nabeelqu.co/reflections-on-palantir• Palantir: https://www.palantir.com/• Intercom: https://www.intercom.com/• Which companies produce the best product managers: https://www.lennysnewsletter.com/p/which-companies-produce-the-best• Gotham: https://www.palantir.com/platforms/gotham/• Foundry: https://www.palantir.com/platforms/foundry/• Peter Thiel on X: https://x.com/peterthiel• Alex Karp: https://en.wikipedia.org/wiki/Alex_Karp• Stephen Cohen: https://en.wikipedia.org/wiki/Stephen_Cohen_(entrepreneur)• Joe Lonsdale on LinkedIn: https://www.linkedin.com/in/jtlonsdale/• Tyler Cowen's website: https://tylercowen.com/• This Scandinavian City Just Won the Internet With Its Hilarious New Tourism Ad: https://www.afar.com/magazine/oslos-new-tourism-ad-becomes-viral-hit• Safe Superintelligence: https://ssi.inc/• Mira Murati on X: https://x.com/miramurati• Stripe: https://stripe.com/• Building product at Stripe: craft, metrics, and customer obsession | Jeff Weinstein (Product lead): https://www.lennysnewsletter.com/p/building-product-at-stripe-jeff-weinstein• Airbus: https://www.airbus.com/en• NIH: https://www.nih.gov/• Jupyter Notebooks: https://jupyter.org/• Shyam Sankar on LinkedIn: https://www.linkedin.com/in/shyamsankar/• Palantir Gotham for Defense Decision Making: https://www.youtube.com/watch?v=rxKghrZU5w8• Foundry 2022 Operating System Demo: https://www.youtube.com/watch?v=uF-GSj-Exms• SQL: https://en.wikipedia.org/wiki/SQL• Airbus A350: https://en.wikipedia.org/wiki/Airbus_A350• SAP: https://www.sap.com/index.html• Barry McCardel on LinkedIn: https://www.linkedin.com/in/barrymccardel/• Understanding ‘Forward Deployed Engineering' and Why Your Company Probably Shouldn't Do It: https://www.barry.ooo/posts/fde-culture• David Hsu on LinkedIn: https://www.linkedin.com/in/dvdhsu/• Retool's Path to Product-Market Fit—Lessons for Getting to 100 Happy Customers, Faster: https://review.firstround.com/retools-path-to-product-market-fit-lessons-for-getting-to-100-happy-customers-faster/• How to foster innovation and big thinking | Eeke de Milliano (Retool, Stripe): https://www.lennysnewsletter.com/p/how-to-foster-innovation-and-big• Looker: https://cloud.google.com/looker• Sorry, that isn't an FDE: https://tedmabrey.substack.com/p/sorry-that-isnt-an-fde• Glean: https://www.glean.com/• Limited Engagement: Is Tech Becoming More Diverse?: https://www.bkmag.com/2017/01/31/limited-engagement-creating-diversity-in-the-tech-industry/• Operation Warp Speed: https://en.wikipedia.org/wiki/Operation_Warp_Speed• Mark Zuckerberg testifies: https://www.businessinsider.com/facebook-ceo-mark-zuckerberg-testifies-congress-libra-cryptocurrency-2019-10• Anduril: https://www.anduril.com/• SpaceX: https://www.spacex.com/• Principles: https://nabeelqu.co/principles• Wispr Flow: https://wisprflow.ai/• Claude code: https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview• Gemini Pro 2.5: https://deepmind.google/technologies/gemini/pro/• DeepMind: https://deepmind.google/• Latent Space newsletter: https://www.latent.space/• Swyx on x: https://x.com/swyx• Neural networks in chess programs: https://www.chessprogramming.org/Neural_Networks• AlphaZero: https://en.wikipedia.org/wiki/AlphaZero• The top chess players in the world: https://www.chess.com/players• Decision to Leave: https://www.imdb.com/title/tt12477480/• Oldboy: https://www.imdb.com/title/tt0364569/• Christopher Alexander: https://en.wikipedia.org/wiki/Christopher_Alexander—Recommended books:• The Technological Republic: Hard Power, Soft Belief, and the Future of the West: https://www.amazon.com/Technological-Republic-Power-Belief-Future/dp/0593798694• Zero to One: Notes on Startups, or How to Build the Future: https://www.amazon.com/Zero-One-Notes-Startups-Future/dp/0804139296• Impro: Improvisation and the Theatre: https://www.amazon.com/Impro-Improvisation-Theatre-Keith-Johnstone/dp/0878301178/• William Shakespeare: Histories: https://www.amazon.com/Histories-Everymans-Library-William-Shakespeare/dp/0679433120/• High Output Management: https://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884• Anna Karenina: https://www.amazon.com/Anna-Karenina-Leo-Tolstoy/dp/0143035002—Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.—Lenny may be an investor in the companies discussed. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.lennysnewsletter.com/subscribe

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

This podcast discuss the rapidly advancing field of de-extinction, highlighting the crucial role of artificial intelligence (AI) in making this a tangible scientific pursuit. AI is presented not merely as a tool but as an architect across all stages, from reconstructing degraded ancient DNA and predicting gene function to optimising gene editing and modelling ecological impacts. While companies like Colossal Biosciences pursue ambitious projects for species like the woolly mammoth and dire wolf, often driving technological innovation with commercial spin-offs, organisations like Revive & Restore focus on genetic rescue for endangered species, illustrating differing approaches within this landscape. The podcast underscore the significant technical, ecological, and ethical challenges inherent in de-extinction, particularly concerning animal welfare, resource allocation, and potential ecological disruption, while also pointing to valuable spillover innovations benefiting broader conservation and human health.Get the eBook at Google Play https://play.google.com/store/search?q=etienne%20noumen%27&c=books

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

OpenAI's strategic appointment of Instacart CEO Fidji Simo to lead its applications division and its global "Stargate" initiative to build sovereign AI infrastructure with national governments. Several articles touch on the potential for AI to reshape technology and society, including Apple's contemplation of a future beyond the iPhone due to AI advancements and Meta's development of "super-sensing" AI glasses with potential facial recognition. The text also covers policy shifts, specifically the Trump administration's plan to roll back Biden-era AI chip export restrictions. Furthermore, the sources describe new AI-powered products and features from companies like Figma, Stripe, Superhuman, and Mistral AI, showcasing the increasing integration of AI into design, finance, communication, and enterprise solutions.

The top AI news from the past week, every ThursdAI
ThursdAI - May 8th - new Gemini pro, Mistral Medium, OpenAI restructuring, HeyGen Realistic Avatars & more AI news

The top AI news from the past week, every ThursdAI

Play Episode Listen Later May 9, 2025 93:54


Hey folks, Alex here (yes, real me, not my AI avatar, yet)Compared to previous weeks, this week was pretty "chill" in the world of AI, though we did get a pretty significant Gemini 2.5 Pro update, it basically beat itself on the Arena. With Mistral releasing a new medium model (not OSS) and Nvidia finally dropping Nemotron Ultra (both ignoring Qwen 3 performance) there was also a few open source updates. To me the highlight of this week was a breakthrough in AI Avatars, with Heygen's new IV model, Beating ByteDance's OmniHuman (our coverage) and Hedra labs, they've set an absolute SOTA benchmark for 1 photo to animated realistic avatar. Hell, Iet me record all this real quick and show you how good it is! How good is that?? I'm still kind of blown away. I have managed to get a free month promo code for you guys, look for it in the TL;DR section at the end of the newsletter. Of course, if you're rather watch than listen or read, here's our live recording on YTOpenSource AINVIDIA's Nemotron Ultra V1: Refining the Best with a Reasoning Toggle

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

Significant developments include Amazon's introduction of a tactile warehouse robot named Vulcan and Google's Gemini 2.5 Pro reportedly topping AI leaderboards, highlighting progress in automation and model performance. Strategically, OpenAI is planning to reduce revenue share with partners like Microsoft and also launching an initiative to help nations build AI infrastructure. Meanwhile, Apple is considering AI search partners for Safari amid declining Google usage, and AI is being used in innovative ways, such as AI-powered drones for medical delivery and the recreation of a road rage victim for a court statement. Finally, HeyGen is enhancing AI avatars with emotional expression, and platforms like Zapier are enabling users to create personal AI assistants, indicating broader application and accessibility of AI technology.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

This episode highlight OpenAI's significant structural shift to retain non-profit control while acquiring an AI coding startup and addressing model sycophancy. Furthermore, the texts cover Waymo's expansion of robotaxi production with a new factory and Canva's entry into spreadsheets with an AI-powered tool. Finally, they touch upon the growing urgency for AI education in schools, as advocated by tech leaders, and Nvidia's contribution to open-source AI with a high-performance transcription model, along with a warning from Fiverr's CEO about AI's impact on jobs.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

This podcast details how AI-powered autonomous drones are transforming global logistics, particularly for delivering essential medical supplies in challenging environments. The podcast highlights Zipline as a key player, discussing its pioneering work in countries like Rwanda and Ghana where drone delivery has shown significant improvements in healthcare outcomes and efficiency.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

This podcast and sources discuss the growing issue of plastic pollution and the limitations of traditional recycling methods. They introduce the discovery of plastic-eating microbes and their enzymes as a promising alternative for degrading plastics. Crucially, the text explains how Artificial Intelligence (AI)is being employed to significantly enhance the effectiveness of these enzymes, making them faster and more stable for industrial applications. The document highlights successful AI-engineered enzymes like FAST-PETase for achieving true circularity by breaking plastics down to their original monomers, and outlines the environmental and economic benefits of this approach. However, the sources also acknowledge the significant scientific, engineering, economic, and regulatory challenges that must be overcome for large-scale adoption of this technology.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

Significant developments include the launch of specialised AI agents for scientific research by FutureHouse and the integration of AI coding assistants into Apple's Xcode environment through a partnership with Anthropic. Google's activities are also prominent, ranging from their strategies to address AI's energy demands and workforce needs to the successful, albeit assisted, completion of the game Pokémon Blue by their Gemini AI. Furthermore, the reports touch on the increasing recognition of AI's role in creative works by the US Copyright Office and the economic implications of AI infrastructure costs, partly attributed to tariffs, as noted by Meta. Overall, the text underscores the expanding capabilities of AI, the practical applications across various sectors, and the associated infrastructure and policy challenges.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

The podcast discusses Conformal Prediction (CP) as a method for enhancing the reliability of AI in medical diagnosis by providing rigorous uncertainty quantification. It explains that unlike traditional AI which gives single predictions, CP produces a set of possible outcomes with a guaranteed probability of containing the true answer, addressing the critical need for trustworthy AI in healthcare. The text explores the foundational concepts of CP, compares it to other uncertainty quantification techniques, highlights advanced CP methods for more nuanced guarantees, and surveys its diverse applications in medical imaging, genomics, clinical risk prediction, and drug discovery. Finally, it examines the challenges of clinical integration, the need for human-AI interaction, and the ethical and regulatory dimensions, positioning CP as a vital tool for the safe and effective deployment of AI in medicine despite requiring further research and adaptation for practical success.Source: https://machinelearningcertification.web.app/Conformal_Classification_in_Medical_Diagnosis.pdf

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

Key themes include technological competition and national self-reliance with Huawei and China challenging Nvidia and US dominance in AI chips, and major product updates and releases from companies like Baidu, OpenAI, and Grok introducing new AI models and features. The text also highlights innovative applications of AI, from Neuralink's brain implants restoring communication and Waymo considering selling robotaxis directly to consumers, to creative uses like generating action figures and integrating AI into religious practices. Finally, the sources touch on important considerations surrounding AI, such as the need for interpretability to ensure safety, the increasing sophistication of AI-powered scams, and discussions on the military implications and future potential of AGI.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

Significant funding discussions surround Elon Musk's xAI, while Microsoft introduced new AI-powered features for Windows. Intel is shifting its AI chip strategy, and Perplexity aims to challenge established search engines with an AI browser. Concerns regarding AI misuse are evident in discussions about scams and legal filings, alongside warnings from AI pioneers about future risks. Conversely, AI's potential is explored in areas such as air mobility, music creation, code generation, and even predicting the end of all disease.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

Perplexity announced a new browser designed for hyper-personalised advertising through extensive user tracking, mirroring tactics of other tech giants. Apple is shifting its robotics division to its hardware group, suggesting a move towards tangible consumer products. Simultaneously, Anthropic launched a research program dedicated to exploring the ethical implications of potential AI consciousness. Creative industries are also seeing progress with Adobe unveiling enhanced image generation models and integrating third-party AI, while Google DeepMind expanded its Music AI Sandbox for musicians. Furthermore, AI is increasingly integrated into the software development process, with Google reporting over 30% of new code being AI-generated.

Leña al mono que es de goma
LM1063 - Trucos y cancamusas (IA en local, VIII)

Leña al mono que es de goma

Play Episode Listen Later Apr 20, 2025 14:55


**Palabras clave:** traducción automática, revistas de ciencia ficción en inglés, Office profesional, Word, inteligencia artificial, resúmenes, revista Lire, Julio Verne, De Thing, Mistral Nemo Instruct, Claude, LM Studio, Gemini Pro. **Traducción de revistas de ciencia ficción en inglés** **Resúmenes de artículos y revistas** **Inteligencia artificial en local**

Sospechosos Habituales
LM1063 - Trucos y cancamusas (IA en local, VIII)

Sospechosos Habituales

Play Episode Listen Later Apr 20, 2025 14:55


**Palabras clave:** traducción automática, revistas de ciencia ficción en inglés, Office profesional, Word, inteligencia artificial, resúmenes, revista Lire, Julio Verne, De Thing, Mistral Nemo Instruct, Claude, LM Studio, Gemini Pro. **Traducción de revistas de ciencia ficción en inglés** **Resúmenes de artículos y revistas** **Inteligencia artificial en local**

The Lunar Society
AGI is Still 30 Years Away — Ege Erdil & Tamay Besiroglu

The Lunar Society

Play Episode Listen Later Apr 17, 2025 188:28


Ege Erdil and Tamay Besiroglu have 2045+ timelines, think the whole "alignment" framing is wrong, don't think an intelligence explosion is plausible, but are convinced we'll see explosive economic growth (economy literally doubling every year or two).This discussion offers a totally different scenario than my recent interview with Scott and Daniel.Ege and Tamay are the co-founders of Mechanize, a startup dedicated to fully automating work. Before founding Mechanize, Ege and Tamay worked on AI forecasts at Epoch AI.Watch on Youtube; listen on Apple Podcasts or Spotify.----------Sponsors* WorkOS makes it easy to become enterprise-ready. With simple APIs for essential enterprise features like SSO and SCIM, WorkOS helps companies like Vercel, Plaid, and OpenAI meet the requirements of their biggest customers. To learn more about how they can help you do the same, visit workos.com* Scale's Data Foundry gives major AI labs access to high-quality data to fuel post-training, including advanced reasoning capabilities. If you're an AI researcher or engineer, learn about how Scale's Data Foundry and research lab, SEAL, can help you go beyond the current frontier at scale.com/dwarkesh* Google's Gemini Pro 2.5 is the model we use the most at Dwarkesh Podcast: it helps us generate transcripts, identify interesting clips, and code up new tools. If you want to try it for yourself, it's now available in Preview with higher rate limits! Start building with it today at aistudio.google.com.----------Timestamps(00:00:00) - AGI will take another 3 decades(00:22:27) - Even reasoning models lack animal intelligence (00:45:04) - Intelligence explosion(01:00:57) - Ege & Tamay's story(01:06:24) - Explosive economic growth(01:33:00) - Will there be a separate AI economy?(01:47:08) - Can we predictably influence the future?(02:19:48) - Arms race dynamic(02:29:48) - Is superintelligence a real thing?(02:35:45) - Reasons not to expect explosive growth(02:49:00) - Fully automated firms(02:54:43) - Will central planning work after AGI?(02:58:20) - Career advice Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

AI For Humans
Google's Updated Gemini 2.5 Pro May Be Winning the AI Race, OpenAI Delays GPT-5 & More AI News

AI For Humans

Play Episode Listen Later Apr 10, 2025 62:48


Google's AI efforts & Gemini Pro 2.5 take a major step forward with updates to Deep Research, new Agent2Agent protocol (A2A) & more. Sadly, OpenAI teases o3 and o4 but delays GPT-5.  Plus, Meta's new Llama 4 models are out but have issues, Midjourney v7's debut, John Carmack's smackdown of an AI video game engine hater, Gavin's deep dive into OpenAI 4o Image Generation formats & the weirdest robot horse concept you've ever seen.  WE'RE DEEP RESEARCHING OUR ENTIRE LIVES RIGHT NOW  Join the discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/   // Show Links // Google Cloud 25 Live Stream “A New Way To Cloud!” https://youtu.be/Md4Fs-Zc3tg Google Cloud Blog Post https://blog.google/products/google-cloud/next-2025/ Upgraded Deep Research Out Preforms OpenAI Deep Research https://x.com/GeminiApp/status/1909721519724339226 Google's Deep Research Vs OpenAI Deep Research https://x.com/testingcatalog/status/1909727195402027183 New Ironwood TPUs https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/ Gavin's Experiences Google Gemini Deep Research: Baltro Test: https://x.com/AIForHumansShow/status/1909813850817675424 KP Biography: https://g.co/gemini/share/7b7bdb2c400e Agent2Agent Protocol https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/ Google Paying Some AI Stuff To Do Nothing Rather Than Work For Rivals https://x.com/TechCrunch/status/1909368948862181584 Solar Glow Meditations on AI http://tiktok.com/@solarglowmeditations/video/7491038509214518559?_t=ZT-8vNNgF7QpyM&_r=1 o4-mini & o3 coming before GPT-5 in shift from Sam Altman https://x.com/sama/status/1908167621624856998 OpenAI Strategic Deployment Team (new role to prep for AGI) https://x.com/aleks_madry/status/1909686225658695897 AI 2027 Paper https://ai-2027.com/ Llama 4 is here… but how good is it? https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Controversy Around Benchmarks: https://gizmodo.com/meta-cheated-on-ai-benchmarks-and-its-a-glimpse-into-a-new-golden-age-2000586433 Deep dive on issues from The Information  https://www.theinformation.com/articles/llama-4s-rocky-debut?rc=c3oojq&shared=3bbd9f72303888e2 Midjourney v7 Is Here and it's… just ok? https://www.midjourney.com/updates/v7-alpha John Carmack Defends AI Video Games https://x.com/ID_AA_Carmack/status/1909311174845329874 Tim Sweeney Weighs In https://x.com/TimSweeneyEpic/status/1909314230391902611 New Test-time-training = 1 Min AI Video From a Single Prompt https://x.com/karansdalal/status/1909312851795411093 Kawasaki's Robot Horse Concept https://futurism.com/the-byte/kawasaki-rideable-horse-robot VIDEO: https://youtu.be/vQDhzbTz-9k?si=2aWMtZVLnMONEjBe Engine AI + iShowSpeed https://x.com/engineairobot/status/1908570512906740037 Gemini 2.5 Pro Plays Pokemon https://x.com/kiranvodrahalli/status/1909699142265557208 Prompt-To-Anything Minecraft Looking Game  https://x.com/NicolasZu/status/1908882267453239323 An Image That Will Never Go Viral https://www.reddit.com/r/ChatGPT/comments/1jth5yf/asked_for_an_image_that_will_never_go_viral/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button How Toothpaste Is Made https://www.reddit.com/r/aivideo/comments/1jujzh2/how_toothpaste_is_made/ 90s Video Game 4o Image Gen Prompt https://x.com/AIForHumansShow/status/1908985288116101553 1980s Japanese Posters https://x.com/AIForHumansShow/status/1909824824677192140 Buff Superbad  https://x.com/AIForHumansShow/status/1909402225488937065  

Windows Weekly (MP3)
WW 926: You're Ugly When You Cry - Altair BASIC, Switch 2's pricing, Wintoys

Windows Weekly (MP3)

Play Episode Listen Later Apr 3, 2025 143:40


Bill Gates celebrates the 50th anniversary of Microsoft with the release of the source code for Altair BASIC 1.0. Plus, Paul celebrates with 99 cent books: The Windows 10 Field Guide, Windows 11 Field Guide, and Windows Everywhere are all 99 cents for 24 hours! Also available: Eternal Spring: Our Guide to Mexico City in preview!Windows The plot thickens. Paul writes epic take on future of Windows 11, describes Dev channel-only features and when/if they were ever released - in other words, an extensive but partial Windows 11 feature roadmap for 2025 Two days later, Microsoft announces a Windows 11 feature road map - one that is woefully incomplete, pathetic, and sad Microsoft announces when (sort of) new on-device AI features will come to all Copilot+ PCs, meaning Intel and AMD, too - "not a glimpse at the future of the PC, but the future of the PC." Live captions with live language translations, Cocreator in Paint, Restyle image and Image creator in Photos, plus Voice access with flexible natural language (Snapdragon X only) But not Recall or Click to Do in preview, go figure As expected, March 2024 Preview update for 24H2 arrives, a few days late - with AI-powered search experience enabled Dev and Beta builds - Friday - Quick Machine Recovery (Beta only?), Speech recap in Narrator, Blue screen to get less blue, WinKey + C shortcut for Copilot returns, Spanish and French Text actions in Click to Do, Edit images in Share, AI-powered search (Dev only?) Then, Microsoft more fully describes Windows Quick Recovery Beta (23H2) - Monday - A lot of familiar 24H2 features - Narrator improvements, Copilot WinKey + C, Share with Image edit, plus System > About FAQ for some freaking reason Proton Drive is now native on Windows 11 on Arm, everyone gets new features Proton VPN is now built into Vivaldi desktop browser Intel's new CEO appears in public, vows to spin off non-core businesses. Everything but x86 chip design and Foundry, then Microsoft 365 Windows 365 Link is now available The Office apps on Windows already launch instantaneously but apparently that's not invasive enough - we need fewer auto-start items, not more of them Microsoft Excel to call out rich data cells with value tokens AI & Dev NYT copyright infringement lawsuit against Open AI and Microsoft can move forward, judge rules And now Tim O'Reilly says Open AI stole his company's paywalled book content too. Book piracy is sadly the easiest thing in the world Open AI raised more money than any private firm in history, now worth $300B ChatGPT releases awesome new image generation feature for ChatGPT And now it's available for free to everyone Google's Gemini Pro 2.5 is now available to everyone too Amazon launches Alexa+ in early access, US only Some thoughts about vibe coding, which isn't what you think it is AMD pays $4.9 billion to take on Nvidia in cloud AI Apple Intelligence + Apple Health is the future of something something Xbox & Games Nintendo announces Switch 2. Looks awesome, coming earlier than expected. But that price! And no Xbox/COD news at the launch?? Luna's not dead! Amazon announces multi-year EA partnership, expands Luna to more EU countries Microsoft announces a new Xbox Backbone controller for smartphones New titles for Xbox Game Pass across PC, Tip These show notes have been truncated due to length. For the full show notes, visit https://twit.tv/shows/windows-weekly/episodes/926 Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell

All TWiT.tv Shows (MP3)
Windows Weekly 926: You're Ugly When You Cry

All TWiT.tv Shows (MP3)

Play Episode Listen Later Apr 3, 2025 143:40 Transcription Available


Bill Gates celebrates the 50th anniversary of Microsoft with the release of the source code for Altair BASIC 1.0. Plus, Paul celebrates with 99 cent books: The Windows 10 Field Guide, Windows 11 Field Guide, and Windows Everywhere are all 99 cents for 24 hours! Also available: Eternal Spring: Our Guide to Mexico City in preview!Windows The plot thickens. Paul writes epic take on future of Windows 11, describes Dev channel-only features and when/if they were ever released - in other words, an extensive but partial Windows 11 feature roadmap for 2025 Two days later, Microsoft announces a Windows 11 feature road map - one that is woefully incomplete, pathetic, and sad Microsoft announces when (sort of) new on-device AI features will come to all Copilot+ PCs, meaning Intel and AMD, too - "not a glimpse at the future of the PC, but the future of the PC." Live captions with live language translations, Cocreator in Paint, Restyle image and Image creator in Photos, plus Voice access with flexible natural language (Snapdragon X only) But not Recall or Click to Do in preview, go figure As expected, March 2024 Preview update for 24H2 arrives, a few days late - with AI-powered search experience enabled Dev and Beta builds - Friday - Quick Machine Recovery (Beta only?), Speech recap in Narrator, Blue screen to get less blue, WinKey + C shortcut for Copilot returns, Spanish and French Text actions in Click to Do, Edit images in Share, AI-powered search (Dev only?) Then, Microsoft more fully describes Windows Quick Recovery Beta (23H2) - Monday - A lot of familiar 24H2 features - Narrator improvements, Copilot WinKey + C, Share with Image edit, plus System -- About FAQ for some freaking reason Proton Drive is now native on Windows 11 on Arm, everyone gets new features Proton VPN is now built into Vivaldi desktop browser Intel's new CEO appears in public, vows to spin off non-core businesses. Everything but x86 chip design and Foundry, then Microsoft 365 Windows 365 Link is now available The Office apps on Windows already launch instantaneously but apparently that's not invasive enough - we need fewer auto-start items, not more of them Microsoft Excel to call out rich data cells with value tokens AI & Dev NYT copyright infringement lawsuit against Open AI and Microsoft can move forward, judge rules And now Tim O'Reilly says Open AI stole his company's paywalled book content too. Book piracy is sadly the easiest thing in the world Open AI raised more money than any private firm in history, now worth $300B ChatGPT releases awesome new image generation feature for ChatGPT And now it's available for free to everyone Google's Gemini Pro 2.5 is now available to everyone too Amazon launches Alexa+ in early access, US only Some thoughts about vibe coding, which isn't what you think it is AMD pays $4.9 billion to take on Nvidia in cloud AI Apple Intelligence + Apple Health is the future of something something Xbox & Games Nintendo announces Switch 2. Looks awesome, coming earlier than expected. But that price! And no Xbox/COD news at the launch?? Luna's not dead! Amazon announces multi-year EA partnership, expands Luna to more EU countries Microsoft announces a new Xbox Backbone controller for smartphones New titles for Xbox Game Pass across PC, Ti These show notes have been truncated due to length. For the full show notes, visit https://twit.tv/shows/windows-weekly/episodes/926 Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell

Radio Leo (Audio)
Windows Weekly 926: You're Ugly When You Cry

Radio Leo (Audio)

Play Episode Listen Later Apr 3, 2025 143:40


Bill Gates celebrates the 50th anniversary of Microsoft with the release of the source code for Altair BASIC 1.0. Plus, Paul celebrates with 99 cent books: The Windows 10 Field Guide, Windows 11 Field Guide, and Windows Everywhere are all 99 cents for 24 hours! Also available: Eternal Spring: Our Guide to Mexico City in preview!Windows The plot thickens. Paul writes epic take on future of Windows 11, describes Dev channel-only features and when/if they were ever released - in other words, an extensive but partial Windows 11 feature roadmap for 2025 Two days later, Microsoft announces a Windows 11 feature road map - one that is woefully incomplete, pathetic, and sad Microsoft announces when (sort of) new on-device AI features will come to all Copilot+ PCs, meaning Intel and AMD, too - "not a glimpse at the future of the PC, but the future of the PC." Live captions with live language translations, Cocreator in Paint, Restyle image and Image creator in Photos, plus Voice access with flexible natural language (Snapdragon X only) But not Recall or Click to Do in preview, go figure As expected, March 2024 Preview update for 24H2 arrives, a few days late - with AI-powered search experience enabled Dev and Beta builds - Friday - Quick Machine Recovery (Beta only?), Speech recap in Narrator, Blue screen to get less blue, WinKey + C shortcut for Copilot returns, Spanish and French Text actions in Click to Do, Edit images in Share, AI-powered search (Dev only?) Then, Microsoft more fully describes Windows Quick Recovery Beta (23H2) - Monday - A lot of familiar 24H2 features - Narrator improvements, Copilot WinKey + C, Share with Image edit, plus System -- About FAQ for some freaking reason Proton Drive is now native on Windows 11 on Arm, everyone gets new features Proton VPN is now built into Vivaldi desktop browser Intel's new CEO appears in public, vows to spin off non-core businesses. Everything but x86 chip design and Foundry, then Microsoft 365 Windows 365 Link is now available The Office apps on Windows already launch instantaneously but apparently that's not invasive enough - we need fewer auto-start items, not more of them Microsoft Excel to call out rich data cells with value tokens AI & Dev NYT copyright infringement lawsuit against Open AI and Microsoft can move forward, judge rules And now Tim O'Reilly says Open AI stole his company's paywalled book content too. Book piracy is sadly the easiest thing in the world Open AI raised more money than any private firm in history, now worth $300B ChatGPT releases awesome new image generation feature for ChatGPT And now it's available for free to everyone Google's Gemini Pro 2.5 is now available to everyone too Amazon launches Alexa+ in early access, US only Some thoughts about vibe coding, which isn't what you think it is AMD pays $4.9 billion to take on Nvidia in cloud AI Apple Intelligence + Apple Health is the future of something something Xbox & Games Nintendo announces Switch 2. Looks awesome, coming earlier than expected. But that price! And no Xbox/COD news at the launch?? Luna's not dead! Amazon announces multi-year EA partnership, expands Luna to more EU countries Microsoft announces a new Xbox Backbone controller for smartphones New titles for Xbox Game Pass across PC, Ti These show notes have been truncated due to length. For the full show notes, visit https://twit.tv/shows/windows-weekly/episodes/926 Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell

Windows Weekly (Video HI)
WW 926: You're Ugly When You Cry - Altair BASIC, Switch 2's pricing, Wintoys

Windows Weekly (Video HI)

Play Episode Listen Later Apr 3, 2025 143:40


Bill Gates celebrates the 50th anniversary of Microsoft with the release of the source code for Altair BASIC 1.0. Plus, Paul celebrates with 99 cent books: The Windows 10 Field Guide, Windows 11 Field Guide, and Windows Everywhere are all 99 cents for 24 hours! Also available: Eternal Spring: Our Guide to Mexico City in preview!Windows The plot thickens. Paul writes epic take on future of Windows 11, describes Dev channel-only features and when/if they were ever released - in other words, an extensive but partial Windows 11 feature roadmap for 2025 Two days later, Microsoft announces a Windows 11 feature road map - one that is woefully incomplete, pathetic, and sad Microsoft announces when (sort of) new on-device AI features will come to all Copilot+ PCs, meaning Intel and AMD, too - "not a glimpse at the future of the PC, but the future of the PC." Live captions with live language translations, Cocreator in Paint, Restyle image and Image creator in Photos, plus Voice access with flexible natural language (Snapdragon X only) But not Recall or Click to Do in preview, go figure As expected, March 2024 Preview update for 24H2 arrives, a few days late - with AI-powered search experience enabled Dev and Beta builds - Friday - Quick Machine Recovery (Beta only?), Speech recap in Narrator, Blue screen to get less blue, WinKey + C shortcut for Copilot returns, Spanish and French Text actions in Click to Do, Edit images in Share, AI-powered search (Dev only?) Then, Microsoft more fully describes Windows Quick Recovery Beta (23H2) - Monday - A lot of familiar 24H2 features - Narrator improvements, Copilot WinKey + C, Share with Image edit, plus System > About FAQ for some freaking reason Proton Drive is now native on Windows 11 on Arm, everyone gets new features Proton VPN is now built into Vivaldi desktop browser Intel's new CEO appears in public, vows to spin off non-core businesses. Everything but x86 chip design and Foundry, then Microsoft 365 Windows 365 Link is now available The Office apps on Windows already launch instantaneously but apparently that's not invasive enough - we need fewer auto-start items, not more of them Microsoft Excel to call out rich data cells with value tokens AI & Dev NYT copyright infringement lawsuit against Open AI and Microsoft can move forward, judge rules And now Tim O'Reilly says Open AI stole his company's paywalled book content too. Book piracy is sadly the easiest thing in the world Open AI raised more money than any private firm in history, now worth $300B ChatGPT releases awesome new image generation feature for ChatGPT And now it's available for free to everyone Google's Gemini Pro 2.5 is now available to everyone too Amazon launches Alexa+ in early access, US only Some thoughts about vibe coding, which isn't what you think it is AMD pays $4.9 billion to take on Nvidia in cloud AI Apple Intelligence + Apple Health is the future of something something Xbox & Games Nintendo announces Switch 2. Looks awesome, coming earlier than expected. But that price! And no Xbox/COD news at the launch?? Luna's not dead! Amazon announces multi-year EA partnership, expands Luna to more EU countries Microsoft announces a new Xbox Backbone controller for smartphones New titles for Xbox Game Pass across PC, Tip These show notes have been truncated due to length. For the full show notes, visit https://twit.tv/shows/windows-weekly/episodes/926 Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell

All TWiT.tv Shows (Video LO)
Windows Weekly 926: You're Ugly When You Cry

All TWiT.tv Shows (Video LO)

Play Episode Listen Later Apr 3, 2025 143:40 Transcription Available


Bill Gates celebrates the 50th anniversary of Microsoft with the release of the source code for Altair BASIC 1.0. Plus, Paul celebrates with 99 cent books: The Windows 10 Field Guide, Windows 11 Field Guide, and Windows Everywhere are all 99 cents for 24 hours! Also available: Eternal Spring: Our Guide to Mexico City in preview!Windows The plot thickens. Paul writes epic take on future of Windows 11, describes Dev channel-only features and when/if they were ever released - in other words, an extensive but partial Windows 11 feature roadmap for 2025 Two days later, Microsoft announces a Windows 11 feature road map - one that is woefully incomplete, pathetic, and sad Microsoft announces when (sort of) new on-device AI features will come to all Copilot+ PCs, meaning Intel and AMD, too - "not a glimpse at the future of the PC, but the future of the PC." Live captions with live language translations, Cocreator in Paint, Restyle image and Image creator in Photos, plus Voice access with flexible natural language (Snapdragon X only) But not Recall or Click to Do in preview, go figure As expected, March 2024 Preview update for 24H2 arrives, a few days late - with AI-powered search experience enabled Dev and Beta builds - Friday - Quick Machine Recovery (Beta only?), Speech recap in Narrator, Blue screen to get less blue, WinKey + C shortcut for Copilot returns, Spanish and French Text actions in Click to Do, Edit images in Share, AI-powered search (Dev only?) Then, Microsoft more fully describes Windows Quick Recovery Beta (23H2) - Monday - A lot of familiar 24H2 features - Narrator improvements, Copilot WinKey + C, Share with Image edit, plus System -- About FAQ for some freaking reason Proton Drive is now native on Windows 11 on Arm, everyone gets new features Proton VPN is now built into Vivaldi desktop browser Intel's new CEO appears in public, vows to spin off non-core businesses. Everything but x86 chip design and Foundry, then Microsoft 365 Windows 365 Link is now available The Office apps on Windows already launch instantaneously but apparently that's not invasive enough - we need fewer auto-start items, not more of them Microsoft Excel to call out rich data cells with value tokens AI & Dev NYT copyright infringement lawsuit against Open AI and Microsoft can move forward, judge rules And now Tim O'Reilly says Open AI stole his company's paywalled book content too. Book piracy is sadly the easiest thing in the world Open AI raised more money than any private firm in history, now worth $300B ChatGPT releases awesome new image generation feature for ChatGPT And now it's available for free to everyone Google's Gemini Pro 2.5 is now available to everyone too Amazon launches Alexa+ in early access, US only Some thoughts about vibe coding, which isn't what you think it is AMD pays $4.9 billion to take on Nvidia in cloud AI Apple Intelligence + Apple Health is the future of something something Xbox & Games Nintendo announces Switch 2. Looks awesome, coming earlier than expected. But that price! And no Xbox/COD news at the launch?? Luna's not dead! Amazon announces multi-year EA partnership, expands Luna to more EU countries Microsoft announces a new Xbox Backbone controller for smartphones New titles for Xbox Game Pass across PC, Ti These show notes have been truncated due to length. For the full show notes, visit https://twit.tv/shows/windows-weekly/episodes/926 Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell Sponsor: uscloud.com

Radio Leo (Video HD)
Windows Weekly 926: You're Ugly When You Cry

Radio Leo (Video HD)

Play Episode Listen Later Apr 3, 2025 143:40 Transcription Available


Bill Gates celebrates the 50th anniversary of Microsoft with the release of the source code for Altair BASIC 1.0. Plus, Paul celebrates with 99 cent books: The Windows 10 Field Guide, Windows 11 Field Guide, and Windows Everywhere are all 99 cents for 24 hours! Also available: Eternal Spring: Our Guide to Mexico City in preview!Windows The plot thickens. Paul writes epic take on future of Windows 11, describes Dev channel-only features and when/if they were ever released - in other words, an extensive but partial Windows 11 feature roadmap for 2025 Two days later, Microsoft announces a Windows 11 feature road map - one that is woefully incomplete, pathetic, and sad Microsoft announces when (sort of) new on-device AI features will come to all Copilot+ PCs, meaning Intel and AMD, too - "not a glimpse at the future of the PC, but the future of the PC." Live captions with live language translations, Cocreator in Paint, Restyle image and Image creator in Photos, plus Voice access with flexible natural language (Snapdragon X only) But not Recall or Click to Do in preview, go figure As expected, March 2024 Preview update for 24H2 arrives, a few days late - with AI-powered search experience enabled Dev and Beta builds - Friday - Quick Machine Recovery (Beta only?), Speech recap in Narrator, Blue screen to get less blue, WinKey + C shortcut for Copilot returns, Spanish and French Text actions in Click to Do, Edit images in Share, AI-powered search (Dev only?) Then, Microsoft more fully describes Windows Quick Recovery Beta (23H2) - Monday - A lot of familiar 24H2 features - Narrator improvements, Copilot WinKey + C, Share with Image edit, plus System -- About FAQ for some freaking reason Proton Drive is now native on Windows 11 on Arm, everyone gets new features Proton VPN is now built into Vivaldi desktop browser Intel's new CEO appears in public, vows to spin off non-core businesses. Everything but x86 chip design and Foundry, then Microsoft 365 Windows 365 Link is now available The Office apps on Windows already launch instantaneously but apparently that's not invasive enough - we need fewer auto-start items, not more of them Microsoft Excel to call out rich data cells with value tokens AI & Dev NYT copyright infringement lawsuit against Open AI and Microsoft can move forward, judge rules And now Tim O'Reilly says Open AI stole his company's paywalled book content too. Book piracy is sadly the easiest thing in the world Open AI raised more money than any private firm in history, now worth $300B ChatGPT releases awesome new image generation feature for ChatGPT And now it's available for free to everyone Google's Gemini Pro 2.5 is now available to everyone too Amazon launches Alexa+ in early access, US only Some thoughts about vibe coding, which isn't what you think it is AMD pays $4.9 billion to take on Nvidia in cloud AI Apple Intelligence + Apple Health is the future of something something Xbox & Games Nintendo announces Switch 2. Looks awesome, coming earlier than expected. But that price! And no Xbox/COD news at the launch?? Luna's not dead! Amazon announces multi-year EA partnership, expands Luna to more EU countries Microsoft announces a new Xbox Backbone controller for smartphones New titles for Xbox Game Pass across PC, Ti These show notes have been truncated due to length. For the full show notes, visit https://twit.tv/shows/windows-weekly/episodes/926 Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell Sponsor: uscloud.com

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June.Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!Full YouTube Episode with Slides/ChartsLike and subscribe and hit that bell to get notifs!Timestamps* 00:00 Welcome to the 100th Episode!* 00:19 Reflecting on the Journey* 00:47 AI Engineering: The Rise and Impact* 03:15 Latent Space Live and AI Conferences* 09:44 The Competitive AI Landscape* 21:45 Synthetic Data and Future Trends* 35:53 Creative Writing with AI* 36:12 Legal and Ethical Issues in AI* 38:18 The Data War: GPU Poor vs. GPU Rich* 39:12 The Rise of GPU Ultra Rich* 40:47 Emerging Trends in AI Models* 45:31 The Multi-Modality War* 01:05:31 The Future of AI Benchmarks* 01:13:17 Pionote and Frontier Models* 01:13:47 Niche Models and Base Models* 01:14:30 State Space Models and RWKB* 01:15:48 Inference Race and Price Wars* 01:22:16 Major AI Themes of the Year* 01:22:48 AI Rewind: January to March* 01:26:42 AI Rewind: April to June* 01:33:12 AI Rewind: July to September* 01:34:59 AI Rewind: October to December* 01:39:53 Year-End Reflections and PredictionsTranscript[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. The answer[00:00:32] swyx: was cursor and perplexity.[00:00:34] Alessio: Yeah, I love mid journey. It's like, do you really not like anything else?[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like 3DAO, we had, you know. Jeremy Howard, we had more folks like that.[00:00:47] AI Engineering: The Rise and Impact[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your rise of the AI engineer posts just kind of get people. Sombra to congregate, and then the AI engineer summit.[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big So I think it's like kind of arrived as a meaningful and useful definition.[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happens behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not. Put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.[00:02:45] Alessio: Yeah, I was at um, AWS reInvent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there are like hundreds of people just in line to go in.[00:02:56] Alessio: I think that's kind of what enabled me. People, right? Which is what [00:03:00] you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on on the sub stack.[00:03:11] Alessio: But yeah, it's been a heck of a heck of a two years.[00:03:14] swyx: Yeah.[00:03:15] Latent Space Live and AI Conferences[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17, 000 people. And the Latent Space Live event that we held there was 950 signups. I think. The AI world, the ML world is still very much research heavy. And that's as it should be because ML is very much in a research phase.[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.[00:03:51] swyx: But Yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering Neuralibs? Like, this is a ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything Or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really really want. The measure of success is previously just peer review, right? Getting 7s and 8s on their um, Academic review conferences and stuff like citations is one metric, but money is a better metric.[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan. Yeah, that it was in the chat on YouTube.[00:05:06] swyx: I would say that I actually also created.[00:05:09] swyx: Layen Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info is not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, Organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models. Post transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.[00:06:39] swyx: It was very awkward. And I'm really, really thankful for John Franco, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But He was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went on, went up on stage and then said pre training has hit a wall. And data has hit a wall. So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.[00:07:17] swyx: So it's kind of interesting that I think the consensus kind of going in was that we're not done scaling, like you should believe in a better lesson. And then, four straight days in a row, you had Sepp Hochreiter, who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.[00:07:34] swyx: He said that, um, we're pre trading inside a wall, or like, we've run into a different kind of wall. And then we have, you know John Frankel, Ilya, and then Noam Brown are all saying variations of the same thing, that we have hit some kind of wall in the status quo of what pre trained, scaling large pre trained models has looked like, and we need a new thing.[00:07:54] swyx: And obviously the new thing for people is some make, either people are calling it inference time compute or test time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense because test time, calling it test, meaning, has a very pre trained bias, meaning that the only reason for running inference at all is to test your model.[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree that. OpenAI seems to have adopted, or the community seems to have adopted this terminology of ITC instead of TTC. And that, that makes a lot of sense because like now we care about inference, even right down to compute optimality. Like I actually interviewed this author who recovered or reviewed the Chinchilla paper.[00:08:31] swyx: Chinchilla paper is compute optimal training, but what is not stated in there is it's pre trained compute optimal training. And once you start caring about inference, compute optimal training, you have a different scaling law. And in a way that we did not know last year.[00:08:45] Alessio: I wonder, because John is, he's also on the side of attention is all you need.[00:08:49] Alessio: Like he had the bet with Sasha. So I'm curious, like he doesn't believe in scaling, but he thinks the transformer, I wonder if he's still. So, so,[00:08:56] swyx: so he, obviously everything is nuanced and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still, he still believes that we can scale more.[00:09:04] swyx: Uh, he just assumed the character to be very game for, for playing this debate. So even more kudos to him that he assumed a position that he didn't believe in and still won the debate.[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll[00:09:30] Alessio: publish[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.[00:09:39] swyx: But I think it's, uh, these are just good slides. And I've been, I've been looking for a sort of end of year recaps from, from people.[00:09:44] The Competitive AI Landscape[00:09:44] swyx: The field has progressed a lot. You know, I think the max ELO in 2023 on LMSys used to be 1200 for LMSys ELOs. And now everyone is at least at, uh, 1275 in their ELOs, and this is across Gemini, Chadjibuti, [00:10:00] Grok, O1.[00:10:01] swyx: ai, which with their E Large model, and Enthopic, of course. It's a very, very competitive race. There are multiple Frontier labs all racing, but there is a clear tier zero Frontier. And then there's like a tier one. It's like, I wish I had everything else. Tier zero is extremely competitive. It's effectively now three horse race between Gemini, uh, Anthropic and OpenAI.[00:10:21] swyx: I would say that people are still holding out a candle for XAI. XAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, XAI is continually snubbed because they don't work well with the benchmarking people.[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why XAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily open AI. So we have some numbers and estimates. These are from RAMP. Estimates of open AI market share in [00:11:00] December 2023.[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked. Harrison Chase on the LangChain episode, it was true. And then CLAUD 3 launched mid middle of this year. I think CLAUD 3 launched in March, CLAUD 3. 5 Sonnet was in June ish.[00:11:23] swyx: And you can start seeing the market share shift towards opening, uh, towards that topic, uh, very, very aggressively. The more recent one is Gemini. So if I scroll down a little bit, this is an even more recent dataset. So RAM's dataset ends in September 2 2. 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending how you, how you count.[00:11:58] swyx: And so they're going after [00:12:00] the Lower tier first, and then, you know, maybe the upper tier later, but yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more.[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like basically 2023, 2022, To going into 2024 opening has gone from nine five market share to Yeah. Reasonably somewhere between 50 to 75 market share.[00:12:29] Alessio: Yeah. I'm really curious how ramped does the attribution to the model?[00:12:32] Alessio: If it's API, because I think it's all credit card spin. . Well, but it's all, the credit card doesn't say maybe. Maybe the, maybe when they do expenses, they upload the PDF, but yeah, the, the German I think makes sense. I think that was one of my main 2024 takeaways that like. The best small model companies are the large labs, which is not something I would have thought that the open source kind of like long tail would be like the small model.[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right? Like so small model here for Gemini is AB, [00:13:00] right? Uh, mini. We don't know what the small model size is, but yeah, it's probably in the double digits or maybe single digits, but probably double digits. The open source community has kind of focused on the one to three B size.[00:13:11] swyx: Mm-hmm . Yeah. Maybe[00:13:12] swyx: zero, maybe 0.5 B uh, that's moon dream and that is small for you then, then that's great. It makes sense that we, we have a range for small now, which is like, may, maybe one to five B. Yeah. I'll even put that at, at, at the high end. And so this includes Gemma from Gemini as well. But also includes the Apple Foundation models, which I think Apple Foundation is 3B.[00:13:32] Alessio: Yeah. No, that's great. I mean, I think in the start small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.[00:13:43] swyx: Yeah, we can keep going. This is a slide that I smiley disagree with Sarah. She's pointing to the scale SEAL leaderboard. I think the Researchers that I talked with at NeurIPS were kind of positive on this because basically you need private test [00:14:00] sets to prevent contamination.[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama405B does well compared to Gemini and GPT 40. And I think that's good. I would say that. You know, it's good to have an open model that is that big, that does well on those metrics.[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the artificial analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on like one node. of, uh, of H100s. Cerebras will be happy to tell you they can serve 4 or 5B on their super large chips.[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 4 or 5B really that relevant? Like, I think most people are basically saying that they only use 4 or 5B as a teacher model to distill down to something. Even Meta is doing it. So with Lama 3. [00:15:00] 3 launched, they only launched the 70B because they use 4 or 5B to distill the 70B.[00:15:03] swyx: So I don't know if like open source is keeping up. I think they're the, the open source industrial complex is very invested in telling you that the, if the gap is narrowing, I kind of disagree. I think that the gap is widening with O1. I think there are very, very smart people trying to narrow that gap and they should.[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100 in your saturation chart. And look, the distance between open source and closed source is narrowing. Of course it's going to narrow because you're near 100. This is stupid. But in metrics that matter, is open source narrowing?[00:15:38] swyx: Probably not for O1 for a while. And it's really up to the open source guys to figure out if they can match O1 or not.[00:15:46] Alessio: I think inference time compute is bad for open source just because, you know, Doc can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to like actually keep up on that axis.[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage. Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of computed inference too.[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama4 will be reasoning oriented. We talked with Thomas Shalom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama4. Yeah, yeah,[00:16:32] Alessio: and our friend, yeah.[00:16:33] Alessio: Clara Shi just joined to lead the business agent side. So I'm sure we'll have her on in the new year.[00:16:39] swyx: Yeah. So, um, my comment on, on the business model shift, this is super interesting. Apparently it is wide knowledge that OpenAI wanted more than 6. 6 billion dollars for their fundraise. They wanted to raise, you know, higher, and they did not.[00:16:51] swyx: And what that means is basically like, it's very convenient that we're not getting GPT 5, which would have been a larger pre train. We should have a lot of upfront money. And [00:17:00] instead we're, we're converting fixed costs into variable costs, right. And passing it on effectively to the customer. And it's so much easier to take margin there because you can directly attribute it to like, Oh, you're using this more.[00:17:12] swyx: Therefore you, you pay more of the cost and I'll just slap a margin in there. So like that lets you control your growth margin and like tie your. Your spend, or your sort of inference spend, accordingly. And it's just really interesting to, that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre training is effectively drying up, kind of.[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so like, they would have noticed this, but, um, it's just interesting.[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the mixed trial price fights, you know, and I think now it's almost like there's nowhere to go, like, you know, Gemini Flash is like basically giving it away for free.[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to[00:18:02] swyx: keep going. I think that 2, will come.[00:18:05] Alessio: Yeah, I know. Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the pro chat GBT.[00:18:12] Alessio: Just to try. I just want to see what does it look like to spend a thousand dollars a month on AI?[00:18:17] swyx: Yes. Yes. I think if your, if your, your job is a, at least AI content creator or VC or, you know, someone who, whose job it is to stay on, stay on top of things, you should already be spending like a thousand dollars a month on, on stuff.[00:18:28] swyx: And then obviously easy to spend, hard to use. You have to actually use. The good thing is that actually Google lets you do a lot of stuff for free now. So like deep research. That they just launched. Uses a ton of inference and it's, it's free while it's in preview.[00:18:45] Alessio: Yeah. They need to put that in Lindy.[00:18:47] Alessio: I've been using Lindy lately. I've been a built a bunch of things once we had flow because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy voice. Yeah, I think once [00:19:00] they get advanced voice mode like capability today, still like speech to text, you can kind of tell.[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so[00:19:13] swyx: it's good. Okay. I feel like we've, we've covered a lot of stuff. Uh, I, yeah, I, you know, I think We will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but that suffice to say that there is a lot of progress in each field.[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each audience, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one. It's very hard to stay on top of SweetBench.[00:19:45] swyx: OpenHand is currently still number one. switchbench full, which is the hardest one. He had very good thoughts on agents, which I, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.[00:20:03] swyx: And so I'll highlight that talk as well.[00:20:05] Alessio: Yeah. The number six, which is the Hacken agents learn more about the environment, has been a Super interesting to us as well, just to think through, because, yeah, how do you put an agent in an enterprise where most things in an enterprise have never been public, you know, a lot of the tooling, like the code bases and things like that.[00:20:23] Alessio: So, yeah, there's not indexing and reg. Well, yeah, but it's more like. You can't really rag things that are not documented. But people know them based on how they've been doing it. You know, so I think there's almost this like, you know, Oh, institutional knowledge. Yeah, the boring word is kind of like a business process extraction.[00:20:38] Alessio: Yeah yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the, the problem is that, Yeah, the agents are, that most people are building are good at following instruction, but are not as good as like extracting them from you. Um, so I think that will be a big unlock just to touch quickly on the Jeff Dean thing.[00:20:55] Alessio: I thought it was pretty, I mean, we'll link it in the, in the things, but. I think the main [00:21:00] focus was like, how do you use ML to optimize the systems instead of just focusing on ML to do something else? Yeah, I think speculative decoding, we had, you know, Eugene from RWKB on the podcast before, like he's doing a lot of that with Fetterless AI.[00:21:12] swyx: Everyone is. I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, makes sense.[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.[00:21:30] swyx: Jeff is, Jeff's talk was more, it wasn't focused on Gemini.[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems and then systems feedback into ML. And I think this ties in with Lubna's talk.[00:21:45] Synthetic Data and Future Trends[00:21:45] swyx: on synthetic data where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.[00:21:53] swyx: So her talk was on synthetic data, where like how much synthetic data has grown in 2024 in the pre training side, the post training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he'd spend a lot of time talking about alpha chip. And most of us in the audience are like, we're not working on hardware, man.[00:22:11] swyx: Like you guys are great. TPU is great. Okay. We'll buy TPUs.[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these? But[00:22:23] swyx: for me, it's just like bonus for late in space supporters, because I feel like they haven't been getting anything.[00:22:29] swyx: And then I wanted a more high frequency way to write stuff. Like that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since. The blip. And we know what he saw in 2014. We know what he saw in 2024. We think we know what he sees in 2024. He gave some hints and then we have vague indications of what he saw in 2023.[00:22:54] swyx: So that was the Oh, and then 2016 as well, because of this lawsuit with Elon, OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Siobhan, Zelis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg, basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.[00:23:37] Alessio: Yeah,[00:23:37] swyx: and at least for me, I can speak for myself, like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line. Yeah,[00:23:47] Alessio: no, definitely. But I think like that was like the whole idea of almost like the RL and we talked about this with Nathan on his podcast.[00:23:55] Alessio: It's like with RL, you can get very good at specific things, but then you can't really like generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.[00:24:08] Alessio: And we'll talk about the open AI reinforcement, fine tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Elia will be remembered for. And I think just maybe to clarify on like the pre training is over thing that people love to tweet. I think the point of the talk was like everybody, we're scaling these chips, we're scaling the compute, but like the second ingredient which is data is not scaling at the same rate.[00:24:35] Alessio: So it's not necessarily pre training is over. It's kind of like What got us here won't get us there. In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's like, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine tuning is basically like, instead of fine tuning on data, you fine tune on a reward model.[00:25:09] Alessio: So it's basically like, instead of being data driven, it's like task driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes, how many people fine tune, because I think this is what people run into. It's like, Oh, you can fine tune llama. And it's like, okay, where do I get the data?[00:25:27] Alessio: To fine tune it on, you know, so it's great that we're moving the thing. And then I really like he had this chart where like, you know, the brain mass and the body mass thing is basically like mammals that scaled linearly by brain and body size, and then humans kind of like broke off the slope. So it's almost like maybe the mammal slope is like the pre training slope.[00:25:46] Alessio: And then the post training slope is like the, the human one.[00:25:49] swyx: Yeah. I wonder what the. I mean, we'll know in 10 years, but I wonder what the y axis is for, for Ilya's SSI. We'll try to get them on.[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here. Yeah, and then he had, you know, what comes next, like agent, synthetic data, inference, compute, I thought all of that was like that.[00:26:05] Alessio: I don't[00:26:05] swyx: think he was dropping any alpha there. Yeah, yeah, yeah.[00:26:07] Alessio: Yeah. Any other new reps? Highlights?[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that, uh, my friend Yi made this, like, little nice paper. Yeah, that was really[00:26:20] swyx: nice.[00:26:20] swyx: Uh, of, uh, of, like, all the, he's, she called it must read papers of 2024.[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, little guidance and visualizations And so, uh, I thought it was really super nice that we got there.[00:26:38] Alessio: Should we do a late in space book for each year? Uh, I thought about it. For each year we should.[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organist, Jamie. You need to[00:26:52] swyx: pull up more things. One thing I saw that, uh, Okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.[00:27:06] swyx: This is secret collusion among AI agents, multi agent deception via steganography. I tried to go to NeurIPS in order to find these kinds of papers because the real reason Like NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.[00:27:22] swyx: And then also the people who go and end up crowding around the most popular papers, which you already know and already read them before you showed up to NeurIPS. So the only reason you go there is to talk to the paper authors, but there's like something like 10, 000 other. All these papers out there that, you know, are just people's work that they, that they did on the air and they failed to get attention for one reason or another.[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a deep mind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.[00:27:57] swyx: If you Pick that out, you know, and the code sends a [00:28:00] different message than that. But something I've always emphasized is to LLMs, we read left to right. LLMs can read up, down, sideways, you know, in random character order. And it's the same to them as it is to us. So if we were ever to get You know, self motivated, underlined LLMs that we're trying to collaborate to take over the planet.[00:28:19] swyx: This would be how they do it. They spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for Cypher encoding, GPT 2, Lama 2, mixed trial, GPT 3. 5, zero capabilities, and sudden 4.[00:28:40] swyx: And this is the kind of Jason Wei type emergence properties that people kind of look for. I think what made this paper stand out as well, so he developed the benchmark for steganography collusion, and he also focused on shelling point collusion, which is very low coordination. For agreeing on a decoding encoding format, you kind of need to have some [00:29:00] agreement on that.[00:29:00] swyx: But, but shelling point means like very, very low or almost no coordination. So for example, if I, if I ask someone, if the only message I give you is meet me in New York and you're not aware. Or when you would probably meet me at Grand Central Station. That is the Grand Central Station is a shelling point.[00:29:16] swyx: And it's probably somewhere, somewhere during the day. That is the shelling point of New York is Grand Central. To that extent, shelling points for steganography are things like the, the, the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail. I[00:29:41] swyx: found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy with the GPUs is[00:29:57] Alessio: talking[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just data sets.[00:30:12] swyx: This is all the grad students are working on. So like there was a data sets track and then I was looking around like, I was like, you don't need a data sets track because every paper is a data sets paper. And so data sets and benchmarks, they're kind of flip sides of the same thing. So Yeah. Cool. Yeah, if you're a grad student, you're a GPU boy, you kind of work on that.[00:30:30] swyx: And then the, the sort of big model that people walk around and pick the ones that they like, and then they use it in their models. And that's, that's kind of how it develops. I, I feel like, um, like, like you didn't last year, you had people like Hao Tian who worked on Lava, which is take Lama and add Vision.[00:30:47] swyx: And then obviously actually I hired him and he added Vision to Grok. Now he's the Vision Grok guy. This year, I don't think there was any of those.[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Mixed Monarch, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.[00:31:16] swyx: I will say that I felt the orals, oral picks this year were not very good. Either that or maybe it's just a So that's the highlight of how I have changed in terms of how I view papers.[00:31:29] swyx: So like, in my estimation, two of the best papers in this year for datasets or data comp and refined web or fine web. These are two actually industrially used papers, not highlighted for a while. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.[00:31:48] swyx: But one thing that does get a lot of play that a lot of people are debating is the role that's scheduled. This is the schedule free optimizer paper from Meta from Aaron DeFazio. And this [00:32:00] year in the ML community, there's been a lot of chat about shampoo, soap, all the bathroom amenities for optimizing your learning rates.[00:32:08] swyx: And, uh, most people at the big labs are. Who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular. 4Wars[00:32:19] Alessio: of AI recap maybe, just quickly. Um, where do you want to start? Data?[00:32:26] swyx: So to remind people, this is the 4Wars piece that we did as one of our earlier recaps of this year.[00:32:31] swyx: And the belligerents are on the left, journalists, writers, artists, anyone who owns IP basically, New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George RR Martin. Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing, open the eye, basically. I actually wanted to get a snapshot of all the lawsuits.[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Lumna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between scale. ai and the synthetic data community, because scale.[00:33:09] swyx: ai published a paper saying that synthetic data doesn't work. Surprise, surprise, scale. ai is the leading vendor of non synthetic data. Only[00:33:17] Alessio: cage free annotated data is useful.[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons that are blessed in Luna's talk, Makes sense.[00:33:32] swyx: I don't know if you have any perspectives there.[00:33:34] Alessio: I think, again, going back to the reinforcement fine tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine tuning a smaller model from like a larger model.[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of like the rephrase, the web thing that Apple also did. But yeah, I think it'll be. Useful. I think like whether or not that gets us the big [00:34:00] next step, I think that's maybe like TBD, you know, I think people love talking about data because it's like a GPU poor, you know, I think, uh, synthetic data is like something that people can do, you know, so they feel more opinionated about it compared to, yeah, the optimizers stuff, which is like,[00:34:17] swyx: they don't[00:34:17] Alessio: really work[00:34:18] swyx: on.[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club, the star series of papers. So that's star, Q star, V star. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT, API that they released, or that they announced, basically they're asking you to submit graders, or they choose from a preset list of graders.[00:34:49] swyx: Basically It feels like a way to create valid synthetic data for them to fine tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between Let's say the music industry or like the newspaper publishing industry or the textbooks industry on the big labs.[00:35:11] swyx: It's all of the pre training era. And then like the new era, like the reasoning era, like nobody has any problem with all the reasoning, especially because it's all like sort of math and science oriented with, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?[00:35:27] swyx: We've been using O1 for And I would say like for summarization and creative writing and instruction following, I think it's underrated. I started using O1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics. You know, I can actually say like, I think one of the O1 pro demos.[00:35:46] swyx: All of these things that Noam was showing was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?[00:35:53] Creative Writing with AI[00:35:53] swyx: So like, like literally just anything instead of token, like not even token level, character level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.[00:36:02] swyx: And so no surprises when I ask it to rhyme, uh, and to, to create song lyrics, it's going to do that very much better than in previous models. So I think it's underrated for creative writing.[00:36:11] Alessio: Yeah.[00:36:12] Legal and Ethical Issues in AI[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of O1, but then they want us to, like, they're getting sued for using other publishers data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.[00:36:29] Alessio: So I'm curious to see how that kind of comes. Yeah, I mean, OPA has[00:36:32] swyx: many ways to publish, to punish people without bringing, taking them to court. Already banned ByteDance for distilling their, their info. And so anyone caught distilling the chain of thought will be just disallowed to continue on, on, on the API.[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chain of thoughts are pretty well hidden. Like you have to work very, very hard to, to get it to leak. And then even when it leaks the chain of thought, you don't know if it's, if it's [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosign, which we talked about, we talked to him on Dev Day, can just fine tune 4.[00:37:13] swyx: 0 to beat 0. 1 Cloud SONET so far is beating O1 on coding tasks without, at least O1 preview, without being a reasoning model, same for Gemini Pro or Gemini 2. 0. So like, how much is reasoning important? How much of a moat is there in this, like, All of these are proprietary sort of training data that they've presumably accomplished.[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be Interesting because there'll be a lot of noise from people who say they have inference time compute and actually don't because they just have fancy chain of thought.[00:38:00][00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.[00:38:18] The Data War: GPU Poor vs. GPU Rich[00:38:18] Alessio: GPU poor versus GPU rich. Yeah. Where do you think we are? I think there was, again, going back to like the small model thing, there was like a time in which the GPU poor were kind of like the rebel faction working on like these models that were like open and small and cheap. And I think today people don't really care as much about GPUs anymore.[00:38:37] Alessio: You also see it in the price of the GPUs. Like, you know, that market is kind of like plummeted because there's people don't want to be, they want to be GPU free. They don't even want to be poor. They just want to be, you know, completely without them. Yeah. How do you think about this war? You[00:38:52] swyx: can tell me about this, but like, I feel like the, the appetite for GPU rich startups, like the, you know, the, the funding plan is we will raise 60 million and [00:39:00] we'll give 50 of that to NVIDIA.[00:39:01] swyx: That is gone, right? Like, no one's, no one's pitching that. This was literally the plan, the exact plan of like, I can name like four or five startups, you know, this time last year. So yeah, GPU rich startups gone.[00:39:12] The Rise of GPU Ultra Rich[00:39:12] swyx: But I think like, The GPU ultra rich, the GPU ultra high net worth is still going. So, um, now we're, you know, we had Leopold's essay on the trillion dollar cluster.[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, XAI very famously, you know, Jensen Huang praising them for being. Best boy number one in spinning up 100, 000 GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra rich are going to keep doing that because I think partially it's an article of faith now that you just need it.[00:39:46] swyx: Like you don't even know what it's going to, what you're going to use it for. You just, you just need it. And it makes sense that if, especially if we're going into. More researchy territory than we are. So let's say 2020 to 2023 was [00:40:00] let's scale big models territory because we had GPT 3 in 2020 and we were like, okay, we'll go from 1.[00:40:05] swyx: 75b to 1. 8b, 1. 8t. And that was GPT 3 to GPT 4. Okay, that's done. As far as everyone is concerned, Opus 3. 5 is not coming out, GPT 4. 5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2 trillion perimeter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from training costs, from the amount of data, or at least the inference.[00:40:36] swyx: Would you pay 10x the price of GPT Probably not. Like, like you want something else that, that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.[00:40:47] Emerging Trends in AI Models[00:40:47] swyx: And so when it's more researchy, then you actually need more just general purpose compute to mess around with, uh, at the exact same time that production deployments of the old, the previous paradigm is still ramping up,[00:40:58] swyx: um,[00:40:58] swyx: uh, pretty aggressively.[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both together and fireworks and replicates. Uh, we haven't done any scale yet. But I think Amazon, maybe kind of a sleeper one, Amazon, in a sense of like they, at reInvent, I wasn't expecting them to do so well, but they are now a foundation model lab.[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think like a lot of AWS customers, you know, they do this big reserve instance contracts and now they got to use their money. That's why so many startups.[00:41:37] Alessio: Get bought through the AWS marketplace so they can kind of bundle them together and prefer pricing.[00:41:42] swyx: Okay, so maybe GPU super rich doing very well, GPU middle class dead, and then GPU[00:41:48] Alessio: poor. I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be, even the GPU poorest, it's like, does it really make sense to be GPU poor?[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud. Yes, you know, and I think there might be a future once we kind of like figure out what the size and shape of these models is where like the tiny box and these things come to fruition where like you can be GPU poor at home. But I think today is like, why are you working so hard to like get these models to run on like very small clusters where it's like, It's so cheap to run them.[00:42:21] Alessio: Yeah, yeah,[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like news research, like probably the most deep tech thing they've done this year is Distro or whatever the new name is.[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF compute launched their compute marketplace this year, and like, who's really using that?[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say a lot more on device, We are, I now have Apple intelligence on my phone.[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multi modal.[00:43:25] Alessio: Yeah, the notification summaries are so and so in my experience.[00:43:29] swyx: Yeah, but they add, they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, they're still feature flagged, but you can, you can try it now if you, if you use the, uh, the alpha.[00:43:40] swyx: And so, like, I, I think, like, you know, We're getting the sort of GPU poor version of a lot of these things coming out, and I think it's like quite useful. Like Windows as well, rolling out RWKB in sort of every Windows department is super cool. And I think the last thing that I never put in this GPU poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab, or GPU Cloud.[00:44:10] swyx: GPU Cloud, it would be Suno. Suno, Ramp has rated as one of the top ranked, fastest growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR and Suno runs on Moto. So Suno itself is not GPU rich, but they're just doing the training on, on Moto, uh, who we've also talked to on, on the podcast.[00:44:31] swyx: The other one would be Bolt, straight cloud wrapper. And, and, um, Again, another, now they've announced 20 million ARR, which is another step up from our 8 million that we put on the title. So yeah, I mean, it's crazy that all these GPU pores are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're like [00:45:00] number one on the machines, or you're like close to the customers, and you're number one on the customer side.[00:45:03] swyx: And the people who are in the middle. Inflection, um, character, didn't do that great. I think character did the best of all of them. Like, you have a note in here that we apparently said that character's price tag was[00:45:15] Alessio: 1B.[00:45:15] swyx: Did I say that?[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.[00:45:20] Alessio: Then they paid 2. 7 billion. I mean, for like,[00:45:22] swyx: yeah.[00:45:22] Alessio: What do you pay for node? Like, I don't know what the game world was like. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.[00:45:31] The Multi-Modality War[00:45:31] Alessio: Multimodality war. And this one, we never had text to video in the first version, which now is the hottest.[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now we had VO2 just came out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't, yeah. I[00:45:54] swyx: think it's generally available now, you can go to Sora.[00:45:56] swyx: com and try it. Yeah, they had[00:45:58] Alessio: the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? Video or OneLive?[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from Minimax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Hold me up to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think like, you know, on the image side, I think it's still open.[00:46:45] Alessio: Yeah, I mean,[00:46:46] swyx: 11labs is now a unicorn. So basically, what is multi modality war? Multi modality war is, do you specialize in a single modality, right? Or do you have GodModel that does all the modalities? So this is [00:47:00] definitely still going, in a sense of 11 labs, you know, now Unicorn, PicoLabs doing well, they launched Pico 2.[00:47:06] swyx: 0 recently, HeyGen, I think has reached 100 million ARR, Assembly, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups. And then there's the big labs who are doing the sort of all in one play.[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's, it's hard to keep up. Literally they launched this last week and a shout out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can code gen and all that. But the new one that OpenAI and Meta both have but they haven't launched yet is image output. So you can literally, um, I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.[00:48:02] swyx: They can generate you that modification exactly as you asked. So there's no need for the stable diffusion or comfy UI workflow of like mask here and then like infill there in paint there and all that, all that stuff. This is small model nonsense. Big model people are like, huh, we got you in as everything in the transformer.[00:48:21] swyx: This is the multimodality war, which is, do you, do you bet on the God model or do you string together a whole bunch of, uh, Small models like a, like a chump. Yeah,[00:48:29] Alessio: I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the motto is kind of like, Maybe, you know, people say black forest, the black forest models are better than mid journey on a pixel by pixel basis. But I think when you put it, put it together, have you tried[00:48:53] swyx: the same problems on black forest?[00:48:55] Alessio: Yes. But the problem is just like, you know, on black forest, it generates one image. And then it's like, you got to [00:49:00] regenerate. You don't have all these like UI things. Like what I do, no, but it's like time issue, you know, it's like a mid[00:49:06] swyx: journey. Call the API four times.[00:49:08] Alessio: No, but then there's no like variate.[00:49:10] Alessio: Like the good thing about mid journey is like, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying mid journey, so it's a Black Forest skill issue, because I'm not paying them, you know?[00:49:24] Alessio: Yeah,[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you, you, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, ReCraft has come out, uh, on top of the image arena that, uh, artificial analysis has done, has apparently, uh, Flux's place. Is this still true?[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI Newses of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LM Arena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft v3 is now beating Flux 1. 1. Which is very surprising [00:50:00] because Flux And Black Forest Labs are the old stable diffusion crew who left stability after, um, the management issues.[00:50:06] swyx: So Recurve has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is, it's very interesting dynamics between Grok and Black Forest Labs because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a, as a thin wrapper.[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own. I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multi modality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.[00:50:42] swyx: But the big model, People are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request it and it does, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something. And I think one frontier that we haven't [00:51:00] seen this year, like obviously video has done very well, and it will continue to grow.[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Oh, at least the Hollywood Labs will get Fulsora. We haven't seen video to audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be Soda.[00:51:23] swyx: I would say that Gemini's approach Compared to OpenAI, Gemini seems, or DeepMind's approach to video seems a lot more fully fledged than OpenAI. Because if you look at the ICML recap that I published that so far nobody has listened to, um, that people have listened to it. It's just a different, definitely different audience.[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything in Uh, so, so DeepMind has, is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] Diffusion Transformers last year, you know, when they hired, uh, Bill Peebles.[00:52:03] swyx: So, DeepMind has, has a bit of advantage here, I would say, in, in, in showing, like, the reason that VO2, while one, They cherry pick their videos. So obviously it looks better than Sora, but the reason I would believe that VO2, uh, when it's fully launched will do very well is because they have all this background work in video that they've done for years.[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for, for people who are dedicated fans, they can go to NeurIPS 2023 and see, see that paper.[00:52:32] Alessio: And then last but not least, the LLMOS. We renamed it to Ragops, formerly known as[00:52:39] swyx: Ragops War. I put the latest chart on the Braintrust episode.[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there because Hacker News people like to read and not listen.[00:52:58] Alessio: So episode essays,[00:52:59] swyx: I remember [00:53:00] purchasing them separately. You say Lanchain Llama Index is still growing.[00:53:03] Alessio: Yeah, so I looked at the PyPy stats, you know. I don't care about stars. On PyPy you see Do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, Lanchain still growing.[00:53:20] Alessio: These are the last six months. Llama Index still growing. What I've basically seen is like things that, One, obviously these things have A commercial product. So there's like people buying this and sticking with it versus kind of hopping in between things versus, you know, for example, crew AI, not really growing as much.[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4

god ceo new york amazon spotify time world ai europe google china apple vision pr voice future speaking san francisco new york times phd video thinking chinese simple data predictions elon musk iphone surprise impact legal chatgpt code tesla reflecting memory ga discord reddit busy lgbt cloud flash stem honestly ab pros jeff bezos windows excited researchers unicorns lower ip tackling sort survey insane tier cto vc whispers applications doc f1 signing seal fireworks openai gemini genie academic sf organizing nvidia ux api assembly davos frontier chrome makes scarlett johansson ui gpt mm turbo bash soda aws ml lama dropbox mosaic creative writing github drafting reinvent canvas 1b bolt apis lava ruler exact stripe pico wwdc strawberry dev hundred vm vcs sander flux bt 200k taiwanese arr moto opus gartner assumption google docs sora llm sam altman nemo parting blackwell google drive sombra gpu opa tbd ramp 3b agi elia elo 5b gnome midjourney estimates bytedance leopold grok ciso dota haiku dx sarah silverman rag coursera perplexity gpus sonnets cypher george rr martin anthropic quill getty cobalt sdks deepmind ilya noam future trends sheesh v2 ttc alessio lms satya r1 ssi stack overflow 8b veo rl emerging trends itc theoretically vo2 suno sota mistral replicate yi xai black forest inflection graphql gpts aitor mcp brain trust databricks chinchillas adept jensen huang nosql ai models grand central grand central station hacker news zep hacken ethical issues cosign claud ai news gpc distro lubna o3 autogpt neo4j tpu jeremy howard gbt o1 gpd quent heygen langchain exa 70b gradients loras 400b minimax neurips jeff dean 128k elos gemini pro cerebras code interpreter icml ai winter john franco lstm r1s aws reinvent muser latent space pypy dan gross nova pro paige bailey noam brown quiet capital john frankel
The Nonlinear Library
AF - Inference-Only Debate Experiments Using Math Problems by Arjun Panickssery

The Nonlinear Library

Play Episode Listen Later Aug 6, 2024 4:41


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Inference-Only Debate Experiments Using Math Problems, published by Arjun Panickssery on August 6, 2024 on The AI Alignment Forum. Work supported by MATS and SPAR. Code at https://github.com/ArjunPanickssery/math_problems_debate/. Three measures for evaluating debate are 1. whether the debate judge outperforms a naive-judge baseline where the naive judge answers questions without hearing any debate arguments. 2. whether the debate judge outperforms a consultancy baseline where the judge hears argument(s) from a single "consultant" assigned to argue for a random answer. 3. whether the judge can continue to supervise the debaters as the debaters are optimized for persuasiveness. We can measure whether judge accuracy increases as the debaters vary in persuasiveness (measured with Elo ratings). This variation in persuasiveness can come from choosing different models, choosing the best of N sampled arguments for different values of N, or training debaters for persuasiveness (i.e. for winning debates) using RL. Radhakrishan (Nov 2023), Khan et al. (Feb 2024), and Kenton et al. (July 2024) study an information-gap setting where judges answer multiple-choice questions about science-fiction stories whose text they can't see, both with and without a debate/consultancy transcript that includes verified quotes from the debaters/consultant. Past results from the QuALITY information-gap setting are seen above. Radhakrishnan (top row) finds no improvement to judge accuracy as debater Elo increases, while Khan et al. (middle row) and Kenton et al. (bottom row) do find a positive trend. Radhakrishnan varied models using RL while Khan et al. used best-of-N and critique-and-refinement optimizations. Kenton et al. vary the persuasiveness of debaters by using models with different capability levels. Both Khan et al. and Kenton et al. find that in terms of judge accuracy, debate > consultancy > naive judge for this setting. In addition to the information-gap setting, consider a reasoning-gap setting where the debaters are distinguished from the judge not by their extra information but by their stronger ability to answer the questions and explain their reasoning. Kenton et al. run debates on questions from MMLU, TruthfulQA, PrOntoQA (logical reasoning), GQPA, and GSM8K (grade-school math). For the Elo-calculation experiments they use Gemini Pro 1.0 and Pro 1.5 judges with five debaters: Gemma7B, GPT-3.5, Gemini Pro 1.0, Gemini Pro 1.5 (all with best-of-N=1), and Gemini Pro 1.5 with best-of-N=4. They find (top row) that debate slightly outperforms consultancy but outperforms the naive-judge baseline for only one of the four judges; they don't find that more persuasive debaters lead to higher judge accuracy. We get similar results (bottom row), specifically by 1. Generating 100 wrong answers and proofs to GSM8K questions to create binary-choice questions. 2. Computing the judge accuracy in naive, consultancy, and single-turn debate settings using four judges (Llama2-7B, Llama3-8B, GPT-3.5 Turbo, and GPT-4o) and seven debaters (Claude-3.5 Sonnet, Claude-3 Sonnet, GPT-3.5 Turbo, GPT-4o, Llama2-13B, Llama2-7B, and Llama3-8B). 3. Generating Elo scores from round-robin matchups between the seven models, using the same method as Kenton et al. We basically replicate the results. We find that 1. Debate doesn't consistently outperform the naive-judge baseline, and only slightly outperforms the consultancy baseline. 2. The positive relationship between debater persuasiveness and judge accuracy seen in the information-gap setting doesn't transfer to the reasoning-gap setting. (Results are shown below colored by debater rather than by judge). We also find some evidence of a self-preference bias (Panickssery et al., Apr 2024) where debaters have a higher Elo rating when judged by similar models. The GPT-...

Things Have Changed
Unveiling AI's Role in Content Creation with Originality AI's John Gillham

Things Have Changed

Play Episode Listen Later Aug 5, 2024 42:47 Transcription Available


Have you ever stumbled upon an article or a piece of content online and wondered, "Did someone actually write this, or is it the work of ChatGPT?" In today's world, where content is produced at an incredible pace, it's becoming increasingly difficult to tell the difference.. and that's a problem in the age of misinformation.Think about it: people are getting their news on social media, X, Youtube or Facebook! With the advancements of AI, it's hard to tell how something online can be truly authentic. With latest studies showing >12% of Google's search results being AI-generated, it's critical to ensure the integrity of the digital content we consume and create.  That's where Originality AI comes in! We're thrilled to host Jon Gillham, founder and CEO on Things Have Changed. as he shares how his team are tackling these issues head-on by developing cutting-edge tech to detect AI-generated content. In a short span of time, Originality AI have achieved remarkable results, and is the most accurate AI Detector in the market for ChatGPT, GPT-4o, Gemini Pro, Claude 3, Llama 3 etc.So today on Things Have Changed, we'll dive deep into how Originality AI works, its impact on various industries, and why ensuring content authenticity is more important than ever.The Growth GearExplore business growth and success strategies with Tim Jordan on 'The Growth Gear.Listen on: Apple Podcasts Spotify PodMatchPodMatch Automatically Matches Ideal Podcast Guests and Hosts For InterviewsSupport the Show.Things Have Changed

The Nonlinear Library
AF - On scalable oversight with weak LLMs judging strong LLMs by Zachary Kenton

The Nonlinear Library

Play Episode Listen Later Jul 8, 2024 11:18


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On scalable oversight with weak LLMs judging strong LLMs, published by Zachary Kenton on July 8, 2024 on The AI Alignment Forum. Abstract Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a human judge; consultancy, where a single AI tries to convince a human judge that asks questions; and compare to a baseline of direct question-answering, where the human judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies. Twitter thread Setup We evaluate on three types of task. Extractive, where there is a question, two answer options and a source article to extract from, and information-asymmetry, meaning that judges don't get to see the article. Closed, where there is just a question and two answer options. Multimodal, where the questions involve both text and images, and two answer options. Our tasks are summarised in the following table: We consider six protocols: Consultancy, where a single AI is assigned the correct/incorrect answer (with probability 50/50) and tries to convince a judge that asks questions; Open consultancy, which is similar except the AI chooses which answer to argue for. Debate, where two AIs compete to convince a judge; Open debate, which is identical except one debater, marked the protagonist, chooses which answer to argue for. We compare to direct QA protocols: QA without article, where the judge directly answers the question; QA with article, (only on extractive tasks) where the judge directly answers the question given the article. For judge models we use Gemma7B (V1), GPT-3.5, Gemini Pro 1.0 and Gemini Pro 1.5. Our main experiments use Gemini Pro 1.5 as debaters/consultants. Assigned-role results We first look at assigned-role protocols, consultancy and debate, meaning that the consultants/debaters do not get to choose which side to argue for. We compare these to the two direct QA protocols. Findings: We find that debate consistently outperforms consultancy across all tasks, previously only shown on a single extractive QA task in Khan et al., 2024. See paper details for significance levels. Comparing debate to direct question answering baselines, the results depend on the type of task: In extractive QA tasks with information asymmetry, debate outperforms QA without article as in the single task of Khan et al., 2024, but not QA with article. For other tasks, when the judge is weaker than the debaters (but not too weak), we find either small or no advantage to debate over QA without article. Changes to the setup (number of turns, best-of-N sampling, few-shot, chain-of-thought) seem to have little effect on results. See paper for figures showing this. ...

Let's Talk AI
#173 - Gemini Pro, Llama 400B, Gen-3 Alpha, Moshi, Supreme Court

Let's Talk AI

Play Episode Listen Later Jul 7, 2024 109:49 Transcription Available


Our 173rd episode with a summary and discussion of last week's big AI news! With hosts Andrey Kurenkov (https://twitter.com/andrey_kurenkov) and Jeremie Harris (https://twitter.com/jeremiecharris) See full episode notes here. Read out our text newsletter and comment on the podcast at https://lastweekin.ai/ If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form. Email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai In this episode of Last Week in AI, we explore the latest advancements and debates in the AI field, including Google's release of Gemini 1.5, Meta's upcoming LLaMA 3, and Runway's Gen 3 Alpha video model. We discuss emerging AI features, legal disputes over data usage, and China's competition in AI. The conversation spans innovative research developments, cost considerations of AI architectures, and policy changes like the U.S. Supreme Court striking down Chevron deference. We also cover U.S. export controls on AI chips to China, workforce development in the semiconductor industry, and Bridgewater's new AI-driven financial fund, evaluating the broader financial and regulatory impacts of AI technologies.   Timestamps + links: (00:00:00) Intro / Banter Tools & Apps(00:03:24) Google opens up Gemini 1.5 Flash, Pro with 2M tokens to the public (00:08:47) Meta is about to launch its biggest Llama model yet — here's why it's a big deal (00:12:38) Runway's Gen-3 Alpha AI video model now available – but there's a catch (00:16:28) This is Google AI, and it's coming to the Pixel 9 (00:17:30) AI Firm ElevenLabs Sets Audio Reader Pact With Judy Garland, James Dean, Burt Reynolds and Laurence Olivier Estates (00:20:06) Perplexity's ‘Pro Search' AI upgrade makes it better at math and research (00:23:12) Gemini's data-analyzing abilities aren't as good as Google claims Applications & Business(00:26:38) Quora's Chatbot Platform Poe Allows Users to Download Paywalled Articles on Demand (00:32:04) Huawei and Wuhan Xinxin to develop high-bandwidth memory chips amid US restrictions (00:34:57) Alibaba's large language model tops global ranking of AI developer platform Hugging Face (00:39:01) Here comes a Meta Ray-Bans challenger with ChatGPT-4o and a camera (00:43:35) Apple's Phil Schiller is reportedly joining OpenAI's board (00:47:26) AI Video Startup Runway Looking to Raise $450 Million Projects & Open Source(00:48:10) Kyutai Open Sources Moshi: A Real-Time Native Multimodal Foundation AI Model that can Listen and Speak (00:50:44) MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation (00:53:47) Anthropic Pushes for Third-Party AI Model Evaluations (00:57:29) Mozilla Llamafile, Builders Projects Shine at AI Engineers World's Fair Research & Advancements(00:59:26) Researchers upend AI status quo by eliminating matrix multiplication in LLMs (01:05:55) AI Agents That Matter (01:12:09) WARP: On the Benefits of Weight Averaged Rewarded Policies (01:17:20) Scaling Synthetic Data Creation with 1,000,000,000 Personas (01:24:16) Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization Policy & Safety(01:26:32) With Chevron's demise, AI regulation seems dead in the water (01:33:40) Nvidia to make $12bn from AI chips in China this year despite US controls (01:37:52) Uncle Sam relies on manual processes to oversee restrictions on Huawei, other Chinese tech players (01:40:57) U.S. government addresses critical workforce shortages for the semiconductor industry with new program (01:42:42) Bridgewater starts $2 billion fund that uses machine learning for decision-making and will include models from OpenAI, Anthropic and Perplexity (01:47:57) Outro

Cloud N Clear
Mastering Real-Time Customer Data Analysis with Quantum Metric and AI | EP 181

Cloud N Clear

Play Episode Listen Later Jun 11, 2024 30:59


In this episode, Tony Safoian interviews Mario Ciabarra, the CEO and founder of Quantum Metric. They discuss Mario's background and journey as an entrepreneur, as well as the evolution of Quantum Metric and its product. They highlight the importance of understanding and listening to customers to improve digital experiences. They also introduce the concept of Generative AI and how it is being implemented in the Quantum Metric platform. The conversation explores the potential of generative AI in improving customer experiences and driving business growth. It highlights the importance of real-time data analysis and the ability to understand and address customer friction points. The use of Google Cloud Platform (GCP) and Gemini Pro is discussed as a powerful solution for leveraging generative AI. The conversation also emphasizes the value of partnerships and the role of data in determining winners and losers in the market. The future of the industry is predicted to involve faster disruption cycles and a focus on having the right data at the right moment. Don't miss this insightful episode filled with personal anecdotes and cutting-edge technological discussions. Tune in now, and remember to LIKE, SHARE, & SUBSCRIBE for more! Podcast Library YouTube Playlist    Host: Tony Safoian | CEO at SADA Guest: Mario Ciabarra | CEO at Quantum Metric To learn more, visit our website here: SADA.com

AI Unchained
Ai_025 - Bullsh.t and Breakthroughs

AI Unchained

Play Episode Listen Later May 20, 2024 85:56


Today we explore the deluge of announcements from both OpenAi and Google. With a plethora of Ai features dropping at Google I/O And Chat GPT-4o landing with an ai that can be spoken to like a human, how do we determine the difference between groundbreaking AI tools and mere gimmicks. How do we discern practical applications from overhyped features? Join Guy as he navigates the latest AI developments, asking the critical question: What truly enhances our digital lives and what falls short? Links to check out: Rabbit R1 (Link: https://www.rabbit.tech/rabbit-r1) Google I/O Announcements: Coverage of the latest features and tools introduced by Google, including the Gemini Pro and video gen models. (Link: https://io.google/2024/) OpenAI's GPT-40 Announcement: Insights into the latest generative pre-trained transformer model which emphasizes voice interaction (Link: https://openai.com/index/hello-gpt-4o/) Satlantis Project (Link: https://satlantis.com/) Welcome to the World of Audio Computers - Jason Rugolo TED talk (Link: https://tinyurl.com/4zc62nhc) Nova Project: Focus on a business-oriented AI platform that prioritizes open-source solutions and privacy for handling sensitive data.⁠ (Link: Pending) Host Links ⁠Guy on Nostr ⁠(Link: http://tinyurl.com/2xc96ney) ⁠Guy on X ⁠(Link: https://twitter.com/theguyswann) ⁠Guy on Instagram⁠ (Link: https://www.instagram.com/theguyswann/) ⁠Guy on TikTok⁠ (Link: https://www.tiktok.com/@theguyswann) ⁠Guy on YouTube⁠ (Link: https://www.youtube.com/@theguyswann) ⁠⁠Bitcoin Audible on X⁠⁠ (Link: https://twitter.com/BitcoinAudible) Check out our awesome sponsors! Get ⁠10% off the COLDCARD⁠ with code BITCOINAUDIBLE ⁠⁠⁠⁠⁠⁠(Link: bitcoinaudible.com/coldcard⁠⁠⁠⁠⁠⁠) ⁠Swan⁠: The best way to buy, learn, and earn #Bitcoin (Link: https://swanbitcoin.com) "The Limits of my language means the limits of my world"~ Ludwig Wittgenstein

This Day in AI Podcast
EP63: GPT-4o, ChatGPT Voice & Google I/O AI Recap (Project Astra) + Future Computing Interfaces

This Day in AI Podcast

Play Episode Listen Later May 17, 2024 102:57


Join the fun at: https://thisdayinai.comSimTheory: https://simtheory.aiShow notes: https://thisdayinai.com/bookmarks/55-ep63/UDIO song: https://www.udio.com/songs/iu1381RxvjfzWznGHeVecVThanks for listening and all your support of the show!CHAPTERS:------00:00 - We're changing the name of the show00:52 - Thoughts on GPT-4o (GPT4 Omni), ChatGPT Free Vs Plus & impressions27:57 - ChatGPT Voice Mode: A Dramatic Shift? Voice as a Platform: Star Trek Vs Her34:54 - Project Astra & The Future Interface of AI Computing52:28 - Applying AI Technologies: are the next 3 years a golden age for developers implementing AI?55:23 - Do we have to become Cyborgs to find our keys?1:06:24 - Google I/O AI Recap: Google's Context Caching, Tools for Project Astra, Impressions of Gemini Pro 1.5, Gemma, Gemini Flash, Veo etc.1:37:43 - Our Favorite UDIO song of the week

The Cloud Pod
257: Who Let the LLamas Out? *Bleat Bleat*

The Cloud Pod

Play Episode Listen Later May 1, 2024 61:47


Welcome to episode 257 of the Cloud Pod podcast – where the forecast is always cloudy! This week your hosts Justin, Matthew, Ryan, and Jonathan are in the barnyard bringing you the latest news, which this week is really just Meta's release of Llama 3. Seriously. That's every announcement this week. Don’t say we didn't warn you.  Titles we almost went with this week: Meta Llama says no Drama No Meta Prob-llama Keep Calm and Llama on  Redis did not embrace the Llama MK The bedrock of good AI is built on Llamas The CloudPod announces support for Llama3 since everyone else was doing it Llama3, better know as Llama Llama Llama The Cloud Pod now known as the LLMPod Cloud Pod is considering changing its name to LlamaPod Unlike WinAMP nothing whips the llamas ass A big thanks to this week's sponsor: Check out Sonrai Securities‘ new Cloud Permission Firewall. Just for our listeners, enjoy a 14 day trial at www.sonrai.co/cloudpod Follow Up  01:27 Valkey is Rapidly Overtaking Redis  Valkey has continued to rack up support from AWS, Ericsson, Google, Oracle and Verizon initially, to now being joined by Alibaba, Aiven, Heroku and Percona backing Valkey as well.   Numerous blog posts have come out touting Valkey adoption. I'm not sure this whole thing is working out as well as Redis CEO Rowan Trollope had hoped.  AI Is Going Great – Or How AI Makes All It's Money  03:26 Introducing Meta Llama 3: The most capable openly available LLM to date  Meta has launched Llama 3, the next generation of their state-of-the-art open source large language model.  Llama 3 will be available on AWS, Databricks, GCP, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, Nvidia NIM, and Snowflake with support from hardware platforms offered by AMD, AWS, Dell, Intel, Nvidia and Qualcomm Includes new trust and safety tools such as Llama Guard 2, Code Shield and Cybersec eval 2 They plan to introduce new capabilities, including longer context windows, additional model sizes and enhanced performance. The first two models from Meta Lama3 are the 8B and 70B parameter variants that can support a broad range of use cases.  Meta shared some benchmarks against Gemma 7B and Mistral 7B vs the Lama 3 8B models and showed improvements across all major benchmarks.  Including Math with Gemma 7b doing 12.2 vs 30 with Llama 3 It had highly comparable performance with the 70B model against Gemini Pro 1.5 and Claude 3 Sonnet scoring within a few points of most of the other scores.  Jonathan recommends using LM Studio to get start playing around with LLMS, which you can find at https://lmstudio.ai/ 04:42 Jonathan – “Isn’t it funny how you go from an 8 billion parameter model to a 70 billion parameter model but nothing in between? Like you would have thought there would be some kind of like, some middle ground maybe? But, uh, but… No. But, um,

SuperDataScience
762: Gemini 1.5 Pro, the Million-Token-Context LLM

SuperDataScience

Play Episode Listen Later Mar 1, 2024 16:58


Jon Krohn presents an insightful overview of Google's groundbreaking Gemini Pro 1.5, a million-token LLM that's transforming the landscape of AI. Discover the innovative aspects of Gemini Pro 1.5, from its extensive context window to its multimodal functionalities, which are broadening the scope of AI technology and signifying a significant leap in data science. Plus, join Jon for a practical demonstration, showcasing the real-world applications, capabilities, and limitation of this advanced language model. Additional materials: www.superdatascience.com/762 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 215: OpenAI kills plugins, Tyler Perry stalls $800 million expansion due to AI and more AI News That Matters - Feb. 26th, 2024

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Feb 26, 2024 41:16


ChatGPT Plugins are on their way out! Tyler Perry is putting his studio expansion on hold due to AI, and Google is making TONS of news right now! Here's this week's AI news that matters and why it's important. Newsletter: Sign up for our free daily newsletterMore on this Episode: Episode pageJoin the discussion: Ask Jordan questions on AIRelated Episodes:Ep 211: OpenAI's Sora – The larger impact that no one's talking aboutEp 204: Google Gemini Advanced – 7 things you need to knowTomorrow' Show: How to stand out in a world where everyone can create an AI Startup?Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineupWebsite: YourEverydayAI.comEmail The Show: info@youreverydayai.comConnect with Jordan on LinkedInTimestamps:03:42 Tyler Perry concerned about AI job loss.07:22 OpenAI Sora video excels over other platforms.12:54 11 Labs updated model, ChatGPT phasing out.15:27  Plugin packs for ChatGPT.16:55 Limitations on using multiple GPTs for now.22:16 Unsatisfied with Google Gemini Enterprise integration.23:13 Google and Reddit partnership for language models.28:39 Google Gemini Images paused due to diversity concerns.31:16 Google now has three Gemini models.34:54 Best text-to-speech AI37:11 AI content creation raises copyright concernsTopics Covered in This Episode:1. OpenAI's changes and future focus2. Google's Significant AI content deal with Reddit3. Google's AI model developments and issues4.  Trends in AI utilization within the entertainment industryKeywords:OpenAI, GPT, AI agents, AI assistants, prime prompt polish program, Google, Reddit, AI content licensing deal, AI models, search engine, Gemini AI, large language models, user-generated content, university student data, Google Gemini Imagine 2, Gemma, Gemini Ultra, Gemini Pro, Gemini Nano, Tyler Perry, Sora, AI in entertainment, text-to-speech AI, business productivity, ChatGPT plugins, Well Said Labs, Asura, AI video platforms, Perry's studio expansion, AI regulation

The AI Breakdown: Daily Artificial Intelligence News and Discussions

NLW argues that another phase of expectation in genAI has begun thanks to Groq, Sora, and Gemini Pro 1.5 Featuring a reading of https://www.oneusefulthing.org/p/strategies-for-an-accelerating-future INTERESTED IN THE AI EDUCATION BETA? Learn more and sign up https://bit.ly/aibeta ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI.  Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Marketing Against The Grain
OpenAI's Sora Vs Google's Gemini 1.5 Pro: Who Wins?

Marketing Against The Grain

Play Episode Listen Later Feb 20, 2024 29:36


Is 2024 the year we'll see our wildest imaginations come to life in video form? Kipp and Kieran get right into the brewing storm within the AI industry as titans clash on new frontiers of technology. In this episode they dive into the unfolding drama of AI developments with a focus on the text-to-video revolution. Learn more on how Sora is animating our still image stories, the serious business of AI in video game worlds, and the intense rivalry heating up between OpenAI and Google. Mentions Sora - Text-to-video model launched by OpenAI. (https://openai.com/sora) OpenAI - The organization behind the development of AI models like Sora and GPT-4. (https://www.openai.com/) Sam Altman - CEO of OpenAI involved in the launch of Sora. (https://www.ycombinator.com/people/sam) Google Gemini 1.5 - A model developed by Google with capabilities in text, audio, and video. (https://gemini.google.com/advanced) GPT-4 - The fourth iteration of the Generative Pre-trained Transformer model by OpenAI. (https://openai.com/gpt-4) Time Stamps: 00:00 Sam strategically times releases to upstage Google. 04:58 Multiple videos watched, 30-50 pages long. Easter eggs, OpenAI mention, Sam Altman backstory. 07:47 A new model is better than GPT-4. 12:55 Will Smith spaghetti meme evolved rapidly in Tokyo. 15:39 Model Sora can animate still images, creating narratives. 18:30 Stock videographer sites may be obsolete for marketing. 20:54 YouTube is the future of multimedia content. 26:20 Gemini Pro unlocks YouTube as a search engine. 29:32 OpenAI: large company doing incredible work efficiently. 31:43 AI developments promise exciting content for the year. Follow us for everyday marketing wisdom straight to your feed YouTube: ​​https://www.youtube.com/channel/UCGtXqPiNV8YC0GMUzY-EUFg  Twitter: https://twitter.com/matgpod  TikTok: https://www.tiktok.com/@matgpod  Thank you for tuning into Marketing Against The Grain! Don't forget to hit subscribe and follow us on Apple Podcasts (so you never miss an episode)! https://podcasts.apple.com/us/podcast/marketing-against-the-grain/id1616700934   If you love this show, please leave us a 5-Star Review https://link.chtbl.com/h9_sjBKH and share your favorite episodes with friends. We really appreciate your support. Host Links: Kipp Bodnar, https://twitter.com/kippbodnar   Kieran Flanagan, https://twitter.com/searchbrat  ‘Marketing Against The Grain' is a HubSpot Original Podcast // Brought to you by The HubSpot Podcast Network // Produced by Darren Clarke.

This Day in AI Podcast
EP51: OpenAI's Sora, Gemini Pro 1.5 10M Context, ChatGPT Memory, GraphRAG, ChatRTX, Microsoft UFO...

This Day in AI Podcast

Play Episode Listen Later Feb 16, 2024 89:19


Show Notes: https://thisdayinai.com/bookmarks/28-ep51/Sign up for daily This Day in AI: https://thisdayinai.comTry Stable Cascade: https://simtheory.ai/agent/508-stable-cascadeJoin SimTheory: https://simtheory.ai======This week we take several shots of vodka before trying to make sense of all the announcements. OpenAI attempted to trump Google's Gemini 1.5 with the announcement of Sora, 1 minute video generation that does an incredible job of keeping track of objects. Google showed us that up to 10M context windows are possible with multi-modal inputs. We discuss if a larger context window could end the need for RAG and take a first look at GraphRAG by Microsoft hoping to improve RAG with a knowledge graph. We road test Nvidia's ChatRTX on our baller graphics cards and Chris tries to delete all of his files using Microsoft UFO, a new open source project that uses GPT-4 vision to navigate and execute tasks on your Windows PC. We cover briefly V-JEPA (will try for next weeks show) and it's ability to learn through watching videos and listening, and finally discuss Stability's Stable Cascade which we've made available for "research" on SimTheory.If you like the show please consider subscribing and leaving a comment. We appreciate your support.======Chapters:00:00 - OpenAI's Sora That Creates Videos Instantly From Text13:49 - ChatGPT Memory Released in Limited Preview23:31 - OpenAI Rumored To Be Building Web Search, Andrej Karpathy Leaves OpenAI, Have OpenAI Slowed Down?33:04 - Google Announces Gemini Pro 1.5. Huge Breakthrough 10M Context Window!50:11 - Microsoft Research Publishes GraphRAG: Knowledge Graph Based RAG1:02:03 - Nvidia's ChatRTX Road Tested1:07:18 - AI Computers, AI PCs & Microsoft's UFO: An Agent for Window OS Interaction. Risk of AI Computers.1:18:46 - Meta's V-JEPA: new architecture for self-supervised learning1:24:26 - Stability AI's Stable Cascade

Everyday AI Podcast – An AI and ChatGPT Podcast
EP 163: Google Gemini - ChatGPT killer or a marketing stunt?

Everyday AI Podcast – An AI and ChatGPT Podcast

Play Episode Listen Later Dec 12, 2023 48:56


Google has been under fire after the release of  its new Gemini. Sorry to say but Google got so many things wrong with the marketing and launch. Is Gemini an actual ChatGPT killer or just a marketing stunt gone wrong? We're covering everything you need to know.Newsletter: Sign up for our free daily newsletterMore on this Episode: Episode PageJoin the discussion: Ask Jordan questions about Google GeminiUpcoming Episodes: Check out the upcoming Everyday AI Livestream lineupWebsite: YourEverydayAI.comEmail The Show: info@youreverydayai.comConnect with Jordan on LinkedInTimestamps:[00:02:17] Daily AI news[00:07:30] Overview of Google Gemini[00:10:40] Google lied about Gemini release[00:17:10] How Gemini demo was created[00:23:50] Comparing ChatGPT to Gemini[00:30:40] Benchmarks of Gemini vs ChatGPT[00:38:20] Why did Google release Gemini?[00:43:00] Consequences of botched releaseTopics Covered in This Episode:1. Introduction to Google's Gemini Model2. Google Gemini's Marketing Controversy3. Assessing Gemini's Performance and Functionality4. Comparison with ChatGPT5. Importance of Transparency and Truth in AI IndustryKeywords:Google Gemini, Generative AI, GPT-4.5, AI news, AI models, Google Bard, Multimodal AI, Google stock, Generative AI industry, Google credibility, Technology news, AI tools, Fact-based newsletter, Marketing misstep, Deceptive marketing, Multimodal functionality, Gemini Ultra, Gemini Pro, Benchmarks, Misrepresentation, Stock value, Text model, Image model, Audio model, Google services, Pro mode, Ultra mode, Marketing video Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/ Get more out of ChatGPT by learning our PPP method in this live, interactive and free training! Sign up now: https://youreverydayai.com/ppp-registration/

Techmeme Ride Home
Fri. 12/08 – Did Google Fake The Gemini Demo?

Techmeme Ride Home

Play Episode Listen Later Dec 8, 2023 16:32


People across the Internet are accusing Google of faking that Gemini AI video demo that everyone was wowed by. Apple seems to be diversifying out of China for manufacturing at pace now. Might the UK's CMA have an issue with Microsoft's relationship with OpenAI? And, of course, the Weekend Longreads Suggestions.Sponsors:ShopBeam.com/rideLinks:Google's Gemini Looks Remarkable, But It's Still Behind OpenAI (Bloomberg)Early impressions of Google's Gemini aren't great (TechCrunch)Apple to move key iPad engineering resources to Vietnam (NikkeiAsia)Microsoft, OpenAI Are Facing a Potential Antitrust Probe in UK (Bloomberg)Google launches NotebookLM powered by Gemini Pro, drops waitlist (9to5Google)Weekend Longreads Suggestions:The real research behind the wild rumors about OpenAI's Q* project (ArsTechnica)AI and Mass Spying (Schneier On Security)The race to 5G is over — now it's time to pay the bill (The Verge)In the Hall v. Oates legal feud, fans don't want to play favorites (NBCNews)See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Daily Tech News Show
Release Any GTA and They Will Come - DTNS 4659

Daily Tech News Show

Play Episode Listen Later Dec 6, 2023 31:56


Why are AAA games like GTA 6 ported to PC well after their release on game consoles? Scott explains. Plus Twitch will stop operations in South Korea starting February 27, 2024, due to high costs there. And Google launches its new Large Language Model Gemini which comes in three flavors; Gemini Ultra, Gemini Pro, and Gemini Nano.Starring Tom Merritt, Sarah Lane, Scott Johnson, Roger Chang, Joe.Link to the Show Notes. Become a member at https://plus.acast.com/s/dtns. Hosted on Acast. See acast.com/privacy for more information.