Podcast appearances and mentions of Jeff Dean

  • 103 podcasts
  • 222 episodes
  • 47m average duration
  • 1 episode every other week
  • Latest: Jun 3, 2026
Jeff Dean

[Chart: popularity by year, 2019 to 2026]



Latest podcast episodes about Jeff Dean

Nested Learning: Ali Behrouz on the Quest for Continual Learning & Illusion of AI Architectures

Jun 3, 2026 · 180:03


Ali Behrouz, grad student at Cornell and Google researcher, discusses his potentially transformative work on new architectures for continual learning in AI. His paper "Nested Learning," praised by Jeff Dean as a possible paradigm shift, enables models to adapt to new context while preserving core knowledge by updating different layers at different frequencies, inspired by human memory systems. The conversation also covers his latest work on AI "sleep" for memory consolidation, why he sees all deep learning as associative memory, and the profound implications of continual learning for privacy, alignment, and the path to AGI.

Sponsors:
Mercury: The fintech trusted by ambitious companies and individuals to run their finances, with virtual cards, spending limits, merchant/category locks, and AI-friendly tools like API keys, MCP, and CLI. Check out Mercury at mercury.com
Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

I'm excited to work with Microsoft once again as the presenting sponsor of the AI Engineer World's Fair! We'll be streaming live from MS Build today for a special crossover pod with our friends at No Priors and the one and only Satya Nadella. However, we did not hold back with this interview: we asked all the burning questions about uptime and Copilot that we know you have in your minds. Let's go!

For almost two decades, GitHub has been the home of software, where both open source and closed source flow through commits, pull requests, reviews, actions, etc.

This ecosystem flourished as open-source maintainers and contributors continued shipping code for the benefit of the community. However, as coding agents began to ship mass quantities of code, growing 1400% in 2026, it marked a new era that was both extremely exciting and challenging for GitHub.

While these agents help more people ship more projects, they also significantly raise the floor of how much code is shipped, how often it is shipped, how many people commit code, and basically orders of magnitude multiples in every dimension of GitHub infrastructure.

Now GitHub inevitably experiences more pressure on its infrastructure, which was originally designed around human developers moving at human speed. This has resulted in a very publicly notable uptime story.

So it raises the question of whether current systems around code can absorb what AI produces. Can CI/CD keep up when every idea becomes a build? Can open source maintainers survive floods of AI-generated slop contributions? Can GitHub preserve the human social contract of software while becoming the operating layer for agents?

Which brings us to the perfect person to answer these questions: GitHub COO Kyle Daigle. In this episode, he joins swyx to unpack what happens when AI doesn't just autocomplete code, but starts changing how companies operate, how open source works, how pull requests get reviewed, and how GitHub itself has to scale.
We go deep on GitHub's internal AI workflows: micro-skills, WorkIQ, MCP, Slack, Teams, email, Copilot workflows, the new Copilot desktop app, CLI, cloud agents, and how Kyle uses agents to look backwards across company context before deciding what to do next. Kyle also reflects on GitHub's history building webhooks, APIs, Actions, npm, Dependabot, and Semmle, why the AI era is breaking GitHub in new ways, how Actions became a general-purpose compute layer, and what Copilot becomes after code completion.

We discuss:
* Kyle's expanded role across GitHub
* How AI got Kyle coding again after years in leadership
* Why GitHub rolls out AI through existing workflows instead of forcing new tools
* WorkIQ, MCP, Slack, Teams, email, and GitHub as company context
* Why massive “mega-skills” are giving way to small, atomic micro-skills
* How AI changes summarization, communications, marketing, and analyst work
* Why former developers in leadership may have a unique advantage in the AI era
* Kyle's “15 agents on Saturday” workflow
* How Kyle built an AI-generated executive presentation for CRO/CFO teams
* Why AI changes the chief of staff role without removing the human work
* GitHub Actions, webhooks, arbitrary code execution, and secure agent compute
* The npm acquisition, supply-chain security, 2FA, and token invalidation
* Slop forks, vendoring, and whether AI agents change dependency management
* What pull requests become when most PRs come from agents
* Prompt requests, vouching, AI review, and trust in open source
* What counts as a “developer” when AI lowers the barrier to building
* GitHub Spark, low-code, and why GitHub refuses to hide the code
* 14x commit growth, Actions load, databases, monorepos, and availability
* Copilot's evolution from completion to CLI, desktop app, cloud agents, and SDK
* Context, memory, rules, and making GitHub “act like Kyle wants it to act”
* Ambient AI, OpenClaw, enterprise security, and the new operating system for agents
* What swyx should ask Satya Nadella about Microsoft's AI future

Kyle Daigle
* LinkedIn: https://www.linkedin.com/in/kyledaigle
* X: https://x.com/kdaigle

Timestamps
00:00:00 Introduction
00:03:36 Why AI Got Kyle Coding Again
00:07:04 Running GitHub with AI: WorkIQ, MCP, Slack, Teams, and Skills
00:15:39 The Golden Age for Former Developers in Leadership
00:17:31 15 Agents on Saturday and AI-Generated Executive Work
00:20:20 How AI Changes the Chief of Staff Role
00:21:45 GitHub's History: Actions, npm, Webhooks, and Open Source
00:28:45 Slop Forks, Vendoring, and AI Dependency Management
00:33:57 Pull Requests, Prompt Requests, and Trust in Agent-Generated Code
00:41:21 GitHub Stars, 200M+ Developers, and the New AI Builder Wave
00:45:15 GitHub Spark, Low-Code, and Why GitHub Still Shows the Code
00:47:38 GitHub's Hardest Era: 14x Growth, Reliability, and Scale
00:59:21 Actions as the Compute Layer for CI/CD and Automation
01:02:04 The State and Future of GitHub Copilot
01:08:24 Ambient AI, Background Agents, and the Future of the SDLC
01:13:09 OpenClaw, Enterprise Security, and the New OS for Agents
01:18:03 Build Announcements, WorkIQ, FoundryIQ, and Microsoft Context
01:21:41 What Should swyx Ask Satya?

Transcript

Introduction: Kyle Daigle's Expanded Role at GitHub and Microsoft

Swyx [00:00:00]: We're here with Kyle Daigle, COO of GitHub. Welcome.

Kyle [00:00:07]: Hey, thanks for having me.

Swyx [00:00:08]: You're not just COO of GitHub. People know you as that. You have a new role.

Kyle [00:00:11]: So I have an expanded role now. I've been working at GitHub for thirteen years and doing all things developer. Joined as a developer myself. And now I'm also responsible as the CMO of Developer for Microsoft.
And so all the kind of learnings and passion for developers and how we work with them and how we communicate and how we bring our products to market, we're also bringing that expertise to the broader Microsoft ecosystem and helping every developer that uses a Microsoft product or would like to have a sort of similar experience that they've had with GitHub over the years. So it's a different role in some ways, but it's also just building on the experience that I've had at GitHub of just sort of tell the truth, be authentic, show people how to use it, and then let the products speak for themselves. Now just doing that with all of Microsoft.

Swyx [00:01:09]: We'll be releasing this in conjunction with Build. You've got lots of stuff planned, and we can sort of touch on that whenever it's appropriate. I think one of the interesting things is I rarely meet a COO who's also a CMO. I think you're very outward-facing and you're very confident publicly. That's rare. Do you actually view yourself as COO? What is your thing?

From GitHub Developer to COO/CMO: Building the Platform and Operating GitHub

Kyle [00:01:33]: I think for me, it's been funny. The titles have always felt a little strange to me. I joined GitHub as a developer. I wrote so much of the...

Swyx [00:01:46]: Let's bring that up. You wrote the back ends?

Kyle [00:01:48]: I was going through some old photos when folks were talking about how things were being built or how there was a build GitHub. I built webhooks and worked with teams building the API, built the platform layer. Anything that integrated with GitHub, up until really 2018, I built or ran the engineering teams. And that's kind of where the beginning of my passion always was: helping people build things, deliver them to their customers. And so being a developer, building for developers was always super unique.
I think as my role expanded, it became my ability to talk to not just developers, but also enterprise customers or business leaders and have this translation layer. And then through all those years, GitHub has always operated pretty uniquely. Post-pandemic, working remotely was not as novel as it was when GitHub started in 2008. But all that expertise of running remote teams, doing it well, became this sort of bigger role, ultimately turning into the COO role of how do we operate GitHub in the way that GitHub's always operated after the Microsoft acquisition. And kind of so on from there. So for me, I still code. I love coding, but the problem has always been people. It's a much harder problem to support our own employees, and a harder problem to communicate to developers and enterprise buyers what we're building and why it matters, 'cause those are two very different messages. And so getting to work in the mix of COO, CMO, also just being a dev, I think is what's kept me at GitHub for so long.

AI Workflows for Leadership: Commits, Retrospectives, and Context

Swyx [00:03:40]: Apparently, your commits have gone up. What's this? What's going on?

Kyle [00:03:45]: Rui's called me out pretty aggressively. So I think, as you can imagine, right, you can see my normal era of being a dev in the 2013, 2014 era, and then moving into management, and then ultimately the COO role. I think what you see there is me really getting back to coding thanks to AI. Similar to attaching problems between how to market and how to operate a business and how to code, I find building agents and workflows that are connecting very disparate problems to be what's driving this. So some of it's writing software. A lot of it is connecting a ton of different data sources to help me out.
But that is completely me really diving in on the AI side, trying out our tools, trying out everyone's tools, but building for me, building for the non-technical leader (though I'm technical), and how we're able to use these tools for more than just the simple call and response. I think for a lot of the non-technical folks, your employer says you have to use AI, and so everyone uses ChatGPT or Copilot or Claude or whatever. To really get into how this is going to help me out, I find that it's not the “I need to write a blog post” simple examples. It's helping people find the workflows of, “Okay, I need you to go through all the PRs today. I need you to go through everything that we've posted online. I need you to go through what we did the last three months. Go through all of my Obsidian notes for any mentions of this, then go through my transcripts at work.” We use Teams, so: using WorkIQ, go call that MCP server, grab all the transcripts, go through all the Slack, and then build me out the plan of what this week's messaging actually was. That's something that was impossible before. Because for me, what most of this here is, is actually less building forward. It's actually a recursive loop backwards. I'm always looking at what happened first. Go back through the week and tell me what we did, what worked, what didn't work. And then tell me, in the next three or four days, what would you tweak based on this sort of looking backwards and then looking ahead a little bit? I find that to be so much more valuable, especially for non-technical folks, because LLMs are actually very good at that retrospection: finding all the patterns, pulling them out, and then applying that retrospection to just a couple of days or a short period of time. It's all a bunch of apps that I've built, and I've launched a bunch of internal tools. I use the new GitHub Copilot app, the desktop app, with workflows.
Every time I crack open my laptop, it's running workflows for me. It's just a ton of different stuff, and of course, it all ends up on GitHub.

Swyx [00:06:47]: Of course. That's where stuff is hosted. Man, there's so much to ask you. I was going to leave the how-do-you-run-a-company-with-AI thing to the end. I have to double-click on one thing. You said you are looking back at the week. You're understanding what happens. When you say we... That's three thousand people. How?

Rolling Out AI Internally: Skills, CLIs, and Company Context

Kyle [00:07:09]: I think when we started rolling out AI internally beyond engineering, one of the things that I was really passionate about is, we have to do this in a way where no one has to change how they work. I don't want to have to teach you a tool. I don't want to have to teach you something new. And so for us, we tried out a few tools. Most of them don't work because I've got to get you on board. I've got to teach you how to use it. What we've actually ended up doing is we've built a set of skills internally. We each have our set of skills, and we've just been distributing the CLI, even to the non-technical folks. And then effectively, we're just giving it access to read about everything that we're writing. So for us, that's usually GitHub, Teams, email, and Slack. So Teams for video chat, generally speaking.

Swyx [00:08:03]: Teams and Slack?

Kyle [00:08:04]: So we use Teams for video communication, but we don't use it for chat. GitHub has a long history, right? We're always...

Swyx [00:08:13]: Also Slack.

Kyle [00:08:14]: ...talking about ChatOps, and everything is built into Slack.
Like every command, every flow.

Swyx [00:08:18]: So even though you have been acquired for, I don't know, eight years now...

Kyle [00:08:22]: We still...

Swyx [00:08:23]: You still use Slack?

Kyle [00:08:23]: It's a purpose-built tool for us, and I think the reality is that moving off of it would be, bluntly, so expensive, simply because all the tooling is baked in with that paradigm. And they both have their pros and cons, but they don't work the same way at all. We still use a bunch of different tools because it's the purpose-built tools that we need. And then...

Swyx [00:08:47]: Well, the same doesn't go for the rest of Microsoft, presumably.

Kyle [00:08:50]: The various teams operate...

Swyx [00:08:53]: They make their own decisions.

Kyle [00:08:54]: ...various ways. I think it just matters what you're trying to do. But we do work across kind of every tool that we use, and then by giving everyone access to all of that context and the new WorkIQ MCP server, which is quite cool if you do live in the M365 world, I can ask it all these backwards-facing questions, and it's incredibly important for our teams that are working remotely. There's a lot of stuff you miss when you're not in an office, and we are spread out all over the world. So most of that is looking back. And then we post, automatically, into GitHub issues or discussions, these sorts of findings or our industry reports. Like what's happening this morning, today, yesterday. A little automation gets run. We'll use the app. We might use GitHub Actions with our agentic workflows just to go do that run, and then we push it into GitHub, and we keep having a conversation. So usually for us, it's about that sort of looking back, looking forward on the non-technical side. And then of course for a lot of those folks, it's also building an app, pushing it to GitHub Pages or pushing it somewhere to host it, et cetera.
But it's just enabling everyone with that power. Instead of “it's going to take me a week to figure this out,” we're going, “Okay, I built a skill. Let's put it into a repo. We'll all share that skill together, and then we'll use the CLI, or now the app, just to run it.”

Micro Skills vs. Mega Skills: How GitHub Uses AI at Work

Swyx [00:10:26]: All right. I think we're going straight into the team management and productivity thing. I think a lot of people are getting various levels of LLM psychosis. How do you manage the bloat of skills? Everyone has their thing, and they're trying to promote it to the rest of their peers in their org, right? And obviously, whoever becomes a skill influencer internally becomes an AI leader of sorts, right? I assume you have those.

Kyle [00:10:50]: I think we have...

Swyx [00:10:52]: And I assume it's a mess. Yeah.

Kyle [00:10:54]: I think the reality is there's two pieces. First is, I think that we're ending the era of these massive, beautiful, perfect skills that are just not any of those things. 'Cause for a while, right, every tweet every day is like, go download the skills, the perfectly managed thing to do this entire workflow. And what we've found (I was just with my team this week, and we were talking about the skill side) is that we're really talking about these incredibly micro skills that are just doing one thing for us very well, versus a skill that's going to do, like I said, that full report. That doesn't really exist on our side anymore. It's usually a single skill that's going to identify the most important marketing information given any MCP server. Like, this is the most important thing.
It's less about stitching a bunch of tools together and having it produce this mega output, because then weeks go by, months go by, things change, and you want to tweak...

Swyx [00:11:58]: It's brittle.

Kyle [00:11:58]: ...your mega skill, and you're screwed. You can't do that. And so now we're really just talking about the Legos we're using and just letting the instruction book be something we're all putting together. Whereas I think a lot of AI skills for a while have been that mega instruction book style.

Swyx [00:12:15]: I've thought a lot about Postel's law. I don't know if that's a term that means things to folks. It's the idea that you should be liberal in what you accept and strict in what you output, right? And I think that's a good framing principle for skills. These are my skills, obviously on GitHub. You know how some repos in GitHub are special repos? I feel like we should sort of reify the slash skills and give it some kind of special presentation. Anyway, so this is one of those: download anything, transcribe anything, and then you can string together the atomic skills that do one thing well into some kind of orchestration skill that calls other skills. Does that match?

Kyle [00:12:56]: I think so. I think that the...

Swyx [00:13:00]: Summarize anything.

Kyle [00:13:01]: For me, I do communications and PR and analyst relations and marketing and customer activities, and so my “summarize everything” is very different for each one of those contexts. 'Cause if I'm summarizing something for an analyst, that's a very different thing than how I'm probably going to summarize something for a customer meeting or an engagement. So that's, I think, the difference when we're talking about the tools I might use on Saturday, or the skills I might use on a Saturday when it's just for Kyle.
Yeah, those kind of have an atomic actual tool underneath, or maybe skill, and then “Kyle cares about X.” But I think when we're talking about work and enabling the marketers and communicators there, it's the atomic “this is what good summarization is,” and then “this is what I care about” for marketing, for communications, for whatever. And that, I think, is the interesting matrix problem when we go from a developer set of concerns to all kinds of different professions: what that word means to me is different than what it means to you, which is different than what it means to the analyst or the salesperson, and that's the matrix mess that we're still starting to find. It's about these mega skills, but they're all just slight permutations, but those permutations are really important. It's the difference between someone reading this and going, “Did AI make this?” or “This makes total sense, and I would expect this when I'm giving a briefing to Gartner,” or whatever else.

Swyx [00:14:37]: I think the beauty of it maybe is that you don't have to be that careful about what goes in there. It doesn't have to exactly fit as long as it roughly is contained in there. I used to complain about plugin hell, basically. When you have a framework and then you have a hundred things that you need to integrate, everyone does... GitHub used to be bloated full of these things. And now we don't need them anymore, 'cause now you just use skills.

Former Developers in Leadership: AI as a Creation Multiplier

Kyle [00:15:00]: And I think the most magical thing is just that I can also crack it open. Yes, I could go change how the plugin is coded, or I could go do that now with AI, but I think there's just something more magical about getting a response back and being like, “That's not right,” and then you just crack the skill open, you just type English words, and it's different.
That building block is just, I think, very unique, once I get everyone to kind of understand how to best make those changes to get the most power out of them.

Swyx [00:15:36]: You have a peer group of people like you. Is there a common framing for... Something I'm feeling is: is this a golden age for former developers who are now in leadership? Because you can wield the tools, you would know the right words, you're maybe not too close to the details. Doesn't matter. But you're more effective than someone who doesn't come from that background.

Kyle [00:15:59]: I think that the secret has always been your ability to identify patterns and solve problems, and I think that for folks like myself that don't code day to day anymore, that has made me successful as a developer, made me successful as a COO and now CMO. And so now that I have access to get and write code, I'm applying that sort of pattern finding and problem solving, and I know enough still about how to then go and say, “Oh, I want to make an app, but I don't want to break into jail or create something that's not going to be able to work or be deployed at scale or whatever.” That ability to apply all that additional business knowledge and still code, I think, is what makes that so interesting to me. Slightly different than, I think, some of the other technical leaders that became business leaders and now are going back to their apps and updating them. Good for them. But I think the much more interesting thing is, well, now I have this whole new set of expertise over ten-plus years. Why not take that and use that as a developer with these AI tools? So I definitely think that makes me more powerful, but I think that's true for every dev as well. Most of the dev friends I still have also have some other underlying skill and passion. There's really talented, very kind of linear computer science software devs, absolutely.
I just find that for the folks that came from a different career, went to school for something else, went off and did this random thing, and then became a software dev, or were a dev, did a random thing, and came back, learning that extra set of information, learning those extra skills, and now having the power of an AI where I can crank up fifteen agents on Saturday while my kids are doing lacrosse, that's really powerful. And I think it gets me back to that feeling of creation, and it's very hard to replicate that in most other senses. That first time you build an app and you click it and you show someone, that's magical. And so being able to do that not just in code, but across all kinds of different assets, that's huge. Every year we do our revenue planning. We talk about, okay, what is it going to look like for next year? And of course, as you imagine, there's slideshows everywhere talking about what are we going to talk about, what's the narrative, et cetera. And so, as you said, I'm like, “Okay, well, I could probably just build something to build this, and that way I don't have to go build the whole spreadsheet or pass it to my team.” So we went through this process, and I got all the information and used the skills I mentioned. I built a little app just so I could look at some of the information in a SQLite database more easily. And I ultimately built this entire presentation without touching any of it, and I was like, “Okay, I'm just going to present this to our CRO, the CFO, their teams,” without mentioning I'd built it with AI. I built a skill to make it look very much not AI-driven. Just not pretty.

AI-Generated Presentations, Human Taste, and the Changing Chief of Staff Role

Swyx [00:19:03]: Like a design. Yeah.

Kyle [00:19:03]: Not pretty. But just very clearly not AI.
Kind of like, don't do anything interesting.

Swyx [00:19:08]: That's, yeah, that is valuable.

Kyle [00:19:08]: Exactly. We did the whole thing through. It used my notes from Obsidian, it used all the context I mentioned before, the plans, and it never came up once that it was AI-generated.

Swyx [00:19:20]: It didn't matter.

Kyle [00:19:20]: Never once. It didn't matter. And so now I take...

Swyx [00:19:23]: This is a tool.

Kyle [00:19:23]: I can take that tool and go, “Look, I don't want you to go build slideshows.” They're just helping us share information with each other. If this thing can do it with a little bit of crafting from you, and then we can look at it together, awesome. There's no value in all that extra work. I think that the ability to make it look humanly bad and build a little app to manipulate the data is part of that upside for devs that are now in leadership roles. Because, like I said before, that's all a people problem. I know if you've used a coworker or not to build a slide deck, unless you spent a bunch of time to not do it.

Swyx [00:20:07]: I know, but I think there's a certain charm to just being blatantly AI. 'Cause I think that you're just honest about it: there may be mistakes here that I cannot vouch for. So how much value is there? But anyway, I think the real question I want to ask is: you were a chief of staff to Thomas. And in the pre-AI world, that job would've been a chief of staff job, like, “Can you prep me these slides?” and all that. And now you do it yourself.

Kyle [00:20:35]: I still have a chief of staff. Because the difference is, it's sort of the discussion every time we have some sort of technology evolution: it's not that the roles all go away, they just change.
And so yeah, I don't have someone spending all their time building out slides and presentations for me, 'cause I don't need that anymore. But now I need that person that is able to go and find all the different connections between humans in those discussions to help me find out, okay, I should be meeting with this group and this team, and they have an opportunity, and I'm going to be in San Francisco today, I'm going to be in Seattle tomorrow. Those sorts of human connection aspects are still incredibly valuable and have always been a big part of that chief of staff role. But now, just like chiefs of staff are not opening up letters to process, they're doing emails. It's the same thing. And now they're not building out as many of these presentations, because they have the ability to have an AI take it on and share that with me, and great. Let's keep moving, 'cause it's allowing us to go faster and make better decisions more quickly.

Swyx [00:21:45]: Awesome. Well, we can dive into more productivity insights as we go. I did want to do a little bit of a brief history of Kyle and GitHub. Because we started here. And then you were also involved in the NPM acquisition. I do want to touch upon that. And then more recently, I just want to bring it up to the present day, where we're having uptime issues, which transparently we've already addressed publicly, but we'll discuss in the pod. Did I miss anything? Any other major highlights? Obviously, it's a lot of years to cover.

A Brief History of GitHub: Webhooks, Actions, Acquisitions, and Platform Evolution

Kyle [00:22:15]: No, I think one highlight was right before the acquisition closed in 2018, I got to launch the first version of Actions...

Swyx [00:22:27]: Oh.

Kyle [00:22:27]: ...at GitHub Universe. So it was...

Swyx [00:22:29]: They're that young?

Kyle [00:22:30]: It was October of 2018, I think. Yeah.
Yeah.

Swyx [00:22:33]: Gee, Jesus.

Kyle [00:22:34]: I was the engineering leader on that project and got to launch that. And then, yeah, we did acquisitions of NPM, like you said, Semmle, Dependabot, Pull Panda, a whole bunch of things. That was a big...

Swyx [00:22:47]: Pull Panda.

Kyle [00:22:48]: Abi is doing well.

Swyx [00:22:51]: DX. Holy crap.

Kyle [00:22:52]: Did well on DX. And that was the big shift after the acquisition. I had to join the sort of business side.

Swyx [00:23:00]: So I need to hit you on some of these things, 'cause you were there, right? And how often do I get to talk to someone who was there? But yeah, Actions. Is that the number one source of security issues on GitHub?

Kyle [00:23:11]: Oh, I think that the number one source of security issues is probably all the literal code in everyone's underlying repositories. I would say, back further than that: if you remember, what I had to show in this graph, and I didn't say this before, this is ultimately webhooks.

Swyx [00:23:30]: Yeah.

Kyle [00:23:31]: Circa whatever it was.

Swyx [00:23:32]: It says Hookshot in there.

Kyle [00:23:32]: I forget. Yeah. Yeah, Hookshot's in there. And so back then, it says GitHub Services. Do you see, it says Hookshot FE for front end, and then it says GitHub Services. GitHub Services, back in the old days, right? We had a repository that was Ruby code, and you could write any Ruby code in there, and then we would execute that on your behalf as a service, and that way if you were trying to integrate with something, we would run it for you.

Swyx [00:23:57]: And of course no containers, 'cause...

Kyle [00:23:58]: No, 'cause it was...

Swyx [00:23:59]: Well, no containers.

Kyle [00:24:00]: ...2014. And so there was some isolation, obviously, but it was mostly the separations on the server level.
That's one example, along with the very old version of Pages, which ran on its own containerization infrastructure, not on Actions.

Swyx [00:24:15]: Which, like, all-time great product.

Kyle [00:24:16]: Pages powers the internet at this point to some degree. Those were places where clearly there were no issues, to my knowledge. But it was those things where I'm looking at it and going, “Okay, well, we can't be running arbitrary Ruby code on everyone's behalf.” Then containerizing all of that up into Actions now, where yeah, the containerization is really good. The pinning: most folks aren't pinning their workflows to a particular

Swyx [00:24:48]: Images

Kyle [00:24:48]: SHA, et cetera, and so that's a big place of pain for folks if they're just doing, similar to any dependency management, just v1 or newest or latest, I think. But that journey from the days of “Okay, we're just going to run all this arbitrary code, and it'll basically be okay,” to now, no, we have really good containerization. We have a new underlying agent containerization service. We're using it under the hood. It's through Azure. They recently announced it: Azure Dev Compute. It's very fast compute to be able to spin up your own cloud agents or whatnot. We're using it under the hood for some parts of the new

Swyx [00:25:36]: Microsoft Dev Box?

Kyle [00:25:37]: No. Dev Compute, yeah.

Swyx [00:25:41]: Hmm. Not finding it just yet.

Kyle [00:25:44]: Oh, it's in there somewhere.

Swyx [00:25:46]: All right. Well, we'll cut that out.

Kyle [00:25:47]: Sorry. But with Dev Compute, you can spin up really small VMs really quickly, so you're doing a tool call

Swyx [00:25:58]: Same concept

Kyle [00:25:58]: Just do it containerized, exactly.
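Editor's sketch: the pinning point above (workflows referencing an action by a moving tag such as v1 or latest rather than a specific commit SHA) can be checked mechanically. The snippet below is a minimal illustration, not GitHub's tooling: the function name `unpinned_actions`, the regular expressions, and the `example-org/...` action names are assumptions made for this example. It only scans `uses:` lines as plain text and does not parse YAML.

```python
import re

# A full commit SHA is 40 lowercase hexadecimal characters.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")
# Matches workflow lines such as "- uses: owner/repo@ref" (hypothetical helper pattern).
USES_LINE = re.compile(r"^\s*-?\s*uses:\s*([^\s@#]+)@([^\s#]+)")


def unpinned_actions(workflow_text: str) -> list[tuple[str, str]]:
    """Return (action, ref) pairs whose ref is not a full commit SHA."""
    findings = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if match and not FULL_SHA.match(match.group(2)):
            findings.append((match.group(1), match.group(2)))
    return findings


# Illustrative workflow fragment; the example-org actions are made up.
EXAMPLE = """\
steps:
  - uses: actions/checkout@v4
  - uses: example-org/example-action@0123456789abcdef0123456789abcdef01234567
  - uses: example-org/other-action@main
"""

print(unpinned_actions(EXAMPLE))
```

Here the tag `v4` and the branch `main` are reported, while the 40-character SHA reference is treated as pinned. Container references such as `docker://` images are outside the scope of this sketch.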
So we're using that, so definitely moving in that direction to protect us from every piece of code that we're ultimately running.

Swyx [00:26:07]: Look, that grows into the full SDLC. Code hosting was just the start, and then it's grown beyond that. Let's talk about NPM maybe, ’cause I think that's also a very major point in the industry. I do think it was looking for a home. It was kind of struggling as a business, right? I don't know how you would characterize that whole acquisition and how it

NPM, Package Security, and Keeping the Internet Running

Kyle [00:26:33]: When we were talking to the team, I think the big thing for both of us was to find a way to keep NPM, which was basically powering the internet then and way more so now, to some degree, running. Keep it going, keep continuing to scale. It was having scaling problems, if I recall, back at that time. They were doing some rewrites. It

Swyx [00:27:00]: That's cute compared to now.

Kyle [00:27:01]: Well, that's the thing. When I'm talking to folks now, there are so many more underlying uses of NPM than there were back when we had them join in with GitHub. But that was ultimately the goal. It was really: okay, we used to have Pages. We have the world's code. Let's make sure that we can keep NPM running well for the world. And we put a bunch of time and investment into underlying backend changes, some of which we've talked about, some of the manifest work, et cetera. And then now, really trying to bring the security posture of NPM up to speed. But it is a unique challenge in that every move that we make to make it more secure will break a lot of people. And security is paramount, and we take it very seriously. Any time that we have a problem with GitHub, or we make a change that makes us more secure but hurts, there's a snow day for developers or a really bad fire that they have to go put out.
And so we have changed the 2FA policies. We've changed the way the tokens work. When we find tokens that have been exposed or potentially exposed, we invalidate them, and

Swyx [00:28:22]: I love that feature in GitHub. Yeah, it's great

Kyle [00:28:23]: That creates issues, but that's the thing: we're trying to push the community forward without necessarily doing something that is going to break the contract that's been there for 15 years or close to it, or some amount of years, on NPM.

Slop Forks, Vendoring, and the Future of Open Source Supply Chains

Swyx [00:28:43]: So now we're talking about open source and publishing. And I think there's something here with what people are calling slop forks, which I think Malte from Vercel is doing. And part of me thinks, well, the way to get past any vulnerabilities is, let's just get rid of the concept of NPM. And we only publish source code. And anytime you want to import it, you have your coding agent look at it and then adapt whatever subset you're going to use. You vendor it, but the AI vendors it. Is that realistic? I don't know. Will that solve all our security issues? I don't know.

Kyle [00:29:24]: I don't think it'll solve... so Mitchell Hashimoto was just talking about this today, and I think that in some ways all things old are new again. Yeah, absolutely, vendoring everything. I do remember twenty thirteen, twenty fourteen.

Swyx [00:29:42]: Yeah. We must return to

Kyle [00:29:43]: We were vendoring everything. We were having actual discussions around, or at least I remember we were, “Should we take this full thing?” “Why is this so big?
We only need this one file.” And so I do think there's something true there, where either taking only what you need or the dependencies just getting incredibly small over time will help to some degree, but it's not going to solve the fundamental problem, I don't think. With an agent looking at the vulnerabilities, time and time again, there's a million different ways in which we can convince an agent that this thing is secure or not and pull it in. Or we can do static code analysis or runtime testing to say whether the code works or not. That is, I think, the step that needs to continue to be invested in. The question is just how much scope. Should it be this enormous project that I'm pulling down, or should it be this piece? Either way, most companies are running some amount of security checking on the packages that they're bringing in or vendoring. That, I think, won't change. That's what Advanced Security does to some degree, Socket does to some degree. Everyone is doing a piece of that. How we each do that, especially when we're talking to enterprise customers, is just very different. No one wants one single way to do it. And I think that's always been GitHub's unique position in the world. I talk a lot to maintainers, I talk a lot to folks about this. We rarely start a process and a practice and push it onto the community. We usually wait for the sort of RFC process, socially or literally, everyone agreeing, and then we'll cement something in. Because otherwise we're

Maintainers, RFCs, Vouching, and the Social Layer of Trust

Swyx [00:31:35]: That fits your role in the ecosystem, yeah

Kyle [00:31:36]: We're GitHub. Yeah, we don't want to shape the whole thing. We want it to be figured out.
But how do you balance that sort of role in the industry, to keep everything as secure as is possible and make sure that you're not going to be compromised as a human, ’cause that's usually how it all happens, and not create a process or lock us into a flow that you're not going to like, or Mitchell's not going to like, or other open source projects aren't going to like? That's always been a tricky balance for us, and I think that's something that we haven't talked about enough: we're not going to be able to fix everything for everyone in a way that everyone is going to like. So help us, tell us what is working. When Mitchell was talking about the upvote, the up

Swyx [00:32:22]: I was going to bring up his thing. Yeah.

Kyle [00:32:23]: I forget what it... Yeah. When he was talking to us, I was chatting with him about this, and I put it on Twitter, and we talked also over DM, and it was, “We're going to keep working.” But I think the important thing is I do actually want to hear what isn't working for you. And be as specific and clear for your project as is possible. And to every piece of credit, over the many years that we've known each other through the industry, he's always done that, and I appreciate that, ’cause there are places that we need to fix up, and we hear from him, and we'll fix them up, just like we do for all other kinds of maintainers. But that process between making those types of improvements and being more secure and creating, I forget what he calls it, it's not the proof process, not the claims process. Do you know what I'm talking about? His projects have a way for you to kind of

Swyx [00:33:13]: Vouch

Kyle [00:33:13]: Vouch. Thank you. Yeah. He has the vouch system for saying, “Hey, you should accept my PRs.” That's been

Swyx [00:33:20]: Just build this into GitHub.
I don't know.

Kyle [00:33:22]: Well, see, but that's the thing. You say that, and he and his community really like this, and then I'll go talk to other maintainers globally, and they're, “No, this doesn't work for me.” And that is the tension, but also the kind of beauty of GitHub, depending on which way you look at it. We want to help maintainers, so we create all these tools to let you have more control over how much you take in from AI and PRs. But you can also use this. You can go use this project, and if it takes off and becomes the kind of mostly standard, then yeah, we probably wouldn't enforce it, but we would add it in, because that's the flow that we tend to follow.

Swyx [00:34:02]: I hear a lot of people don't know the history of the pull request. That's something that GitHub standardized, basically.

Kyle [00:34:08]: Yeah. It was a very messy process beforehand, and now we have the benefit of it being the process. And now we have to go and figure out the next best process, or what adaptations change, or what does a pull request look like when eighty percent of your PRs are just coming from your agents and not from other devs?

Swyx [00:34:31]: Do you like the prompt request idea from Peter?

Kyle [00:34:34]: I think that each idea has its merits. I'm not avoiding saying anything good or bad, but I feel like I've seen a version of... we have that, we have entire Thomas' store. Take all the assets of what you've built and put that in. I think that's got great ideas. There's all these various permutations of the PR flow, but I think the reason why there's not a single answer is ultimately we're trying to codify trust.
We're trying to say, “Okay, if Shawn reviews this, I'm going to trust it because you're Shawn, or you're the senior dev, or you're the whatever.” And right now, when we are working in a flow where an agent writes code and another agent reviews code and then Kyle goes and looks at it, the trust is kind of diffuse. And most of the tools that we're talking about are talking more about verification flows. We have more assets to look at, so I can probably say whether this is a good PR or not. But that still doesn't solve, I think, the human problem of: I'm looking at a PR and I want to know if I can trust it. And we still tend to use human signals for that. Mitchell approving it, or Kyle approving it, or whatever. And so I think that's why most of these options haven't really solved it: it's a social problem ultimately. It's a human problem to review it and agree. Or you fully trust the tool, and you're imbuing that tool with full trust, which I think in some cases absolutely exists.

AI-Generated PRs, Trust, and the Waymo Analogy

Swyx [00:36:08]: And so in the same way that there will be a tipping point in society when we don't allow humans to drive anymore because machines are measurably better than humans, I'm looking for that tipping point, right? Mythos is ridiculously expensive. Someday we'll have Mythos on a desktop. I don't know. Does that change the equation?

Kyle [00:36:30]: I think it's more... I took a Waymo here, and I was on my phone and not looking around at all. There are other self-driving vehicles that I would not trust while staring at the road. And I think that trust is something that is

Swyx [00:36:48]: Is this a Zoox thing? What is it

Kyle [00:36:50]: I think that is both. I think that is both. Like

Swyx [00:36:53]: There's Zoox in this robo taxi. That's it. It's

Kyle [00:36:56]: Well, depending on what level of self-driving.
But my point is that I strongly believe that's a mixture of verifiable proof, like how many accidents, how much data, and so on, and the human aspect of how I feel when I'm in this car, what it tells me, et cetera. And so that's why I think some of our AI tools tend to imbue me with more of that feeling of trust, even if the data says this is 100% accurate. I feel like it takes more time for us to go, “Should I trust this or not?” And that's in the soft sense of startups with high agency, weekend projects, and open source. And then there's enterprises and regulated industries and everything else, and that is an even harder problem to go solve, because even when it is fully verified, not only do you have to have trust from the humans on the team, you probably have to have trust from multinational,

Swyx [00:37:55]: Oh my God

Kyle [00:37:55]: Multiple governments around the world and regulating agencies. And so that's where I feel like, until we tip over, to your point, on the sort of human EQ side of it, where I feel, okay, this feels okay, it's been proven enough. Then the ball will start to roll a lot faster, where we'll end up getting to the “Okay, we can trust this,” and feel good about it in the most difficult of cases.

Reputation, Sponsors, Stars, and Bot Activity on GitHub

Swyx [00:38:18]: If human trust is the thing that matters, I feel like GitHub as the developer social network could maybe do more there. Vouches are one system, but we have star counts, and then we have contributor rights, and that's it. And I feel like there should be more in that space. I don't know if there's any other design decisions there.

Kyle [00:38:37]: I think that one of the places that we don't really expose right now in this sort of way is some degree of hard trust and support. For me, Sponsors is a good example of that.

Swyx [00:38:49]: Ah.

Kyle [00:38:49]: It costs you something.
To prove that I believe in your project and I trust you to some degree, or I want to support you at the very least.

Swyx [00:38:56]: Solve payments for open source. Why not?

Kyle [00:38:58]: I think that as we keep moving forward, there's more and more projects where I'm adding more and more dollars into Sponsors personally, because I want to support them. I've probably never met them in person, but I know enough of their work that I want to support them. I think the thing that I don't love about stars or commit counts or anything else is, ultimately, even with all of the various de-spamming and deduplication work and anti-abuse work that we do, these are all not active social signals. They're passive ones that are ultimately gamifiable. And you may trust me, but another open source maintainer may not. And on what heuristic should you be trusting me? That, I think, is kind of where some of our thinking is right now. What signal from me is most important to you? If you can define that, potentially, honestly, in an agentic workflow, that's what we see some of these open source projects do, where you have GitHub Actions, and then you have an agentic workflow that's calling AI, and you're setting these rules. Like: if Kyle has submitted and gotten accepted PRs across any given project, and has a social handle tied to his account in GitHub, and that social account's older than a certain amount. Really complex measures that matter to you, ’cause most open source projects have that heuristic built into their heads, if not written down in the contributing guidelines. You could take that and then go apply that and then just say, “Oh, we're not going to accept this PR.” Building something that is, I think, malleable to everyone's needs is a little bit better, rather than going, “Hmm, this account's too young.” Because what happens?
The attackers just go and create a multitude of accounts, and they wait until they age up. Needs to have a certain amount of stars? That's how star inflation happens. Needs to have a certain amount of repos

Swyx [00:40:46]: Oh my God. Yeah

Kyle [00:40:47]: With PRs? They all just create repos and submit PRs to each other, and then they come in and do something nefarious. And so it's hard. It's hard to find the measure. So I think we're looking more at how we can provide you tools so you can kind of choose what's best for you. And of course, we'll give you some standards. But the trust vector gets down to, I don't know, some version of human digital ID, like everyone's been talking about. Like, how do I prove that it's me

Swyx [00:41:13]: Give me your eyeballs

Kyle [00:41:14]: On the internet. Give me your eyeballs. Exactly.

Swyx [00:41:18]: I've got to keep moving on topics, but obviously I can go all day on this stuff because I've been involved in GitHub and open source my entire professional career. Stars. Very superficial. Everyone knows it. But I think time to one hundred thousand stars is the fastest I've ever seen. People just reached that in, I don't know, months. And then at the same time, I don't trust it, right? How many of these are real or bot or whatever? I don't know how to ask this, but what can we do about it? Like

Kyle [00:41:49]: Just

Swyx [00:41:49]: Is stars broken? Is stars fine?

Kyle [00:41:51]: I think that there's kind of two pieces. Obviously we're constantly trying to find ways in which users are producing spam, in which I would include only doing star gamification. When we find them, we pluck ’em out and we,

Swyx [00:42:08]: But it's like a Whac-A-Mole

Kyle [00:42:10]: It's a hundred percent like a Whac-A-Mole

Swyx [00:42:11]: There's no way

Kyle [00:42:11]: Now, powered by AI to be helpful.
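Editor's sketch: the maintainer-defined heuristic Kyle describes earlier in this section (accepted PRs in other projects, a linked social handle, a minimum age for that social account) could be expressed as a small policy object that each project tunes for itself. Everything below is hypothetical: the class names, field names, and default thresholds are assumptions for illustration and do not correspond to any GitHub API. As the conversation notes, signals like account age can be gamed, so a check like this is one input among several, not a proof of trust.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    """Signals a project might collect about a PR author (hypothetical fields)."""
    accepted_prs_elsewhere: int
    has_linked_social_handle: bool
    social_account_age_days: int


@dataclass
class TrustPolicy:
    """Per-project thresholds; the defaults here are arbitrary examples."""
    min_accepted_prs: int = 1
    require_social_handle: bool = True
    min_social_account_age_days: int = 365

    def failed_checks(self, c: Contributor) -> list[str]:
        """Return the reasons this contributor does not meet the policy."""
        failures = []
        if c.accepted_prs_elsewhere < self.min_accepted_prs:
            failures.append("too few accepted PRs in other projects")
        if self.require_social_handle and not c.has_linked_social_handle:
            failures.append("no linked social handle")
        elif (self.require_social_handle
              and c.social_account_age_days < self.min_social_account_age_days):
            failures.append("linked social account is too new")
        return failures


policy = TrustPolicy()
newcomer = Contributor(accepted_prs_elsewhere=0,
                       has_linked_social_handle=True,
                       social_account_age_days=30)
print(policy.failed_checks(newcomer))
```

Returning the list of failed checks, rather than a single yes or no, lets a project decide whether to close a PR, request a vouch, or simply route it to a human reviewer.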
But I think more so what I'm seeing is that a lot of the fastest time to X tends to be because we're now inviting so many more people into software development on GitHub that the zeitgeist is just swarming. And it's

Swyx [00:42:32]: It's not just developers anymore

Kyle [00:42:33]: And it's not you and I. However you want to say what a developer is, it's not just folks who have been coding for a very long time. It's folks that have maybe started coding or only joined in since the AI era. And now

Swyx [00:42:44]: What's the latest Octoverse number? I know eighty million was the last number I remember for developers on GitHub

Kyle [00:42:50]: Oh, we're over 200 million now.

Swyx [00:42:53]: Okay. Well, so you see?

Kyle [00:42:55]: Over 200 million developers now.

Swyx [00:42:56]: But it's not developers, right? It's people with a GitHub account.

What Counts as a Developer in the AI Era?

Kyle [00:43:00]: So this is the biggest debate that, I would say, everyone loves to have at GitHub at this point. From my perspective, I think that there's clearly a difference between professional enterprise developers and then developers. But I think that the idea that we should be, I don't know, splitting hairs or segmenting developers in the early era of software development is not worth the time. So

Swyx [00:43:29]: When you get into gatekeeping

Kyle [00:43:31]: 100%

Swyx [00:43:31]: What is a developer?

Kyle [00:43:31]: 100%. ’Cause I wasn't a developer when I started writing code. I was going to

Swyx [00:43:36]: Oh, no. I cloned a thing seven years before I learned to code. And then I wrote about my learning-to-code journey, and people just called me a fraud ’cause I had a GitHub account. And I'm, “Well, no, I just used GitHub, but I didn't know what I was doing.”

Kyle [00:43:49]: I remember that. I remember those sets of posts, and that's b******t.
So I fight very clearly on the line of: if you create code, if you have an idea and you create it in some way, as in, I'm going to run it and use the app right now, you may still use AI in that moment, but that's okay. At some point you're going to do the next thing. You're going to have to learn about this database. You're going to fix a bug, whatever. We're all on some same journey, and those people are also hearing about the great new agent skill package or a new CLI tool or a new whatever. And those projects are going up because you want to be a part of this moment, just like I wanted to be a part of the Ruby community when Ruby was popping off when I started becoming a developer, and now I can just click the star button. And so I think that yes, there's clearly some amount of spamming and gamification that we're working against, but I really think we're just seeing this whole new cohort of folks that are moving from technology to technology because they're not working on a 20-year-old software application. They're working on a side app that they built on the weekend for their friends or for their new idea or whatever. And that's how you see these enormous charts going up and to the right with stars.

Swyx [00:44:59]: I think something that's remarkable is the persistence, or that GitHub extends to those folks. Usually when I see platforms go into a new audience, they usually have to have a second platform with a different name that wraps the main platform. But somehow GitHub has been able to sort of persist and extend, and it's friendly and whatever. So it's nice.

Spark, Low-Code, and Always Showing the Code

Kyle [00:45:19]: That's partially why, I think, as we've tried to move into, I don't know, more low-code-y things. So we started working on Spark as a way to build an app and run it.
I think that the reality is that anytime we put a veneer on top of something, we still always show you the code. That's kind of a tenet. We're never going to hide the code from you, ever, because what

Swyx [00:45:52]: Why would you?

Kyle [00:45:52]: That's, yeah, that's the whole point. However, I think that what we learned with things like Spark is that really the value of Spark for most devs is easy runtime. And you may have a runtime or a host that you're going to use for that, or you just build something and run it, but the package of making that even more simple isn't really needed for folks that are trying to build software and not just trying to build an app, which is a slightly different goal. So I want to get you in, I want to get you comfortable. I think the best thing for me, as someone that did not traditionally come into software dev way back: I want anyone to be able to bridge that chasm. I don't know, I feel like we're still in an era of STEM. I've got a 12-year-old and an eight-year-old, and it's, “We got to get ’em into STEM,” over and over. And I do the things that good parents do. I was, “Oh, you want to do coding?” “Yes, I want to do coding.” Do coding classes. But now they're just not afraid of doing software. And that's, I think, the thing that's honestly kept me at GitHub for so long. Anyone should be able to go and build a thing, just like I can go change a light switch in my house. I'm not going to go into the breaker box, ’cause I'll probably kill myself. But I can go change that light switch. Everyone should be able to go and say, “This fricking app doesn't do what I want.
I want it to work like this.” And that, I think, is what's kind of kept us all connected with GitHub through the years, during the easiest of times or in the hard times, because of that opportunity: we're the home for all developers, and we want everyone to be able to have that feeling that we've had of, I had an idea, I created it, and holy s**t, here it is.

Swyx [00:47:37]: Here it is. All right, I'm going to try to do more spicy questions.

GitHub's Hardest Scaling Moment: Growth, Agents, and Uptime

Kyle [00:47:42]: Great.

Swyx [00:47:42]: Is it an easy time now or a hard time?

Kyle [00:47:45]: Oh, at GitHub? It's a hard time. It's a hard time, and also, I was just with my team and I said, “This is also the best and most exciting time that I think I can remember at GitHub.” Because

Swyx [00:47:57]: Best of times, worst of times. It's never one

Kyle [00:47:59]: ’Cause we were talking about Octoverse reports, and usually we do an Octoverse report once a year, and we look at the numbers, and we say, “Oh my goodness.” I was at Universe in October saying, “This was the fastest year of growth that we've ever had,” right? And now we're doing more in a month than we did in a year last year.

Swyx [00:48:20]: You're talking about PRs.

Kyle [00:48:21]: Commits.

Swyx [00:48:21]: Commits, yeah.

Kyle [00:48:22]: PRs. You name it. By roughly every measure that we're looking at, there's some amount of growth that is much bigger, and that is breaking our system in new ways, not old ways. Webhooks were always notoriously unreliable over the years.

Swyx [00:48:38]: Whose fault is that?

Kyle [00:48:39]: Not mine anymore, but for a period of time, I'm sure you could pull up a tweet that was, “It was me. I'm sorry.” But now, that got rewritten at a scale level that is still working and is not having problems today.
Now what we're finding isn't the simple stuff where folks on Twitter or on the internet are, “Hey, why is this like this?” Sure. There's absolutely silly problems that shouldn't exist. But now we're talking about unique, novel permission problems that happen only at a scale across all different objects or whatever, where now we have to go rewrite this underlying system. And so there are problems that, yeah, caught us off guard, which I think I said. The growth is astronomical, but also we're making such material progress that I'm excited, once we've kind of reimagined the underlying foundation layer, or pieces of it at least, about what's going to be possible when it's not just all of us and all the new people that are being developers, but all of their agents and all the tools working together. Because that'll still happen in that GitHub tool, that GitHub community. But it's a hard day anytime we can't give you what you're looking for. We have the same problem internally. We operate through github.com. Of course, we have backups for our own operations when things go down and whatnot, but we feel it too. If it's not working, it's not working for us, and that's kind of the promise of dogfooding for GitHub. It's always been true. We're using the same tool you're using. We're not using a super secret version. And so we also need it to be great for us, for our customers, and of course for open source. And now an exponential growth of agents doing it too.

Swyx [00:50:32]: I wanted to load this up for audio listeners who maybe haven't seen your tweets, whatever. So one billion commits in twenty twenty-five. Now it's two hundred and seventy-five million per week, on pace for fourteen billion this year, if growth remains linear. Is that still the pace? I don't know.
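Editor's sketch: the figures Swyx reads out are consistent with simple annualization. The snippet below is just that arithmetic, using the numbers quoted in the conversation; the 52-week year and the assumption that the weekly rate stays flat are ours. Since Kyle says the rate is still speeding up, this would be a lower bound rather than a forecast.

```python
# Figures quoted in the conversation.
COMMITS_2025 = 1_000_000_000
COMMITS_PER_WEEK_NOW = 275_000_000
# Assumption: a 52-week year at a constant weekly rate.
WEEKS_PER_YEAR = 52


def annualized(weekly: int, weeks: int = WEEKS_PER_YEAR) -> int:
    """Project a full-year total from a flat weekly rate."""
    return weekly * weeks


projected = annualized(COMMITS_PER_WEEK_NOW)
growth_multiple = projected / COMMITS_2025

print(f"{projected:,} commits per year, about {growth_multiple:.1f}x the 2025 total")
```

At 275 million commits per week, the projection comes to 14.3 billion for the year, which matches the "fourteen billion" and "fourteen x" figures used in the discussion.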
It's been a

Kyle [00:50:48]: It's speeding

Swyx [00:50:50]: Roughly.

Kyle [00:50:50]: It's still speeding up.

Swyx [00:50:51]: It's April, so yeah.

Kyle [00:50:51]: Exactly. This was in April.

Swyx [00:50:53]: All right. So basically you have fourteen x growth, right? Year on year. And I think that's a scaling issue. I'm going to try to really steelman this thing. People have experienced fourteen x growth. They haven't had your downtime. Can we go dig into that? Why? What broke? What are we doing to fix it? Just anything for the community to reassure them.

Why GitHub Reliability Is Breaking in New Ways

Kyle [00:51:18]: Like I was saying, there's a couple different places that we've seen the growth issues. Some of the growth issues, which is why I was talking about pushing hard on more CPUs, are in Actions in particular. More tools, more agents, more PRs mean more builds, and more builds mean more CPUs. And so we are expanding through not just our data center, but obviously we were talking about moving to Azure and adding additional cloud compute, because we simply need more CPUs. Not as much GPUs. We definitely need GPUs too, but now CPUs are becoming a factor.

Swyx [00:51:53]: It's very CPU heavy.

Kyle [00:51:54]: Underneath the hood, when it comes to some of the underlying services, we've been breaking up our database infrastructure over the years, so that way we have more cognitive separation between the various services. The place that we continue to have pain is in permissioning. And so right now many of our permissioning layers sit in a database that we internally call MySQL One, and old Hubbers will know what I'm talking about.
And so we've been pulling things out of MySQL One for many years, and we use Vitess and we use other technologies to shard, and we do it as one big

Swyx [00:52:31]: Famous thing, PlanetScale was born from this and

Kyle [00:52:32]: A hundred percent. Sam, old Hubber and friend. And so finding these opportunities to break this out and then do that globally. The other thing that I think is interesting, and both a unique opportunity and tricky, is we also run everything I just talked about in a black box container with GitHub Enterprise Server for people that work on-prem. So we take everything I just said, and we also do it on-prem, and we also do all of that in a data residency setup for customers that need to have their data in a single location. Each of these has unique characteristics around how we're storing that data in MySQL or in a permissioning setup. That's where some of these outages have occurred, where you're seeing it more across the board rather than just the one piece

Swyx [00:53:17]: Filling the database

Kyle [00:53:17]: Isn't quite working. Exactly. And so part of it is that. I think there's been some other places where more projects appear to be moving towards monorepos, whereas we were going the other direction for many years in the industry. Repos were smaller, but there were more of them, and now we're seeing the opposite. Repos are bigger, and there's not fewer of them per se, ’cause there's new growth, but we're just seeing many more big repos. Big monorepos have always had a unique performance problem, because each one is slightly different, particularly if the underlying blobs inside the repos are incredibly big. And so we've done a ton of work that most people probably haven't experienced, unless you're in this case of the monorepo.
But that Git infrastructure layer improvement does help the overall system, because many of the improvements that make monorepos work better make all repo infrastructure work better. And so I could kind of keep going down the line. Another thing is we're changing how we do, I'll just say job queuing for lack of a better explanation, changing the underlying technologies there.

Swyx [00:54:32]: I spent two years being a job queuing guy, so.

Kyle [00:54:34]: And so it's kind of a little bit of piece by piece, and it's mostly because, as it was built, we built everything in a way that assumed, I guess in some ways, that the size of the pipe of work was going to remain the same. There's just going to be more people coming through each of those pipes. But instead now, in places where a git push was generally a certain size, for example, that is now no longer true.

Swyx [00:55:03]: Oh, yeah.

Kyle [00:55:03]: Or

Swyx [00:55:05]: I push a thousand

Kyle [00:55:06]: On the average. 100%

Swyx [00:55:06]: A thousand line commits, like, daily

Kyle [00:55:07]: Same thing with PRs. And we've talked about optimizing that and making changes, and there were technology choices that did not work there. And it got slow. It was not fast. It did not do what the users wanted. And so we've been reeling that all out and going, “Okay, that's just not right. Let's stop putting good money after bad and do it the right way now.” So it's a lot of things. When I've experienced scale at GitHub historically, it's almost always two options that we've used. We go vertical scaling, particularly with databases, right? And we go horizontal scaling. Oh, we just have more people using this service. Great. We're going to add more servers, and we rack them in our data center, or we use a cloud.
And now we're sort of in a, like, diagonal, where vertical doesn't really work anymore. Horizontal doesn't work either, because we all have some CPU or GPU constraints in the world now, and now we have to go in and crack open services that have been running for 10 or 15 years and go, “Okay, the rules of this service have legitimately changed, and now we have to rewrite them.” None of this is an excuse. We have to do the work. We have to make it better.

Swyx [00:56:22]: Actually, as an infra guy, I'm like, “This is one of the most fascinating scaling challenges I've ever seen.”

Kyle [00:56:26]: That's the thing that it's hard for... Like when we weren't talking about it publicly, and I came out, and I was like, “Hey, I just want to explain what's going on.” Part of it comes from a very old GitHub ethos, which is it's our uptime. It's down. I know you're a developer, so you're inclined to want to understand more what's going on. But at the same time, us going, “Hey, this service didn't perform the way we expected, and now we have to go change it,” we're not trying to hide anything from you i

The top AI news from the past week, every ThursdAI
AI just cracked an 80-year-old math problem nobody could solve — plus everything from Google I/O 26

May 22, 2026 · 109:18


Hey, Alex here, just got back from the sunny Shoreline Theater in Mountain View, so let me catch you up! This week was definitely Google heavy: we are covering Google's I/O conference for the third year in a row, and today we have a special guest, Logan Kilpatrick, joining to discuss the newly announced Gemini 3.5 Flash, the Google Omni model, and the new Managed Agents offerings. Plus, this week, for the first time, OpenAI announced that AI solved a math problem that humans couldn't solve for 80 years, Cursor is showing off Composer 2.5, which is partly trained on xAI data, Karpathy joins Anthropic, and much more! Let's dive in!

P.S. - We've announced our upcoming hackathon, Weavehacks-4, June 6-7. I'll be there, and we're expecting the seats to run out very soon, so register now.

ThursdAI - We'd love to have your subscription, and if you're already subscribed, please hit that bell on YT to never miss an episode!

Google I/O 2026 - Google goes agentic everywhere

I went to cover Google I/O for the third year in a row, shoutout to the DeepMind team for inviting ThursdAI again, and folks, this one felt different.

Last year, Google I/O was still very model-centric. This year, the story was not “here is another benchmark chart.” The story was: Google is putting Gemini into everything, and the agentic layer is becoming the product layer. Search, Gemini app, Android, Workspace, YouTube, AI Studio, Cloud, Antigravity, Flow, managed agents, smart glasses, all of it is now orbiting around one pretty clear strategy: Gemini is the intelligence, Antigravity is the agent harness, Google's products are the distribution. I saw many reactions that were milquetoast, as in, “we expected more,” and those seem to dominate the X feed. But I think the distribution is the part that many folks on X are missing. Yes, we can argue about Gemini 3.5 Flash pricing. Yes, we can argue whether “Flash” still means what Flash used to mean.
But when Google says the Gemini app itself has 900 million monthly active users, before even counting Search, Gmail, YouTube, Docs, Drive, Android, and the rest of the Google surface area, that's massive! OpenAI's ChatGPT has supposedly stagnated at ~900M; I don't remember them crossing 1B. Meanwhile Google is gaining traction. And they just updated all those folks with a new model!

Wolfram said it really well on the show: his mother is not sitting there reading model cards. She just uses her Pixel, voice unlocks Gemini, asks for help, and suddenly the default intelligence available to her goes up.

Antigravity 2.0 - the agent harness takes center stage

The biggest strategic signal from Google I/O for me was Antigravity.

Remember, Antigravity was an IDE that came from the Windsurf acquisition saga. Part of the Windsurf team went to Google, part went to Cognition, and now Google is very clearly putting Antigravity in the middle of its agentic future. And I mean very clearly. Sundar mentioned it. Demis mentioned it. Varun Mohan, the co-founder, was on stage immediately after them! If you've ever watched a Google I/O keynote, you know how carefully every minute is allocated. Google has YouTube, Search, Gmail, Android, Cloud, Ads, Workspace, and a thousand VP-level products that could be on stage. The fact that Antigravity was that prominent should tell you everything.

Logan Kilpatrick joined us and framed this in a way I loved: Gemini became the through-line across Google products, and now the Antigravity agent harness is becoming the through-line for agentic experiences.

The new Antigravity 2.0 is a complete overhaul. It shows only an agentic interface (which was previously just a separate window called Agent Manager), separates the IDE layer completely into its own app, and presents a Codex-like agent-first interface, which got a few folks furious.
This move may be weird to some folks, but if you follow along where everyone's going, this seems to be the way of the future: coding is no longer about lines of code, it's about managing fleets of agents. The new Gemini 3.5 absolutely shines inside the new Antigravity. The model was trained with this harness in mind, and is currently offered at an incredible speed (12x), so I'm definitely going to try it!

Gemini 3.5 Flash - fast, determined, and maybe not the old “Flash”

The most debated model release of the week was Gemini 3.5 Flash.

Some folks saw the pricing and token usage and immediately went “this is not Flash.” I get that reaction. Flash used to mean cheap, fast, lightweight chat model. But Logan's framing on the show was important: Flash is now being built for the agentic era.

In a chat era, you optimize for one user message and one model answer. In an agentic era, the real token volume is in tool loops, intermediate reasoning, retries, file reads, web searches, code execution, and self-correction. That's a different product profile.

Wolfram already ran Gemini 3.5 Flash through WolfBench, and the results were fascinating. With the Hermes agent harness, Gemini 3.5 Flash hit an 87% ceiling on Terminal Bench 2.0, meaning across runs it could solve more of the benchmark than even GPT-5.5 extra high in that setup. The variance was higher with the simpler Terminus harness, but with a real agent harness, the model looked much stronger.

That tracks with what Nisten saw in his “Martian railgun from Olympus Mons” test. Gemini 3.5 Flash went extremely detailed, almost too determined, kept correcting itself, overcorrecting itself, and built a whole game-like simulation. Logan laughed and basically said: yeah, this model is very determined, possibly an overcorrection from the “Gemini is lazy” feedback. It also tracks with the mismatch in other benchmarks: in some, Gemini 3.5 Flash shines (like Apex-agents from AA), and in some, it doesn't match the other frontier models.
In my tests, it was definitely over-eager to use a million and a half tool calls and read tons of files, just to help me review this draft inside Antigravity. It's like a super eager robotic golden retriever!

Gemini Omni - Nano Banana for video, but actually more than that

The biggest update from last year's I/O was Veo 3! This year, the biggest wow factor was also visual, but it wasn't Veo 4. It was a new multimodal model, trained end-to-end, that they call Omni. Google is calling this their first “create anything from anything” model, and the first version, Gemini Omni Flash, starts with conversational video editing. The easy description is: Nano Banana for video. You upload or create a video, then talk to it. Change this character. Replace this person. Add an object. Make this scene claymation. Keep the scene, but change the environment.

I played with it live and showed a few examples. I asked for a claymation explainer of protein folding, then gave it my face and asked it to replace the character with me. It did it. I uploaded pictures of Sonia, my cat, and it generated a talking cat video with the right kind of cat teeth, which is weirdly important because so many pet generations accidentally add human teeth and become nightmare fuel.

The failure modes are still there. I asked it to make Sonia a Russian-speaking female cat, and it only partly switched languages and didn't really change the voice. Audio upload support is also not fully productized yet, even though the underlying model is multimodal. But the direction is very clear.

This is not just “Veo with a chat model glued on.” I asked Jeff Dean, Google's chief scientist, about this at I/O, and he explained that Omni is trained end-to-end. The intelligence and the generative media capabilities are part of the same model family, not a hacky two-model pipeline.
He also said the intelligence is around a recent Flash-level model, which is a big deal when you think about video editing as reasoning over physics, identity, scene continuity, and intent.

A lot of people compared Omni to Seedance 2.0, and I think that's the wrong comparison. Seedance is amazing at cinematic generation (largely due to lack of copyright concerns from Bytedance). Omni's unlock is iterative editing on real footage and coherent multi-turn creative control.

Other Google I/O 2026 releases I found notable

This was a concentrated effort of a huge company to insert AI into every product surface they have, so of course I can't cover ALL of it here, but the most notable things for me were:

* Gemini Spark - a new agentic experience from Google, to help you with tasks across Gmail, Drive and more. It should support skills, and is a de facto OpenClaw/Hermes alternative from Google for regular folks. It's not live yet, so we'll talk more about it when I can test it out
* Managed Agents in the Gemini API - We chatted with Logan about this one. Google is re-imagining how agents are going to get built, and is offering 1 API call to spin up an agent in a full Linux env, with security and sandboxing in mind. I'll expand more on this in an upcoming episode, as I recorded a complete conversation about this with Ali Çevic, a PM for Google APIs
* AI overhaul of Google Search - AI Overviews will now expand into AI mode, and the iconic Google search box itself will change, for the first time in 25 years, to include AI mode!
* SynthID expansion and OpenAI collab - Google showed off that OpenAI is joining in marking all AI-generated imagery and video with an invisible SynthID watermark. I think this is amazing and more companies should adopt this standard
* AI Glasses! We got Google Glasses demos - Together with Warby Parker and Gentle Monster, Google finally showed off their answer to Meta Raybans/Oakleys.
They look like regular glasses too, but can hear and talk to you, with the full power of Gemini multimodality. Available in the fall sometime!
* Demis Hassabis “we're on the cusp of the singularity” closer - CEO and Co-Founder of DeepMind, Demis Hassabis, closed the show with his remarks about the positive future and that we are nearing this Singularity point after which the future is very uncertain. I found it to be very inspiring and closed our show with that clip as well!
* Personally, I got to chat to Demis Hassabis, have breakfast with Jeff Dean, ask Josh Woodward a bunch of questions, and pester about 20 other great folks on a live stream, and had a lot of fun! Huge thanks to the DeepMind folks, Lucie, Dimple, JD and many others for the continued belief in ThursdAI and for inviting me to cover this great event.

OpenAI LLMs solve an 80yo math problem - Erdős Unit Distance Conjecture

Outside of Google I/O, the biggest story of the week was OpenAI announcing that a general-purpose reasoning model made progress on the Erdős planar unit distance problem.

This problem goes back to 1946. For nearly 80 years, mathematicians believed the best constructions looked roughly like square grids. OpenAI's model found a new family of constructions with a polynomial improvement, using algebraic number theory ideas that humans apparently had not explored in this context.

Important caveat: this does not fully solve every version of the asymptotic Erdős conjecture. Some mathematicians are pushing back on the framing, and fair enough. Precision matters. But even with the caveat, this is still a huge moment.

The reason it matters is not that I personally understand the math. I absolutely do not. The reason it matters is that this was not a special-purpose IMO model fine-tuned only for math competitions. This was a general-purpose reasoning model exploring a real open problem, generating candidates, verifying them, and finding a path humans hadn't taken.
Extrapolate this to other sciences, Physics for example? This means an amazing future.

LDJ pointed out that mathematicians have been skeptical because there have been previous false alarms. But this one landed differently. When Fields Medalist-level mathematicians verify the proof, the discourse changes from “lol stochastic parrot” to “wait, what does this mean for my PhD?”

My answer is: yes, still study math. Please study math. The mathematicians who use these tools will do much more than people who don't understand the domain. Same with software engineering. Senior engineers with Codex, Claude Code, Hermes, Antigravity, Cursor and other agents are becoming dramatically more effective because they can steer, evaluate, and recover the work.

This being published a day after Demis's “foothills of the singularity” is a great conjecture.

Cursor Composer 2.5 - Opus 4.7 performance model from Cursor, at 10x better efficiency

Cursor dropped Composer 2.5, and folks, this is a serious release.

Composer 2.5 is built on Moonshot's Kimi K2.5 base, like Composer 2, but Cursor scaled the post-training dramatically. They used 25x more synthetic tasks and introduced targeted textual feedback during RL rollouts, where the model gets hints inserted at the point of failure instead of only getting a noisy final reward.

The benchmark story is strong: around 69.3 on Terminal Bench 2.0, basically neck and neck with Opus 4.7 in Cursor's chart, and strong results on SWE-bench multilingual and CursorBench. The pricing is the part that makes this especially interesting: $0.50 per million input tokens and $2.50 per million output tokens, with a faster variant at $3 / $15. That is much cheaper than the frontier models it is trying to replace for day-to-day coding work.

Cursor engineers are reportedly dogfooding Composer 2.5 heavily and rarely switching away. That matters more to me than any single benchmark.
If the people building Cursor can use it as a daily driver, that is a very real signal.

The wild part is what comes next. Cursor is partnering with SpaceXAI to train a much larger model from scratch using 10x more compute on Colossus 2. Cursor has the workflow data. xAI has enormous compute. If this works, Cursor stops being just the IDE company and becomes a coding-model lab.

We've been saying for months that coding agents are the path toward general agents. Anthropic has Claude Code. OpenAI has Codex. Google has Antigravity. xAI has Grok Build. Cursor has Composer. I'm looking forward to seeing how well it performs on our own benchmarks!

Anthropic, xAI, Karpathy, and the compute wars

The compute story this week was bonkers.

The SpaceX IPO filing reportedly revealed that Anthropic is paying SpaceXAI $1.25B per month for AI compute at the Memphis Colossus facility. Per month. That's about $15B a year, through May 2029, for access to more than 220,000 NVIDIA GPUs including H100s, H200s and GB200s.

This is apparently inference compute for Claude Pro, Max and API users, not training. And it explains a lot of the recent quota changes. Anthropic doubled some Claude usage limits, and suddenly the product feels less constrained.

Also, can we just acknowledge the comedy here? Elon Musk publicly called Anthropic “misanthropic,” went off against every competitor to xAI, and is now selling spare GPU time to Cursor and Anthropic? Who's next, OpenAI? The bigger point is that the AI capex story is no longer just NVIDIA. It's also whoever owns the data centers, power, cooling, networking, and GPU clusters. Compute is becoming the land under the AI economy.

Also, Andrej Karpathy joined Anthropic. Karpathy could work anywhere.
He co-founded OpenAI, led Tesla Autopilot vision, taught half the AI world how neural nets work, and now he's going back into frontier LLM R&D at Anthropic.

Open source LLMs - Cohere, Qwen, Nous

Open source had a strong week too.

Cohere released Command A+, a 218B total parameter sparse MoE model with only 25B active parameters per token, under Apache 2.0. This is their first model that unifies reasoning, vision, multilingual, tool use and citations in one package.

The hardware story is great: W4A4 quantization can run on 2 H100s or a single B200. Cohere says it supports 48 languages, 128K input context, 64K output, and gets big jumps over Command A Reasoning, including Tau-squared Bench Telecom from 37% to 85% and Terminal-Bench Hard from 3% to 25%.

Cohere is one of those labs that doesn't always chase the loudest consumer hype, but they are very serious on enterprise and multilingual. Apache 2.0 makes this one especially useful.

Alibaba also dropped Qwen 3.7-Max, positioned as an agentic frontier model. The headline from their testing is wild: 35 hours of continuous autonomous operation with more than 1,000 tool calls. They also showed it controlling a physical robot inside Alibaba offices and finding an umbrella after about 20 minutes of agent interaction.

This digital-to-physical bridge is where things start feeling very real. An agent loop that can write code and use tools can also navigate physical tasks if you give it the right robotics stack.

And our friends at Nous Research released Lighthouse Attention, a sparse attention method for long-context pretraining. At 512K context, they report a 17x faster forward+backward pass than standard attention on a single B200, and the recovered checkpoints actually beat dense-from-scratch final loss at the same token budget.

The clever part is that the selection logic sits outside the attention kernel, so you still use regular FlashAttention on a gathered dense subsequence. No custom sparse kernel nonsense.
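For readers who want to see the shape of "select outside the kernel, then run dense attention on what was gathered," here is a minimal pure-Python sketch. It is an illustration only, not the Nous implementation: the selection rule (`select_top_k`, top-k by dot product) is a made-up stand-in, the function names are hypothetical, and a plain softmax attention stands in for FlashAttention.

```python
import math


def dense_attention(q, keys, values):
    """Ordinary softmax attention for one query vector over the given keys."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def select_top_k(q, keys, k):
    """Hypothetical selection rule (an assumption, not Lighthouse's):
    keep the k keys with the largest dot product against the query."""
    ranked = sorted(range(len(keys)),
                    key=lambda i: sum(a * b for a, b in zip(q, keys[i])),
                    reverse=True)
    return sorted(ranked[:k])  # restore original sequence order


def gathered_attention(q, keys, values, k):
    """Select indices first, gather a short dense subsequence,
    then call the unmodified dense attention routine on it."""
    idx = select_top_k(q, keys, k)
    return dense_attention(q, [keys[i] for i in idx], [values[i] for i in idx])


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sparse_out = gathered_attention(q, keys, values, k=1)
full_out = gathered_attention(q, keys, values, k=len(keys))
```

The point of the structure is that `dense_attention` never learns that a selection happened, which is why an off-the-shelf dense kernel can be reused. Selecting every key reproduces full attention exactly.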
If this holds up, this could matter a lot for long-context training.

Tools and agentic engineering - X subscriptions, Grok Build, Codex Mobile

One really practical tool update: Hermes and OpenClaw can now use your X subscription directly.

This is more important than it sounds. You can connect your X Premium subscription and get access to semantic X search and Grok-related tooling without using sketchy browser automation or unofficial APIs that might get you banned. Wolfram already used this to have his agent go through his likes and bookmarks from the past week and send me news items for the show. That is exactly the kind of “small but real” agent workflow that becomes addictive.

xAI also launched Grok Build, their agentic CLI coding tool, in early beta for SuperGrok Heavy subscribers. Early users are already running parallel Grok Build agents through tmux supervisors and using it for more than coding: fleet data triage, security patching, training label work, and general automation.

The pricing being discussed is aggressive, around $1 per million input tokens and $2 per million output tokens for the API. The model version is grok-build-0.1, and folks have already wired it into Hermes with a 256K context window.

And then there's Codex Mobile, which OpenAI shipped inside the ChatGPT mobile apps. This is one of those releases that sounds small until you start using it. You can control Codex sessions remotely from your phone, connected to your machine, and because Codex has native connectors to Gmail, Calendar and other surfaces, it sometimes feels faster and more reliable than local CLIs duct-taped to third-party integrations.

I ported Wolfred into Codex with skills and everything, and I've been comparing the same tasks in Hermes and Codex. Codex is often faster, not necessarily because the model is always smarter, but because the connectors and harness are cleaner. Harness matters.
We keep coming back to this.

This Week's Buzz - W&B, CoreWeave, WolfBench and robotics

This week in the Buzz, Wolfram walked us through a few things from the Weights & Biases / CoreWeave world.

CoreWeave is a gold sponsor at ICRA 2026 in Vienna, the International Conference on Robotics and Automation. NVIDIA is also going big there with a keynote on generalist humanoid robots, 17 accepted papers and workshops around sim-to-real, robot foundation models, autonomous driving, manipulation, and physical AI.

Wolfram will be there later in the week, after speaking at the AI Developer event in Cologne about WolfBench. If you're in Europe and into robotics or agent evals, find him.

We also looked at WolfBench results for Gemini 3.5 Flash, which honestly became one of the more interesting empirical points of the episode. The model looks variable in simple harnesses, but very capable in better agent loops. That's the whole thesis of measuring model + harness together instead of pretending the model card tells the whole story.

The water discourse, almonds, and data center reality

We also got into the data center water discourse, because this talking point is everywhere right now.

There are real infrastructure questions around AI. Power, land, cooling, grid capacity, permitting, local impact, all of that matters. But the “AI is stealing drinking water” version of the argument is often wildly detached from scale.

The stat I brought up on the show: California almonds use roughly 3 to 5.5 million acre-feet of water per year, multiple times more than all North American data centers combined in 2025. Nisten and LDJ added the important cooling nuance: many large data centers use closed-loop cooling, and evaporative cooling is not universal. Some data centers can avoid water use almost entirely, but at the cost of higher electricity usage.

This doesn't mean “no concerns are valid.” It means if we're going to regulate or pause data centers, let's be honest about the actual tradeoffs.
AI compute is becoming the substrate for medicine, robotics, science, logistics, software, education and every other productivity layer. We should build responsibly, but not based on viral fear math.

Closing thoughts - foothills of the singularity

Demis closed I/O saying we're in the foothills of the singularity, and I know how that lands when you write it down. But I was in the room, and after the keynote he told me something I haven't been able to shake: he thinks AI is going to be 10x as impactful as the Industrial Revolution, and 10x as fast. Basically 100x. This is the AlphaFold guy. Not someone loose with his words.

Then look at the week. A general reasoner cracked an 80-year-old math problem. Cursor is training near-frontier coding models on a fraction of the big-lab budget. Anthropic is paying Elon $15B a year for inference. Karpathy left education to go back into pre-training. Google rolled out an intelligence uplift to a billion people who don't even know a model dropped.

If you put that on a whiteboard in 2023, it reads like a sci-fi pitch.

LDJ's mathematician friends are asking if they should keep doing their PhDs. My answer hasn't changed: yes, please keep going. The people who combine domain taste with these tools are going to ship more in 5 years than the previous generation did in 50. The tool doesn't replace the taste. It just removes the bottleneck.

That's the whole reason ThursdAI exists.
Not to hype every drop, not to dunk for engagement, but to give you a shot at being one of the people who knows what's happening, with the receipts.

This week, a lot changed.

See you next Thursday.

TL;DR and Show Notes

* Hosts and Guests
  * Alex Volkov - AI Evangelist at Weights & Biases / CoreWeave, @altryne
  * Co-hosts: @WolframRvnwlf, @nisten, @ldjconfirmed
  * Guest: Logan Kilpatrick, MTS at Google DeepMind / AI Studio, @OfficialLoganK
* Google I/O 2026
  * Google went all-in on agents across Search, Gemini, Antigravity, Workspace, Android, Cloud and YouTube (I/O site, Alex thread)
  * Antigravity 2.0 became the central agentic coding harness across Google (Sundar, Google OS demo)
  * Gemini 3.5 Flash launched as a fast, determined workhorse model for agentic loops (Logan, Noam Shazeer, Jeff Dean)
  * Gemini 3.5 Flash is rolling out across the Gemini app, Search AI Mode, Gemini API, Google AI Studio, Antigravity and Gemini Enterprise Agent Platform (Koray Kavukcuoglu)
  * Google Search is getting new Gemini 3.5 Flash-powered agentic capabilities, including a new AI-powered Search box and background information agents (Sundar)
  * Gemini Spark was announced as a 24/7 personal AI agent that can proactively work across Google surfaces (News from Google)
  * Google teased Gemini-powered Android XR smart glasses with eyewear partners Gentle Monster and Warby Parker (Google, Alex live reaction)
  * Google AI Studio and the Gemini API got major agentic developer updates, including Managed Agents (Google AI Developers)
* Vision & Video
  * Google DeepMind launched Gemini Omni, a “create anything from anything” multimodal model starting with conversational video editing (DeepMind, Google DeepMind on X)
  * Omni is available in the Gemini app, Google Flow and YouTube, with API support coming soon (Logan, Gemini App, Sundar)
  * Key distinction: Omni is not just text-to-video, it is an iterative multi-turn video editing model that combines Gemini intelligence, world knowledge, multimodal inputs and generative media (Google)
* Big CO LLMs + APIs
  * OpenAI announced a general-purpose reasoning model made progress on the Erdős planar unit distance problem, challenging an 80-year-old mathematical belief (OpenAI, X)
  * Cursor launched Composer 2.5, built on Kimi K2.5, with Opus-class coding performance at much lower cost (Cursor blog, X)
  * Alibaba released Qwen 3.7-Max, an agentic frontier model with long autonomous runs and robotics demos (Qwen blog, X, robot demo)
  * Andrej Karpathy joined Anthropic to work on frontier LLM R&D (X)
  * SpaceX IPO filing revealed Anthropic is paying $1.25B/month for AI compute at the Memphis Colossus facility (Axios, Sawyer Merritt)
  * The jury in Musk v. Altman found Musk's OpenAI claims barred by statute of limitations, with Musk saying he will appeal (Elon Musk, Sawyer Merritt, Max Zeff)
* Open Source LLMs
  * Cohere released Command A+, a 218B MoE model with 25B active parameters under Apache 2.0 (Cohere, Nick Frosst, HF W4A4, HF BF16)
  * Nous Research released Lighthouse Attention, a sparse attention method for long-context pretraining with major speedups (Blog, X, arXiv, GitHub)
* Tools & Agentic Engineering
  * Google launched Managed Agents in the Gemini API, letting developers spin up hosted Antigravity agents with Linux sandboxes and persistent state (Docs, X)
  * xAI launched Grok Build, an agentic CLI coding tool in beta for SuperGrok Heavy users (xAI CLI, X)
  * Hermes and OpenClaw can now use X subscription auth for semantic search and Grok tooling (Alex)
  * OpenAI Codex Mobile is now available in the ChatGPT mobile apps for remote agent workflows (OpenAI)
  * Anthropic doubled Claude usage outside peak hours for a limited period, including Claude Code and other Claude surfaces (Claude)
* This Week's Buzz - W&B / CoreWeave
  * Weights & Biases by CoreWeave is at ICRA 2026 in Vienna, with robotics and automation taking center stage (ICRA, W&B event page)
  * NVIDIA heads to ICRA 2026 with robotics work around generalist humanoids, physical AI and sim-to-real systems (NVIDIA Robotics, NVIDIA ICRA)
  * Wolfram is speaking about WolfBench at the AI Developer event in Cologne before heading to ICRA in Vienna (Wolfram)
* Other Topics
  * Data center water usage discourse came up again, including why comparisons need real scale and context rather than viral fear math
  * The broader theme of the week: coding agents are becoming general agents, and the major labs are now competing on the full stack of model, harness, tools, context and compute

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
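The Command A+ hardware claim in the open source section (W4A4 on 2 H100s or a single B200) can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weight memory only. The GPU capacities are assumptions on my part (80 GB per H100, and a conservative 180 GB for a B200, whose listed capacity varies by configuration), and KV cache, activations and runtime overhead are ignored.

```python
def weight_memory_gb(total_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 218e9  # from the post: 218B total parameters
H100_GB = 80          # assumption: 80 GB of HBM per H100
B200_GB = 180         # assumption: conservative per-GPU figure for a B200

w4_gb = weight_memory_gb(TOTAL_PARAMS, 4)     # 4-bit weights (the "W4" in W4A4)
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # unquantized BF16 for comparison

fits_two_h100 = w4_gb < 2 * H100_GB
fits_one_b200 = w4_gb < B200_GB
bf16_fits_two_h100 = bf16_gb < 2 * H100_GB

print(f"W4 weights:   {w4_gb:.0f} GB")
print(f"BF16 weights: {bf16_gb:.0f} GB")
print(f"fits on 2x H100: {fits_two_h100}, fits on 1x B200: {fits_one_b200}")
```

All 218B parameters have to be resident even though only 25B are active per token, which is why the total count drives the memory estimate. Under these assumptions the 4-bit weights leave some headroom on either setup, while the BF16 weights would not fit on two H100s.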

Off The Charts Football Podcast
2026 NFL Draft Recap

Apr 28, 2026 · 49:30


The 2026 NFL Draft has come to a conclusion and we have you covered on analyzing the results! James Weaver and Alex Vigderman from the R&D department are joined by Ben Hrkach, Jeff Dean, and Jordan Edwards from our Scouting Department to look at what teams gained the most value, what teams left the most on the table, what players found a dream home, and a lot more!

- Rueben Bain Jr.'s perfect fit in Tampa
- The Chiefs capitalized on drafting value
- Jacksonville ends up with a perplexing draft board
- Was the Ty Simpson selection warranted?

Visit our NFL Draft Website. Off The Charts features a blend of statistical insights, tactical analysis, and personal opinions, aimed at providing listeners with a comprehensive understanding of the week's key matchups and the intricacies of the sport. You can follow our content on Twitter at @Football_SIS, on Bluesky at @sportsinfosis.bsky.social and at sportsinfosolutions.com.

Off The Charts Football Podcast
2026 NFL Draft Preview: Tackles and Edge Rushers

Apr 9, 2026 · 53:23


Our NFL Draft preview continues as James Weaver and Alex Vigderman from the R&D department are joined by two members of our Draft team, Jeff Dean and Ryan Rubenstein, who work in our operations department.

This week's episode focused on protecting the quarterback and getting to the quarterback, with scouting and statistical observations on offensive linemen Francis Mauigoa and Kadyn Proctor, along with Edge rushers David Bailey, Keldric Faulk, Derrick Moore, and Gabe Jacas, with a couple of additional favorites of our scouts thrown into the mix.

Visit our NFL Draft Website. Off The Charts features a blend of statistical insights, tactical analysis, and personal opinions, aimed at providing listeners with a comprehensive understanding of the week's key matchups and the intricacies of the sport. You can follow our content on Twitter at @Football_SIS, on Bluesky at @sportsinfosis.bsky.social and at sportsinfosolutions.com.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
[FULL RUNDOWN] Yann LeCun's $1B World Models, the Industry-Wide Pentagon Lawsuit, and the $599 MacBook Delay (March 10th Rundown)

Mar 10, 2026 · 21:25


Listen Ads-FREE at DjamgaMind: https://podcasts.apple.com/us/podcast/daily-news-rundown-the-fire-and-forget-office/id1864721054?i=1000754393738

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

From rewriting Google's search stack in the early 2000s to reviving sparse trillion-parameter models and co-designing TPUs with frontier ML research, Jeff Dean has quietly shaped nearly every layer of the modern AI stack. As Chief AI Scientist at Google and a driving force behind Gemini, Jeff has lived through multiple scaling revolutions from CPUs and sharded indices to multimodal models that reason across text, video, and code.

Jeff joins us to unpack what it really means to “own the Pareto frontier,” why distillation is the engine behind every Flash model breakthrough, how energy (in picojoules) not FLOPs is becoming the true bottleneck, what it was like leading the charge to unify all of Google's AI teams, and why the next leap won't come from bigger context windows alone, but from systems that give the illusion of attending to trillions of tokens.

We discuss:

* Jeff's early neural net thesis in 1990: parallel training before it was cool, why he believed scaling would win decades early, and the “bigger model, more data, better results” mantra that held for 15 years
* The evolution of Google Search: sharding, moving the entire index into memory in 2001, softening query semantics pre-LLMs, and why retrieval pipelines already resemble modern LLM systems
* Pareto frontier strategy: why you need both frontier “Pro” models and low-latency “Flash” models, and how distillation lets smaller models surpass prior generations
* Distillation deep dive: ensembles → compression → logits as soft supervision, and why you need the biggest model to make the smallest one good
* Latency as a first-class objective: why 10–50x lower latency changes UX entirely, and how future reasoning workloads will demand 10,000 tokens/sec
* Energy-based thinking: picojoules per bit, why moving data costs 1000x more than a multiply, batching through the lens of energy, and speculative decoding as amortization
* TPU co-design: predicting ML workloads 2–6 years out, speculative hardware features, precision reduction, sparsity, and the constant feedback loop between model architecture and silicon
* Sparse models and “outrageously large” networks: trillions of parameters with 1–5% activation, and why sparsity was always the right abstraction
* Unified vs. specialized models: abandoning symbolic systems, why general multimodal models tend to dominate vertical silos, and when vertical fine-tuning still makes sense
* Long context and the illusion of scale: beyond needle-in-a-haystack benchmarks toward systems that narrow trillions of tokens to 117 relevant documents
* Personalized AI: attending to your emails, photos, and documents (with permission), and why retrieval + reasoning will unlock deeply personal assistants
* Coding agents: 50 AI interns, crisp specifications as a new core skill, and how ultra-low latency will reshape human–agent collaboration
* Why ideas still matter: transformers, sparsity, RL, hardware, systems — scaling wasn't blind; the pieces had to multiply together

Show Notes:

* Gemma 3 Paper
* Gemma 3
* Gemini 2.5 Report
* Jeff Dean's “Software Engineering Advice from Building Large-Scale Distributed Systems” Presentation (with Back of the Envelope Calculations)
* Latency Numbers Every Programmer Should Know by Jeff Dean
* The Jeff Dean Facts
* Jeff Dean Google Bio
* Jeff Dean on “Important AI Trends” @Stanford AI Club
* Jeff Dean & Noam Shazeer — 25 years at Google (Dwarkesh)

Jeff Dean

* LinkedIn: https://www.linkedin.com/in/jeff-dean-8b212555
* X: https://x.com/jeffdean

Google

* https://google.com
* https://deepmind.google

Full Video Episode

Timestamps

00:00:04 — Introduction: Alessio & Swyx welcome Jeff Dean, chief AI scientist at Google, to the Latent Space podcast
00:00:30 — Owning the Pareto Frontier & balancing frontier vs low-latency models
00:01:31 — Frontier models vs Flash models + role of distillation
00:03:52 — History of distillation and its original motivation
00:05:09 — Distillation's role in modern model scaling
00:07:02 — Model hierarchy (Flash, Pro, Ultra) and distillation sources
00:07:46 — Flash model economics & wide deployment
00:08:10 — Latency importance for complex tasks
00:09:19 — Saturation of some tasks and future frontier tasks
00:11:26 — On benchmarks, public vs internal
00:12:53 — Example long-context benchmarks & limitations
00:15:01 — Long-context goals: attending to trillions of tokens
00:16:26 — Realistic use cases beyond pure language
00:18:04 — Multimodal reasoning and non-text modalities
00:19:05 — Importance of vision & motion modalities
00:20:11 — Video understanding example (extracting structured info)
00:20:47 — Search ranking analogy for LLM retrieval
00:23:08 — LLM representations vs keyword search
00:24:06 — Early Google search evolution & in-memory index
00:26:47 — Design principles for scalable systems
00:28:55 — Real-time index updates & recrawl strategies
00:30:06 — Classic “Latency numbers every programmer should know”
00:32:09 — Cost of memory vs compute and energy emphasis
00:34:33 — TPUs & hardware trade-offs for serving models
00:35:57 — TPU design decisions & co-design with ML
00:38:06 — Adapting model architecture to hardware
00:39:50 — Alternatives: energy-based models, speculative decoding
00:42:21 — Open research directions: complex workflows, RL
00:44:56 — Non-verifiable RL domains & model evaluation
00:46:13 — Transition away from symbolic systems toward unified LLMs
00:47:59 — Unified models vs specialized ones
00:50:38 — Knowledge vs reasoning & retrieval + reasoning
00:52:24 — Vertical model specialization & modules
00:55:21 — Token count considerations for vertical domains
00:56:09 — Low resource languages & contextual learning
00:59:22 — Origins: Dean's early neural network work
01:10:07 — AI for coding & human–model interaction styles
01:15:52 — Importance of crisp specification for coding agents
01:19:23 — Prediction: personalized models & state retrieval
01:22:36 — Token-per-second targets (10k+) and reasoning throughput
01:23:20 — Episode conclusion and thanks

Transcript

Alessio Fanelli [00:00:04]: Hey
everyone, welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by Swyx, editor of Latent Space. Shawn Wang [00:00:11]: Hello, hello. We're here in the studio with Jeff Dean, chief AI scientist at Google. Welcome. Thanks for having me. It's a bit surreal to have you in the studio. I've watched so many of your talks, and obviously your career has been super legendary. So, I mean, congrats. I think the first thing must be said, congrats on owning the Pareto Frontier.Jeff Dean [00:00:30]: Thank you, thank you. Pareto Frontiers are good. It's good to be out there.Shawn Wang [00:00:34]: Yeah, I mean, I think it's a combination of both. You have to own the Pareto Frontier. You have to have like frontier capability, but also efficiency, and then offer that range of models that people like to use. And, you know, some part of this was started because of your hardware work. Some part of that is your model work, and I'm sure there's lots of secret sauce that you guys have worked on cumulatively. But, like, it's really impressive to see it all come together in, like, this slittily advanced.Jeff Dean [00:01:04]: Yeah, yeah. I mean, I think, as you say, it's not just one thing. It's like a whole bunch of things up and down the stack. And, you know, all of those really combine to help make UNOS able to make highly capable large models, as well as, you know, software techniques to get those large model capabilities into much smaller, lighter weight models that are, you know, much more cost effective and lower latency, but still, you know, quite capable for their size. Yeah.Alessio Fanelli [00:01:31]: How much pressure do you have on, like, having the lower bound of the Pareto Frontier, too? I think, like, the new labs are always trying to push the top performance frontier because they need to raise more money and all of that. And you guys have billions of users. 
And I think initially when you worked on the TPU, you were thinking about, you know, if everybody that used Google used the voice model for, like, three minutes a day, they were like, you need to double your CPU number. Like, what's that discussion today at Google? Like, how do you prioritize frontier versus, like, we have to do this? How do we actually need to deploy it if we build it?

Jeff Dean [00:02:03]: Yeah, I mean, I think we always want to have models that are at the frontier or pushing the frontier because I think that's where you see what capabilities now exist that didn't exist at the sort of slightly less capable last year's version or last six months ago version. At the same time, you know, we know those are going to be really useful for a bunch of use cases, but they're going to be a bit slower and a bit more expensive than people might like for a bunch of other broader models. So I think what we want to do is always have kind of a highly capable sort of affordable model that enables a whole bunch of, you know, lower latency use cases. People can use them for agentic coding much more readily and then have the high-end, you know, frontier model that is really useful for, you know, deep reasoning, you know, solving really complicated math problems, those kinds of things. And it's not that one or the other is useful. They're both useful. So I think we'd like to do both. And also, you know, through distillation, which is a key technique for making the smaller models more capable, you know, you have to have the frontier model in order to then distill it into your smaller model. So it's not like an either or choice. You sort of need that in order to actually get a highly capable, more modest size model. Yeah.

Alessio Fanelli [00:03:24]: I mean, you and Geoffrey came up with distillation in 2014.

Jeff Dean [00:03:28]: Don't forget Oriol Vinyals as well. Yeah, yeah.

Alessio Fanelli [00:03:30]: A long time ago. 
But like, I'm curious how you think about the cycle of these ideas, even like, you know, sparse models and, you know, how do you reevaluate them? How do you think about in the next generation of model, what is worth revisiting? Like, yeah, they're just kind of like, you know, you worked on so many ideas that end up being influential, but like in the moment, they might not feel that way necessarily. Yeah.Jeff Dean [00:03:52]: I mean, I think distillation was originally motivated because we were seeing that we had a very large image data set at the time, you know, 300 million images that we could train on. And we were seeing that if you create specialists for different subsets of those image categories, you know, this one's going to be really good at sort of mammals, and this one's going to be really good at sort of indoor room scenes or whatever, and you can cluster those categories and train on an enriched stream of data after you do pre-training on a much broader set of images. You get much better performance. If you then treat that whole set of maybe 50 models you've trained as a large ensemble, but that's not a very practical thing to serve, right? So distillation really came about from the idea of, okay, what if we want to actually serve that and train all these independent sort of expert models and then squish it into something that actually fits in a form factor that you can actually serve? And that's, you know, not that different from what we're doing today. You know, often today we're instead of having an ensemble of 50 models. We're having a much larger scale model that we then distill into a much smaller scale model.Shawn Wang [00:05:09]: Yeah. A part of me also wonders if distillation also has a story with the RL revolution. So let me maybe try to articulate what I mean by that, which is you can, RL basically spikes models in a certain part of the distribution. And then you have to sort of, well, you can spike models, but usually sometimes... 
It might be lossy in other areas and it's kind of like an uneven technique, but you can probably distill it back and you can, I think that the sort of general dream is to be able to advance capabilities without regressing on anything else. And I think like that, that whole capability merging without loss, I feel like it's like, you know, some part of that should be a distillation process, but I can't quite articulate it. I haven't seen much papers about it.Jeff Dean [00:06:01]: Yeah, I mean, I tend to think of one of the key advantages of distillation is that you can have a much smaller model and you can have a very large, you know, training data set and you can get utility out of making many passes over that data set because you're now getting the logits from the much larger model in order to sort of coax the right behavior out of the smaller model that you wouldn't otherwise get with just the hard labels. And so, you know, I think that's what we've observed. Is you can get, you know, very close to your largest model performance with distillation approaches. And that seems to be, you know, a nice sweet spot for a lot of people because it enables us to kind of, for multiple Gemini generations now, we've been able to make the sort of flash version of the next generation as good or even substantially better than the previous generations pro. And I think we're going to keep trying to do that because that seems like a good trend to follow.Shawn Wang [00:07:02]: So, Dara asked, so it was the original map was Flash Pro and Ultra. Are you just sitting on Ultra and distilling from that? Is that like the mother load?Jeff Dean [00:07:12]: I mean, we have a lot of different kinds of models. Some are internal ones that are not necessarily meant to be released or served. Some are, you know, our pro scale model and we can distill from that as well into our Flash scale model. So I think, you know, it's an important set of capabilities to have and also inference time scaling. 
It can also be a useful thing to improve the capabilities of the model.Shawn Wang [00:07:35]: And yeah, yeah, cool. Yeah. And obviously, I think the economy of Flash is what led to the total dominance. I think the latest number is like 50 trillion tokens. I don't know. I mean, obviously, it's changing every day.Jeff Dean [00:07:46]: Yeah, yeah. But, you know, by market share, hopefully up.Shawn Wang [00:07:50]: No, I mean, there's no I mean, there's just the economics wise, like because Flash is so economical, like you can use it for everything. Like it's in Gmail now. It's in YouTube. Like it's yeah. It's in everything.Jeff Dean [00:08:02]: We're using it more in our search products of various AI mode reviews.Shawn Wang [00:08:05]: Oh, my God. Flash past the AI mode. Oh, my God. Yeah, that's yeah, I didn't even think about that.Jeff Dean [00:08:10]: I mean, I think one of the things that is quite nice about the Flash model is not only is it more affordable, it's also a lower latency. And I think latency is actually a pretty important characteristic for these models because we're going to want models to do much more complicated things that are going to involve, you know, generating many more tokens from when you ask the model to do so. So, you know, if you're going to ask the model to do something until it actually finishes what you ask it to do, because you're going to ask now, not just write me a for loop, but like write me a whole software package to do X or Y or Z. And so having low latency systems that can do that seems really important. And Flash is one direction, one way of doing that. You know, obviously our hardware platforms enable a bunch of interesting aspects of our, you know, serving stack as well, like TPUs, the interconnect between. Chips on the TPUs is actually quite, quite high performance and quite amenable to, for example, long context kind of attention operations, you know, having sparse models with lots of experts. 
These kinds of things really, really matter a lot in terms of how do you make them servable at scale.Alessio Fanelli [00:09:19]: Yeah. Does it feel like there's some breaking point for like the proto Flash distillation, kind of like one generation delayed? I almost think about almost like the capability as a. In certain tasks, like the pro model today is a saturated, some sort of task. So next generation, that same task will be saturated at the Flash price point. And I think for most of the things that people use models for at some point, the Flash model in two generation will be able to do basically everything. And how do you make it economical to like keep pushing the pro frontier when a lot of the population will be okay with the Flash model? I'm curious how you think about that.Jeff Dean [00:09:59]: I mean, I think that's true. If your distribution of what people are asking people, the models to do is stationary, right? But I think what often happens is as the models become more capable, people ask them to do more, right? So, I mean, I think this happens in my own usage. Like I used to try our models a year ago for some sort of coding task, and it was okay at some simpler things, but wouldn't do work very well for more complicated things. And since then, we've improved dramatically on the more complicated coding tasks. And now I'll ask it to do much more complicated things. And I think that's true, not just of coding, but of, you know, now, you know, can you analyze all the, you know, renewable energy deployments in the world and give me a report on solar panel deployment or whatever. That's a very complicated, you know, more complicated task than people would have asked a year ago. And so you are going to want more capable models to push the frontier in the absence of what people ask the models to do. And that also then gives us. Insight into, okay, where does the, where do things break down? 
How can we improve the model in these, these particular areas, uh, in order to sort of, um, make the next generation even better.Alessio Fanelli [00:11:11]: Yeah. Are there any benchmarks or like test sets they use internally? Because it's almost like the same benchmarks get reported every time. And it's like, all right, it's like 99 instead of 97. Like, how do you have to keep pushing the team internally to it? Or like, this is what we're building towards. Yeah.Jeff Dean [00:11:26]: I mean, I think. Benchmarks, particularly external ones that are publicly available. Have their utility, but they often kind of have a lifespan of utility where they're introduced and maybe they're quite hard for current models. You know, I, I like to think of the best kinds of benchmarks are ones where the initial scores are like 10 to 20 or 30%, maybe, but not higher. And then you can sort of work on improving that capability for, uh, whatever it is, the benchmark is trying to assess and get it up to like 80, 90%, whatever. I, I think once it hits kind of 95% or something, you get very diminishing returns from really focusing on that benchmark, cuz it's sort of, it's either the case that you've now achieved that capability, or there's also the issue of leakage in public data or very related kind of data being, being in your training data. Um, so we have a bunch of held out internal benchmarks that we really look at where we know that wasn't represented in the training data at all. There are capabilities that we want the model to have. Um, yeah. Yeah. Um, that it doesn't have now, and then we can work on, you know, assessing, you know, how do we make the model better at these kinds of things? Is it, we need different kind of data to train on that's more specialized for this particular kind of task. 
Do we need, um, you know, a bunch of, uh, you know, architectural improvements or some sort of, uh, model capability improvements, you know, what would help make that better?

Shawn Wang [00:12:53]: Is there, is there such an example that you, uh, a benchmark inspired an architectural improvement? Like, uh, I'm just kind of jumping on that because you just...

Jeff Dean [00:13:02]: Uh, I mean, I think some of the long context capability of the, of the Gemini models that came, I guess, first in 1.5 really were about looking at, okay, we want to have, um, you know,

Shawn Wang [00:13:15]: immediately everyone jumped to like completely green charts of like, everyone had, I was like, how did everyone crack this at the same time? Right. Yeah. Yeah.

Jeff Dean [00:13:23]: I mean, I think, um, and once you're set, I mean, as you say, that single needle-in-a-haystack benchmark is really saturated for at least context lengths up to 128K or something. Most don't actually have, you know, much larger than 128K these days or so. We're trying to push the frontier of 1 million or 2 million context, which is good because I think there are a lot of use cases where, you know, putting a thousand pages of text or putting, you know, multiple hour-long videos in the context and then actually being able to make use of that is useful. Try to, to explore the über graduation are fairly large. But the single needle in a haystack benchmark is sort of saturated. So you really want more complicated, sort of multi-needle or more realistic, take all this content and produce this kind of answer from a long context that sort of better assesses what it is people really want to do with long context. Which is not just, you know, can you tell me the product number for this particular thing?

Shawn Wang [00:14:31]: Yeah, it's retrieval. It's retrieval within machine learning. 
It's interesting because I think the more meta level I'm trying to operate at here is you have a benchmark. You're like, okay, I see the architectural thing I need to do in order to go fix that. But should you do it? Because sometimes that's an inductive bias, basically. It's what Jason Wei, who used to work at Google, would say. Exactly the kind of thing. Yeah, you're going to win short term. Longer term, I don't know if that's going to scale. You might have to undo that.

Jeff Dean [00:15:01]: I mean, I like to sort of not focus on exactly what solution we're going to derive, but what capability would you want? And I think we're very convinced that, you know, long context is useful, but it's way too short today. Right? Like, I think what you would really want is, can I attend to the internet while I answer my question? Right? But that's not going to, I think, be solved by purely scaling the existing solutions, which are quadratic. So a million tokens kind of pushes what you can do. You're not going to do that to, you know, a billion tokens, let alone a trillion. But I think if you could give the illusion that you can attend to trillions of tokens, that would be amazing. You'd find all kinds of uses for that. You could attend to the internet. You could attend to the pixels of YouTube and the sort of deeper representations that we can form, not just for a single video, but across many videos. You know, on a personal Gemini level, you could attend to all of your personal state with your permission. So like your emails, your photos, your docs, your plane tickets you have. I think that would be really, really useful. And the question is, how do you get algorithmic improvements and system level improvements that get you to something where you actually can attend to trillions of tokens in a meaningful way? 
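The point about quadratic attention can be made concrete with a little arithmetic. This is a minimal sketch, assuming plain full self-attention in which every token is scored against every other token; the function name is hypothetical, and the many optimizations production systems use are ignored.

```python
def attention_score_entries(num_tokens: int) -> int:
    """Entries in one full attention score matrix (per head, per layer).

    Assumes vanilla self-attention: every token is scored against every token.
    """
    return num_tokens * num_tokens


MILLION = 10**6
TRILLION = 10**12

at_million = attention_score_entries(MILLION)
at_trillion = attention_score_entries(TRILLION)

print(f"1M-token context: {at_million:.0e} score entries")
print(f"1T-token context: {at_trillion:.0e} score entries")
# The context grows by a factor of 10^6, but the quadratic term grows by 10^12.
print(f"growth factor: {at_trillion // at_million:.0e}")
```

Under these assumptions, going from a million to a trillion tokens multiplies the pairwise work by a trillion, which is why the conversation turns to giving the illusion of attending to everything rather than doing it directly.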
Yeah.Shawn Wang [00:16:26]: But by the way, I think I did some math and it's like, if you spoke all day, every day for eight hours a day, you only generate a maximum of like a hundred K tokens, which like very comfortably fits.Jeff Dean [00:16:38]: Right. But if you then say, okay, I want to be able to understand everything people are putting on videos.Shawn Wang [00:16:46]: Well, also, I think that the classic example is you start going beyond language into like proteins and whatever else is extremely information dense. Yeah. Yeah.Jeff Dean [00:16:55]: I mean, I think one of the things about Gemini's multimodal aspects is we've always wanted it to be multimodal from the start. And so, you know, that sometimes to people means text and images and video sort of human-like and audio, audio, human-like modalities. But I think it's also really useful to have Gemini know about non-human modalities. Yeah. Like LIDAR sensor data from. Yes. Say, Waymo vehicles or. Like robots or, you know, various kinds of health modalities, x-rays and MRIs and imaging and genomics information. And I think there's probably hundreds of modalities of data where you'd like the model to be able to at least be exposed to the fact that this is an interesting modality and has certain meaning in the world. Where even if you haven't trained on all the LIDAR data or MRI data, you could have, because maybe that's not, you know, it doesn't make sense in terms of trade-offs of. You know, what you include in your main pre-training data mix, at least including a little bit of it is actually quite useful. Yeah. Because it sort of tempts the model that this is a thing.Shawn Wang [00:18:04]: Yeah. Do you believe, I mean, since we're on this topic and something I just get to ask you all the questions I always wanted to ask, which is fantastic. Like, are there some king modalities, like modalities that supersede all the other modalities? So a simple example was Vision can, on a pixel level, encode text. 
And DeepSeek had this DeepSeek-OCR paper that did that. And Vision has also been shown to maybe incorporate audio because you can do audio spectrograms and that's, that's also like a Vision capable thing. Like, so, so maybe Vision is just the king modality and like. Yeah.

Jeff Dean [00:18:36]: I mean, Vision and Motion are quite important things, right? Motion. Well, like video as opposed to static images, because I mean, there's a reason evolution has evolved eyes like 23 independent ways, because it's such a useful capability for sensing the world around you, which is really what we want these models to be able to do: interpret the things we're seeing or the things we're paying attention to and then help us in using that information to do things. Yeah.

Shawn Wang [00:19:05]: I think motion, you know, I still want to shout out, I think Gemini is still the only native video understanding model that's out there. So I use it for YouTube all the time. Nice.

Jeff Dean [00:19:15]: Yeah. Yeah. I mean, it's actually, I think people kind of are not necessarily aware of what the Gemini models can actually do. Yeah. Like I have an example I've used in one of my talks. It had like, it was like a YouTube highlight video of 18 memorable sports moments across the last 20 years or something. So it has like Michael Jordan hitting some jump shot at the end of the finals and, you know, some soccer goals and things like that. And you can literally just give it the video and say, can you please make me a table of what all these different events are, what the date is when they happened, and a short description? And so you get like now an 18 row table of that information extracted from the video, which is, you know, not something most people think of as like a turn video into a SQL-like table.

Alessio Fanelli [00:20:11]: Has there been any discussion inside of Google of like, you mentioned attending to the whole internet, right? 
Google, it's almost built because a human cannot attend to the whole internet and you need some sort of ranking to find what you need. Yep. That ranking is like much different for an LLM because you can expect a person to look at maybe the first five, six links in a Google search versus for an LLM, should you expect to have 20 links that are highly relevant? Like how do you internally figure out, you know, how do we build the AI mode that is like maybe like much broader search and span versus like the more human one? Yeah.

Jeff Dean [00:20:47]: I mean, I think even pre-language model based work, you know, our ranking systems would be built to start with a giant number of web pages in our index. Many of them are not relevant. So you identify a subset of them that are relevant with very lightweight kinds of methods. You know, you're down to like 30,000 documents or something. And then you gradually refine that to apply more and more sophisticated algorithms and more and more sophisticated sort of signals of various kinds in order to get down to ultimately what you show, which is, you know, the final 10 results or, you know, 10 results plus other kinds of information. And I think an LLM based system is not going to be that dissimilar, right? You're going to attend to trillions of tokens, but you're going to want to identify, you know, what are the 30,000 ish documents that are with the, you know, maybe 30 million interesting tokens. And then how do you go from that into what are the 117 documents I really should be paying attention to in order to carry out the tasks that the user has asked? And I think, you know, you can imagine systems where you have, you know, a lot of highly parallel processing to identify those initial 30,000 candidates, maybe with very lightweight kinds of models. 
Then you have some system that sort of helps you narrow down from 30,000 to the 117 with maybe a little bit more sophisticated model or set of models. And then maybe the final model is the thing that looks. So the 117 things that might be your most capable model. So I think it has to, it's going to be some system like that, that is really enables you to give the illusion of attending to trillions of tokens. Sort of the way Google search gives you, you know, not the illusion, but you are searching the internet, but you're finding, you know, a very small subset of things that are, that are relevant.Shawn Wang [00:22:47]: Yeah. I often tell a lot of people that are not steeped in like Google search history that, well, you know, like Bert was. Like he was like basically immediately inside of Google search and that improves results a lot, right? Like I don't, I don't have any numbers off the top of my head, but like, I'm sure you guys, that's obviously the most important numbers to Google. Yeah.Jeff Dean [00:23:08]: I mean, I think going to an LLM based representation of text and words and so on enables you to get out of the explicit hard notion of, of particular words having to be on the page, but really getting at the notion of this topic of this page or this page. Paragraph is highly relevant to this query. Yeah.Shawn Wang [00:23:28]: I don't think people understand how much LLMs have taken over all these very high traffic system, very high traffic. Yeah. Like it's Google, it's YouTube. YouTube has this like semantics ID thing where it's just like every token or every item in the vocab is a YouTube video or something that predicts the video using a code book, which is absurd to me for YouTube size.Jeff Dean [00:23:50]: And then most recently GROK also for, for XAI, which is like, yeah. 
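The staged narrowing described in this exchange (a huge corpus, then roughly 30,000 candidates from a lightweight pass, then 117 documents for the most capable model) can be sketched as a simple ranking cascade. This is only an illustration under stated assumptions: the corpus is synthetic, `true_relevance`, `cheap_score`, and `expensive_score` are hypothetical stand-ins for real models, and nothing here reflects how Google's systems are actually built.

```python
from typing import Callable, Iterable, List, Tuple

Scorer = Callable[[int], float]


def cascade(docs: Iterable[int], stages: List[Tuple[Scorer, int]]) -> List[int]:
    """Run each (scorer, keep) stage in order, keeping the top `keep` docs."""
    survivors = list(docs)
    for scorer, keep in stages:
        survivors = sorted(survivors, key=scorer, reverse=True)[:keep]
    return survivors


# Synthetic corpus: document ids with a made-up relevance value in [0, 1).
CORPUS_SIZE = 100_000


def true_relevance(doc_id: int) -> float:
    return ((doc_id * 7919) % 100_003) / 100_003


def cheap_score(doc_id: int) -> float:
    # Stand-in for a lightweight first-pass model: coarse, one decimal place.
    return round(true_relevance(doc_id), 1)


def expensive_score(doc_id: int) -> float:
    # Stand-in for a more sophisticated model: sees the exact relevance.
    return true_relevance(doc_id)


shortlist = cascade(
    range(CORPUS_SIZE),
    [(cheap_score, 30_000), (expensive_score, 117)],
)
print(len(shortlist))  # documents handed to the final, most capable model
```

The design choice the sketch tries to show is that the expensive scorer only ever sees the 30,000 survivors of the cheap pass, so its cost does not depend on the size of the full corpus.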
I mean, I'll call out even before LLMs were used extensively in search, we put a lot of emphasis on softening the notion of what the user actually entered into the query.Shawn Wang [00:24:06]: So do you have like a history of like, what's the progression? Oh yeah.Jeff Dean [00:24:09]: I mean, I actually gave a talk in, uh, I guess, uh, web search and data mining conference in 2009, uh, where we never actually published any papers about the origins of Google search, uh, sort of, but we went through sort of four or five or six. generations, four or five or six generations of, uh, redesigning of the search and retrieval system, uh, from about 1999 through 2004 or five. And that talk is really about that evolution. And one of the things that really happened in 2001 was we were sort of working to scale the system in multiple dimensions. So one is we wanted to make our index bigger, so we could retrieve from a larger index, which always helps your quality in general. Uh, because if you don't have the page in your index, you're going to not do well. Um, and then we also needed to scale our capacity because we were, our traffic was growing quite extensively. Um, and so we had, you know, a sharded system where you have more and more shards as the index grows, you have like 30 shards. And then if you want to double the index size, you make 60 shards so that you can bound the latency by which you respond for any particular user query. Um, and then as traffic grows, you add, you add more and more replicas of each of those. And so we eventually did the math that realized that in a data center where we had say 60 shards and, um, you know, 20 copies of each shard, we now had 1200 machines, uh, with disks. And we did the math and we're like, Hey, one copy of that index would actually fit in memory across 1200 machines. So in 2001, we introduced, uh, we put our entire index in memory and what that enabled from a quality perspective was amazing. 
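The shard and replica arithmetic above can be written down directly. A minimal sketch: the 60 shards and 20 copies come from the conversation, while the total index size is a hypothetical placeholder, since no figure is given.

```python
def machines_in_datacenter(shards: int, replicas_per_shard: int) -> int:
    """Every shard is served by `replicas_per_shard` machines."""
    return shards * replicas_per_shard


def index_share_per_machine_gb(index_size_gb: float, machines: int) -> float:
    """GB each machine must hold if one index copy is spread over `machines`."""
    return index_size_gb / machines


SHARDS = 60    # from the conversation
REPLICAS = 20  # from the conversation
machines = machines_in_datacenter(SHARDS, REPLICAS)

ASSUMED_INDEX_GB = 1_200.0  # hypothetical index size, not stated in the source

print(f"machines: {machines}")
print(f"per-machine share, one copy per 60 shards: "
      f"{index_share_per_machine_gb(ASSUMED_INDEX_GB, SHARDS):.1f} GB")
print(f"per-machine share, one copy over all machines: "
      f"{index_share_per_machine_gb(ASSUMED_INDEX_GB, machines):.1f} GB")
```

Spreading a single copy over every machine shrinks each machine's share by the replica count, which is the observation that made holding the index in memory plausible.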
Before you had to be really careful about, you know, how many different terms you looked at for a query, because every one of them would involve a disk seek on every one of the 60 shards. And so you, as you make your index bigger, that becomes even more inefficient. But once you have the whole index in memory, it's totally fine to have 50 terms you throw into the query from the user's original three or four word query, because now you can add synonyms like restaurant and restaurants and cafe and, uh, you know, things like that. Uh, bistro and all these things. And you can suddenly start, uh, sort of really, uh, getting at the meaning of the word as opposed to the exact semantic form the user typed in. And that was, you know, 2001, very much pre LLM, but really it was about softening the, the strict definition of what the user typed in order to get at the meaning.

Alessio Fanelli [00:26:47]: What are like principles that you use to like design the systems, especially when you have, I mean, in 2001, the internet is like doubling, tripling every year in size, is not like, uh, you know, and I think today you kind of see that with LLMs too, where like every year the jumps in size and like capabilities are just so big. Are there just any, you know, principles that you use to like, think about this? Yeah.

Jeff Dean [00:27:08]: I mean, I think, uh, you know, first, whenever you're designing a system, you want to understand what are the sort of design parameters that are going to be most important in designing that, you know? So, you know, how many queries per second do you need to handle? How big is the internet? How big is the index you need to handle? How much data do you need to keep for every document in the index? How are you going to look at it when you retrieve things? Um, what happens if traffic were to double or triple, you know, will that system work well? 
And I think a good design principle is you're going to want to design a system so that the most important characteristics could scale by like factors of five or 10, but probably not beyond that because often what happens is if you design a system for X and something suddenly becomes a hundred X, that would enable a very different point in the design space that would not make sense at X, but all of a sudden at a hundred X makes total sense. So like going from a disk-based index to an in-memory index makes a lot of sense once you have enough traffic, because now you have enough replicas of the sort of state on disk that those machines now actually can hold, uh, you know, a full copy of the, uh, index in memory. Yeah. And that all of a sudden enabled a completely different design that wouldn't have been practical before. Yeah. Um, so I'm, I'm a big fan of thinking through designs in your head, just kind of playing with the design space a little before you actually do a lot of writing of code. But, you know, as you said, in the early days of Google, we were growing the index, uh, quite extensively. We were growing the update rate of the index. So the update rate actually is the parameter that changed the most. Surprising. So it used to be once a month.Shawn Wang [00:28:55]: Yeah.Jeff Dean [00:28:56]: And then we went to a system that could update any particular page in like sub one minute. Okay.Shawn Wang [00:29:02]: Yeah. Because this is a competitive advantage, right?Jeff Dean [00:29:04]: Because all of a sudden news related queries, you know, if you're, if you've got last month's news index, it's not actually that useful for.Shawn Wang [00:29:11]: News is a special beast. Was there any, like you could have split it onto a separate system.Jeff Dean [00:29:15]: Well, we did. We launched a Google news product, but you also want news related queries that people type into the main index to also be sort of updated.Shawn Wang [00:29:23]: So, yeah, it's interesting.
And then you have to like classify whether the page is, you have to decide which pages should be updated and what frequency. Oh yeah.Jeff Dean [00:29:30]: There's a whole like, uh, system behind the scenes that's trying to decide update rates and importance of the pages. So even if the update rate seems low, you might still want to recrawl important pages quite often because, uh, the likelihood they change might be low, but the value of having updated is high.Shawn Wang [00:29:50]: Yeah, yeah, yeah, yeah. Uh, well, you know, yeah. This, uh, you know, mention of latency and, and saving things to disk reminds me of one of your classics, which I have to bring up, which is Latency Numbers Every Programmer Should Know. Uh, was there a, was it just a, just a general story behind that? Did you like just write it down?Jeff Dean [00:30:06]: I mean, this has like sort of eight or 10 different kinds of metrics that are like, how long does a cache miss take? How long does a branch mispredict take? How long does a reference to main memory take? How long does it take to send, you know, a packet from the U.S. to the Netherlands or something? Um,Shawn Wang [00:30:21]: why Netherlands, by the way, or is it, is that because of Chrome?Jeff Dean [00:30:25]: Uh, we had a data center in the Netherlands, um, so, I mean, I think this gets to the point of being able to do the back of the envelope calculations. So these are sort of the raw ingredients of those, and you can use them to say, okay, well, if I need to design a system to do image search and thumbnailing or something of the result page, you know, how would I do that? I could pre-compute the image thumbnails. I could, like, try to thumbnail them on the fly from the larger images. What would that do? How much disk bandwidth would I need? How many disk seeks would I do? Um, and you can sort of actually do thought experiments in, you know, 30 seconds or a minute with the sort of, uh, basic, uh, basic numbers at your fingertips.
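The thumbnail thought experiment above can be written down as a back-of-the-envelope script. This is a sketch under stated assumptions: the seek time and disk throughput are rounded, dated figures in the spirit of the latency-numbers list rather than measurements, and the page size and image sizes are hypothetical.

```python
# Assumed, rounded figures; illustrative only.
DISK_SEEK_S = 10e-3        # about 10 ms per disk seek
DISK_BYTES_PER_S = 30e6    # about 30 MB/s sequential read


def serial_read_time_s(num_images: int, bytes_per_image: int) -> float:
    """Time to read images one after another from a single disk."""
    seek_time = num_images * DISK_SEEK_S
    transfer_time = num_images * bytes_per_image / DISK_BYTES_PER_S
    return seek_time + transfer_time


RESULTS_PER_PAGE = 30           # hypothetical results per page
FULL_IMAGE_BYTES = 256 * 1024   # hypothetical full-size image
THUMBNAIL_BYTES = 8 * 1024      # hypothetical precomputed thumbnail

on_the_fly = serial_read_time_s(RESULTS_PER_PAGE, FULL_IMAGE_BYTES)
precomputed = serial_read_time_s(RESULTS_PER_PAGE, THUMBNAIL_BYTES)
print(f"on the fly: {on_the_fly * 1e3:.0f} ms, precomputed: {precomputed * 1e3:.0f} ms")
```

With these assumed numbers the 30 seeks alone cost 300 ms in both designs, so shrinking the bytes read helps less than spreading the reads across disks would.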
Uh, and then as you sort of build software using higher level libraries, you kind of want to develop the same intuitions for how long does it take to, you know, look up something in this particular kind of.Shawn Wang [00:31:51]: Which is a simple byte conversion. That's nothing interesting. I wonder if you have any, if you were to update your...Jeff Dean [00:31:58]: I mean, I think it's really good to think about calculations you're doing in a model, either for training or inference.Jeff Dean [00:32:09]: Often a good way to view that is how much state will you need to bring in from memory, either like on-chip SRAM or HBM from the accelerator-attached memory or DRAM or over the network. And then how expensive is that data motion relative to the cost of, say, an actual multiply in the matrix multiply unit? And that cost is actually really, really low, right? Because it's order, depending on your precision, I think it's like sub one picojoule.Shawn Wang [00:32:50]: Oh, okay. You measure it by energy. Yeah. Yeah.Jeff Dean [00:32:52]: Yeah. I mean, it's all going to be about energy and how do you make the most energy efficient system. And then moving data from the SRAM on the other side of the chip, not even off chip, but on the other side of the same chip can be, you know, a thousand picojoules. Oh, yeah. And so all of a sudden, this is why your accelerators require batching. Because if you move, like, say, the parameter of a model from SRAM on the, on the chip into the multiplier unit, that's going to cost you a thousand picojoules. So you better make use of that, that thing that you moved many, many times. So that's where the batch dimension comes in. Because all of a sudden, you know, if you have a batch of 256 or something, that's not so bad. But if you have a batch of one, that's really not good.Shawn Wang [00:33:40]: Yeah. Yeah.
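The energy argument for batching can be made concrete with a small amortization calculation. This sketch uses the two rough figures quoted in the conversation (about a thousand picojoules to move a parameter across the chip, under one picojoule per multiply); treating the multiply as exactly 1 pJ is a simplifying assumption, and the function name is illustrative.

```python
MOVE_PJ = 1000.0     # energy to move one parameter to the multiplier (figure from the conversation)
MULTIPLY_PJ = 1.0    # assumed upper bound; the conversation says "sub one" picojoule


def energy_per_multiply_pj(batch_size: int) -> float:
    """Amortized energy per multiply when one moved parameter is reused batch_size times."""
    return MOVE_PJ / batch_size + MULTIPLY_PJ


for batch_size in (1, 8, 256):
    print(batch_size, round(energy_per_multiply_pj(batch_size), 2))
```

At batch size 1 nearly all of the energy goes to data movement, while at 256 the movement cost per multiply drops to a few picojoules, which is the trade-off against latency discussed here.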
Right.Jeff Dean [00:33:41]: Because then you paid a thousand picojoules in order to do your one picojoule multiply.Shawn Wang [00:33:46]: I have never heard an energy-based analysis of batching.Jeff Dean [00:33:50]: Yeah. I mean, that's why people batch. Yeah. Ideally, you'd like to use batch size one because the latency would be great.Shawn Wang [00:33:56]: The best latency.Jeff Dean [00:33:56]: But the energy cost and the compute cost inefficiency that you get is quite large. So, yeah.Shawn Wang [00:34:04]: Is there a similar trick like, like, like you did with, you know, putting everything in memory? Like, you know, I think obviously NVIDIA has caused a lot of waves with betting very hard on SRAM with Groq. I wonder if, like, that's something that you already saw with, with the TPUs, right? Like that, that you had to, uh, to serve at your scale, uh, you probably sort of saw that coming. Like what, what, what hardware, uh, innovations or insights were formed because of what you're seeing there?Jeff Dean [00:34:33]: Yeah. I mean, I think, you know, TPUs have this nice, uh, sort of regular structure of 2D or 3D meshes with a bunch of chips connected. Yeah. And each one of those has HBM attached. Um, I think for serving some kinds of models, uh, you know, you, you pay a lot higher cost, uh, and time latency, um, bringing things in from HBM than you do bringing them in from, uh, SRAM on the chip. So if you have a small enough model, you can actually do model parallelism, spread it out over lots of chips and you actually get quite good throughput improvements and latency improvements from doing that. And so you're now sort of striping your smallish scale model over say 16 or 64 chips. Uh, but if you do that and it all fits in SRAM, uh, that can be a big win. So yeah, that's not a surprise, but it is a good technique.Alessio Fanelli [00:35:27]: Yeah. What about the TPU design? Like how much do you decide where the improvements have to go?
So like, this is like a good example of like, is there a way to bring the thousand picojoules down to 50? Like, is it worth designing a new chip to do that? The extreme is like when people say, oh, you should burn the model on the ASIC and that's kind of like the most extreme thing. How much of it is worth doing in hardware when things change so quickly? Like what was the internal discussion? Yeah.Jeff Dean [00:35:57]: I mean, we, we have a lot of interaction between say the TPU chip design architecture team and the sort of higher level modeling, uh, experts, because you really want to take advantage of being able to co-design what should future TPUs look like based on where we think the sort of ML research puck is going, uh, in some sense, because, uh, you know, as a hardware designer for ML in particular, you're trying to design a chip starting today and that design might take two years before it even lands in a data center. And then it has to sort of be a reasonable lifetime of the chip to take you three, four or five years. So you're trying to predict two to six years out where, what ML computations will people want to run two to six years out in a very fast changing field. And so having people with interesting ML research ideas of things we think will start to work in that timeframe or will be more important in that timeframe, uh, really enables us to then get, you know, interesting hardware features put into, you know, TPU N plus two, where TPU N is what we have today.Shawn Wang [00:37:10]: Oh, the cycle time is plus two.Jeff Dean [00:37:12]: Roughly. Wow. Because, uh, I mean, sometimes you can squeeze some changes into N plus one, but, you know, bigger changes are going to require the chip, yeah, design be earlier in its lifetime design process. Um, so whenever we can do that, it's generally good.
And sometimes you can put in speculative features that maybe won't cost you much chip area, but if it works out, it would make something, you know, 10 times as fast. And if it doesn't work out, well, you burned a tiny amount of your chip area on that thing, but it's not that big a deal. Uh, sometimes it's a very big change and we want to be pretty sure this is going to work out. So we'll do like lots of careful, uh, ML experimentation to show us, uh, this is actually the, the way we want to go. Yeah.Alessio Fanelli [00:37:58]: Is there a reverse of like, we already committed to this chip design so we can not take the model architecture that way because it doesn't quite fit?Jeff Dean [00:38:06]: Yeah. I mean, you, you definitely have things where you're going to adapt what the model architecture looks like so that they're efficient on the chips that you're going to have for both training and inference of that, of that, uh, generation of model. So I think it kind of goes both ways. Um, you know, sometimes you can take advantage of, you know, lower precision things that are coming in a future generation. So you can, might train it at that lower precision, even if the current generation doesn't quite do that. Mm.Shawn Wang [00:38:40]: Yeah. How low can we go in precision?Jeff Dean [00:38:43]: Because people are saying like ternary is like, uh, yeah, I mean, I'm a big fan of very low precision because I think that gets, that saves you a tremendous amount of time. Right. Because it's picojoules per bit that you're transferring and reducing the number of bits is a really good way to, to reduce that. Um, you know, I think people have gotten a lot of luck, uh, mileage out of having very low bit precision things, but then having scaling factors that apply to a whole bunch of, uh, those, those weights. Scaling. How does it, how does it, okay.Shawn Wang [00:39:15]: Interesting. You, so low, low precision, but scaled up weights. Yeah. Huh. Yeah.
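The idea of very low-bit weights that share a scaling factor across a group can be sketched as block-wise quantization. This is a generic illustration, not the scheme any particular chip or model uses; the block size, the 4-bit width, and the function names are assumptions.

```python
def quantize_blocks(weights: list[float], block_size: int = 4, bits: int = 4):
    """Symmetric per-block quantization: small integers plus one float scale per block."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed values
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        quantized = [max(-qmax, min(qmax, round(w / scale))) for w in block]
        blocks.append((scale, quantized))
    return blocks


def dequantize_blocks(blocks) -> list[float]:
    """Rebuild approximate weights by multiplying each integer by its block's scale."""
    return [scale * q for scale, quantized in blocks for q in quantized]


weights = [0.1, -0.2, 0.05, 0.7, 10.0, -3.0, 2.5, 0.0]
blocks = quantize_blocks(weights)
restored = dequantize_blocks(blocks)
print(blocks)
print(restored)
```

Because each block carries its own scale, the block of small weights keeps fine resolution even though the other block contains a value of 10.0; with one global scale the small weights would mostly round to zero.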
Never considered that. Yeah. Interesting. Uh, w w while we're on this topic, you know, I think there's a lot of, um, uh, this, the concept of precision at all is weird when we're sampling, you know, uh, we just, at the end of this, we're going to have all these like chips that'll do like very good math. And then we're just going to throw a random number generator at the start. So, I mean, there's a movement towards, uh, energy based, uh, models and processors. I'm just curious if you've, obviously you've thought about it, but like, what's your commentary?Jeff Dean [00:39:50]: Yeah. I mean, I think there's a bunch of interesting trends though. Energy based models is one, you know, diffusion based models, which don't sort of sequentially decode tokens is another, um, you know, speculative decoding is a way that you can get sort of an equivalent, very small.Shawn Wang [00:40:06]: Draft.Jeff Dean [00:40:07]: Batch factor, uh, for like you predict eight tokens out and that enables you to sort of increase the effective batch size of what you're doing by a factor of eight, even, and then you maybe accept five or six of those tokens. So you get a five, a five X improvement in the amortization of moving weights, uh, into the multipliers to do the prediction for the, the tokens. So these are all really good techniques and I think it's really good to look at them from the lens of, uh, energy, real energy, not energy based models, um, and, and also latency and throughput, right? If you look at things from that lens, that sort of guides you to solutions that are gonna be, uh, you know, better from, uh, you know, being able to serve larger models or, you know, equivalent size models more cheaply and with lower latency.Shawn Wang [00:41:03]: Yeah.
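The amortization argument for speculative decoding (draft eight tokens, accept five or six) can be sketched with a simplified greedy verification step. This is a toy under stated assumptions: real speculative decoding typically uses a probabilistic accept/reject rule and also emits a correction token from the large model, and the token ids and function names below are made up for illustration.

```python
def accepted_prefix_length(draft_tokens: list[int], target_tokens: list[int]) -> int:
    """Count the leading draft tokens that the large model agrees with (greedy check)."""
    accepted = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break
        accepted += 1
    return accepted


def weight_loads_per_token(accepted: int) -> float:
    """One pass over the large model's weights is shared by every accepted token."""
    return 1.0 / max(accepted, 1)


# Hypothetical token ids: the draft model proposes eight tokens, and the
# large model's own choices diverge at the sixth position.
draft_tokens = [5, 9, 2, 7, 7, 1, 4, 3]
target_tokens = [5, 9, 2, 7, 7, 8, 4, 3]

accepted = accepted_prefix_length(draft_tokens, target_tokens)
print(accepted, weight_loads_per_token(accepted))
```

With five tokens accepted from one verification pass, the weights are moved once for five tokens instead of once per token, which is the five-times amortization described in the conversation.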
Well, I think, I think I, um, it's appealing intellectually, uh, haven't seen it like really hit the mainstream, but, um, I do think that, uh, there's some poetry in the sense that, uh, you know, we don't have to do, uh, a lot of shenanigans if like we fundamentally. Design it into the hardware. Yeah, yeah.Jeff Dean [00:41:23]: I mean, I think there's still a, there's also sort of the more exotic things like analog based, uh, uh, computing substrates as opposed to digital ones. Uh, I'm, you know, I think those are super interesting cause they can be potentially low power. Uh, but I think you often end up wanting to interface that with digital systems and you end up losing a lot of the power advantages in the digital to analog and analog to digital conversions. You end up doing, uh, at the sort of boundaries. And periphery of that system. Um, I still think there's a tremendous distance we can go from where we are today in terms of energy efficiency with sort of, uh, much better and specialized hardware for the models we care about.Shawn Wang [00:42:05]: Yeah.Alessio Fanelli [00:42:06]: Um, any other interesting research ideas that you've seen, or like maybe things that you cannot pursue a Google that you would be interested in seeing researchers take a step at, I guess you have a lot of researchers. Yeah, I guess you have enough, but our, our research.Jeff Dean [00:42:21]: Our research portfolio is pretty broad. I would say, um, I mean, I think, uh, in terms of research directions, there's a whole bunch of, uh, you know, open problems and how do you make these models reliable and able to do much longer, kind of, uh, more complex tasks that have lots of subtasks. How do you orchestrate, you know, maybe one model that's using other models as tools in order to sort of build, uh, things that can accomplish, uh, you know, much more. Yeah. Significant pieces of work, uh, collectively, then you would ask a single model to do. Um, so that's super interesting. 
How do you get more verifiable, uh, you know, how do you get RL to work for non-verifiable domains? I think it's a pretty interesting open problem because I think that would broaden out the capabilities of the models, the improvements that you're seeing in both math and coding. Uh, if we could apply those to other less verifiable domains, because we've come up with RL techniques that actually enable us to do that. Uh, effectively, that would, that would really make the models improve quite a lot. I think.Alessio Fanelli [00:43:26]: I'm curious, like when we had Noam Brown on the podcast, he said, um, they already proved you can do it with deep research. Um, you kind of have it with AI mode in a way it's not verifiable. I'm curious if there's any thread that you think is interesting there. Like what is it? Both are like information retrieval of JSON. So I wonder if it's like the retrieval is like the verifiable part. That you can score or what are like, yeah, yeah. How, how would you model that, that problem?Jeff Dean [00:43:55]: Yeah. I mean, I think there are ways of having other models that can evaluate the results of what a first model did, maybe even retrieving. Can you have another model that says, is this things, are these things you retrieved relevant? Or can you rate these 2000 things you retrieved to assess which ones are the 50 most relevant or something? Um, I think those kinds of techniques are actually quite effective. Sometimes I can even be the same model, just prompted differently to be a, you know, a critic as opposed to a, uh, actual retrieval system. Yeah.Shawn Wang [00:44:28]: Um, I do think like there, there is that, that weird cliff where like, it feels like we've done the easy stuff and then now it's, but it always feels like that every year. It's like, oh, like we know, we know, and the next part is super hard and nobody's figured it out. And, uh, exactly with this RLVR thing where like everyone's talking about, well, okay, how do we. 
the next stage of the non-verifiable stuff. And everyone's like, I don't know, you know, LLM judge.Jeff Dean [00:44:56]: I mean, I feel like the nice thing about this field is there's lots and lots of smart people thinking about creative solutions to some of the problems that we all see. Uh, because I think everyone sort of sees that the models, you know, are great at some things and they fall down around the edges of those things and, and are not as capable as we'd like in those areas. And then coming up with good techniques and trying those and seeing which ones actually make a difference is sort of what the whole research aspect of this field is, is pushing forward. And I think that's why it's super interesting. You know, if you think about two years ago, we were struggling with GSM8K problems, right? Like, you know, Fred has two rabbits. He gets three more rabbits. How many rabbits does he have? That's a pretty far cry from the kinds of mathematics that the models can, and now you're doing IMO and Erdős problems in pure language. Yeah. Yeah. Pure language. So that is a really, really amazing jump in capabilities in, you know, in a year and a half or something. And I think, um, for other areas, it'd be great if we could make that kind of leap. Uh, and you know, we don't exactly see how to do it for some, some areas, but we do see it for some other areas and we're going to work hard on making that better. Yeah.Shawn Wang [00:46:13]: Yeah.Alessio Fanelli [00:46:14]: Like YouTube thumbnail generation. That would be very helpful. We need that. That would be AGI. We need that.Shawn Wang [00:46:20]: That would be. As far as content creators go.Jeff Dean [00:46:22]: I guess I'm not a YouTube creator, so I don't care that much about that problem, but I guess, uh, many people do.Shawn Wang [00:46:27]: It does. Yeah. It doesn't, it doesn't matter. People do judge books by their covers as it turns out. Um, uh, just to draw a bit on the IMO gold.
Um, I'm still not over the fact that a year ago we had AlphaProof and AlphaGeometry and all those things. And then this year we were like, screw that we'll just chuck it into Gemini. Yeah. What's your reflection? Like, I think this, this question about, like, the merger of like symbolic systems and like, and, and LLMs, uh, was a very much core belief. And then somewhere along the line, people just said, Nope, we'll just all do it in the LLM.Jeff Dean [00:47:02]: Yeah. I mean, I think it makes a lot of sense to me because, you know, humans manipulate symbols, but we probably don't have like a symbolic representation in our heads. Right. We have some distributed representation that is neural-net-like in some way, of lots of different neurons and activation patterns firing when we see certain things, and that enables us to reason and plan and, you know, do chains of thought and, you know, roll them back: now that, that approach for solving the problem doesn't seem like it's going to work, I'm going to try this one. And, you know, in a lot of ways we're emulating what we intuitively think, uh, is happening inside real brains in neural net based models. So it never made sense to me to have like completely separate, uh, discrete, uh, symbolic things, and then a completely different way of, of, uh, you know, thinking about those things.Shawn Wang [00:47:59]: Interesting. Yeah. Uh, I mean, it's maybe seems obvious to you, but it wasn't obvious to me a year ago. Yeah.Jeff Dean [00:48:06]: I mean, I do think like that IMO with, you know, translating to Lean and using Lean and then the next year and also a specialized geometry model, and then this year switching to a single unified model that is roughly the production model with a little bit more inference budget, uh, is actually, you know, quite good because it shows you that the capabilities of that general model have improved dramatically and, and now you don't need the specialized model.
This is actually sort of very similar to the 2013 to 16 era of machine learning, right? Like it used to be, people would train separate models for lots of different, each different problem, right? I have, I want to recognize street signs and something. So I train a street sign. Recognition recognition model, or I want to, you know, decode speech recognition. I have a speech model, right? I think now the era of unified models that do everything is really upon us. And the question is how well do those models generalize to new things they've never been asked to do and they're getting better and better.Shawn Wang [00:49:10]: And you don't need domain experts. Like one of my, uh, so I interviewed ETA who was on, who was on that team. Uh, and he was like, yeah, I, I don't know how they work. I don't know where the IMO competition was held. I don't know the rules of it. I just trained the models, the training models. Yeah. Yeah. And it's kind of interesting that like people with these, this like universal skill set of just like machine learning, you just give them data and give them enough compute and they can kind of tackle any task, which is the bitter lesson, I guess. I don't know. Yeah.Jeff Dean [00:49:39]: I mean, I think, uh, general models, uh, will win out over specialized ones in most cases.Shawn Wang [00:49:45]: Uh, so I want to push there a bit. I think there's one hole here, which is like, uh. There's this concept of like, uh, maybe capacity of a model, like abstractly a model can only contain the number of bits that it has. And, uh, and so it, you know, God knows like Gemini pro is like one to 10 trillion parameters. We don't know, but, uh, the Gemma models, for example, right? Like a lot of people want like the open source local models that are like that, that, that, and, and, uh, they have some knowledge, which is not necessary, right? Like they can't know everything like, like you have the. 
The luxury of you have the big model and big model should be capable of everything. But like when, when you're distilling and you're going down to the small models, you know, you're actually memorizing things that are not useful. Yeah. And so like, how do we, I guess, do we want to extract that? Can we, can we divorce knowledge from reasoning, you know?Jeff Dean [00:50:38]: Yeah. I mean, I think you do want the model to be most effective at reasoning if it can retrieve things, right? Because having the model devote precious parameter space to remembering obscure facts that could be looked up is actually not the best use of that parameter space, right? Like you might prefer something that is more generally useful in more settings than this obscure fact that it has. Um, so I think that's always a tension. At the same time, you also don't want your model to be kind of completely detached from, you know, knowing stuff about the world, right? Like it's probably useful to know how long the Golden Gate Bridge is, just as a general sense of like how long are bridges, right? And, uh, it should have that kind of knowledge. It maybe doesn't need to know how long some teeny little bridge in some other more obscure part of the world is, but, uh, it does help it to have a fair bit of world knowledge and the bigger your model is, the more you can have. Uh, but I do think combining retrieval with sort of reasoning and making the model really good at doing multiple stages of retrieval. Yeah.Shawn Wang [00:51:49]: And reasoning through the intermediate retrieval results is going to be a, a pretty effective way of making the model seem much more capable, because if you think about, say, a personal Gemini, yeah, right?Jeff Dean [00:52:01]: Like we're not going to train Gemini on my email.
Probably we'd rather have a single model that, uh, we can then use and use being able to retrieve from my email as a tool and have the model reason about it and retrieve from my photos or whatever, uh, and then make use of that and have multiple, um, you know, uh, stages of interaction. That makes sense.Alessio Fanelli [00:52:24]: Do you think the vertical models are like, uh, interesting pursuit? Like when people are like, oh, we're building the best healthcare LLM, we're building the best law LLM, are those kind of like short-term stopgaps or?Jeff Dean [00:52:37]: No, I mean, I think, I think vertical models are interesting. Like you want them to start from a pretty good base model, but then you can sort of, uh, sort of view them as enriching the data distribution for that particular vertical domain. For healthcare, say, um, we're probably not going to train, or for say robotics, we're probably not going to train Gemini on all possible robotics data we could train it on, because we want it to have a balanced set of capabilities. Um, so we'll expose it to some robotics data, but if you're trying to build a really, really good robotics model, you're going to want to start with that and then train it on more robotics data. And then maybe that would hurt its multilingual translation capability, but improve its robotics capabilities. And we're always making these kind of, uh, you know, trade-offs in the data mix that we train the base Gemini models on. You know, we'd love to include data from 200 more languages and as much data as we have for those languages, but that's going to displace some other capabilities of the model. It won't be as good at, um, you know, Perl programming, you know, it'll still be good at Python programming. Cause we'll include enough of that, but there's other long tail computer languages or coding capabilities that it may suffer on or multi, uh, multimodal reasoning capabilities may suffer.
Cause we didn't get to expose it to as much data there, but it's really good at multilingual things. So I, I think some combination of specialized models, maybe more modular models. So it'd be nice to have the capability to have those 200 languages, plus this awesome robotics model, plus this awesome healthcare, uh, module that all can be knitted together to work in concert and called upon in different circumstances. Right? Like if I have a health related thing, then it should enable using this health module in conjunction with the main base model to be even better at those kinds of things. Yeah.Shawn Wang [00:54:36]: Installable knowledge. Yeah.Jeff Dean [00:54:37]: Right.Shawn Wang [00:54:38]: Just download as a, as a package.Jeff Dean [00:54:39]: And some of that installable stuff can come from retrieval, but some of it probably should come from preloaded training on, you know, uh, a hundred billion tokens or a trillion tokens of health data. Yeah.Shawn Wang [00:54:51]: And for listeners, I think, uh, I will highlight the Gemma 3n paper where they, there was a little bit of that, I think. Yeah.Alessio Fanelli [00:54:56]: Yeah. I guess the question is like, how many billions of tokens do you need to outpace the frontier model improvements? You know, it's like, if I have to make this model better at healthcare and the main Gemini model is still improving. Do I need 50 billion tokens? Can I do it with a hundred, if I need a trillion healthcare tokens, it's like, they're probably not out there that you don't have, you know, I think that's really like the.Jeff Dean [00:55:21]: Well, I mean, I think healthcare is a particularly challenging domain, so there's a lot of healthcare data that, you know, we don't have access to appropriately, but there's a lot of, you know, uh, healthcare organizations that want to train models on their own data that is not public healthcare data, uh, not public health, but public healthcare data.
Um, so I think there are opportunities there to say, partner with a large healthcare organization and train models for their use that are going to be, you know, more bespoke, but probably, uh, might be better than a general model trained on say, public data. Yeah.Shawn Wang [00:55:58]: Yeah. I, I believe, uh, by the way, also this is like somewhat related to the language conversation. Uh, I think one of your, your favorite examples was you can put a low resource language in the context and it just learns. Yeah.Jeff Dean [00:56:09]: Oh, yeah, I think the example we used was Kalamang, which is truly low resource because it's only spoken by, I think 120 people in the world and there's no written text.Shawn Wang [00:56:20]: So, yeah. So you can just do it that way. Just put it in the context. Yeah. Yeah. But I think your whole data set in the context, right.Jeff Dean [00:56:27]: If you, if you take a language like, uh, you know, Somali or something, there is a fair bit of Somali text in the world that, uh, or Ethiopian Amharic or something, um, you know, we probably, yeah, are not putting all the data from those languages into the Gemini base training. We put some of it, but if you put more of it, you'll improve the capabilities of those models.Shawn Wang [00:56:49]: Yeah.Jeff Dean [00:56:49]:

Let's Talk AI
#233 - Moltbot, Genie 3, Qwen3-Max-Thinking

Let's Talk AI

Play Episode Listen Later Feb 6, 2026 80:33


Our 233rd episode with a summary and discussion of last week's big AI news!
Recorded on 01/30/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read our text newsletter and comment on the podcast at https://lastweekin.ai/
In this episode:
Google introduces Gemini AI agent in Chrome for advanced browser functionality, including auto-browsing for pro and ultra subscribers.
OpenAI releases ChatGPT Translator and Prism, expanding its applications beyond core business to language translation and scientific research assistance.
Significant funding rounds and valuations achieved by startups Ricursive and Neurophos, focusing on specialized AI chips and optical processors respectively.
Political and social issues, including violence in Minnesota, prompt tech leaders in AI like Dario Amodei from Anthropic and Jeff Dean from Google to express concerns about the current administration's actions.
Timestamps:
(00:00:10) Intro / Banter
Tools & Apps
(00:04:09) Google adds Gemini AI-powered 'auto browse' to Chrome | The Verge
(00:07:11) Users flock to open source Moltbot for always-on AI, despite major risks - Ars Technica
(00:13:25) Google Brings Genie 3 'World Building' Experiment to AI Ultra Subscribers - CNET
(00:16:17) OpenAI's ChatGPT translator challenges Google Translate | The Verge
(00:18:27) OpenAI launches Prism, a new AI workspace for scientists | TechCrunch
Applications & Business
(00:19:49) Exclusive: China gives nod to ByteDance, Alibaba and Tencent to buy Nvidia's H200 chips - sources | Reuters
(00:22:55) AI chip startup Ricursive hits $4B valuation 2 months after launch
(00:24:38) AI Startup Recursive in Funding Talks at $4 Billion Valuation - Bloomberg
(00:27:30) Flapping Airplanes and the promise of research-driven AI | TechCrunch
(00:31:54) From invisibility cloaks to AI chips: Neurophos raises $110M to build tiny optical processors for inferencing | TechCrunch
Projects & Open Source
(00:35:34) Qwen3-Max-Thinking debuts with focus on hard math, code
(00:38:26) China's Moonshot releases a new open-source model Kimi K2.5 and a coding agent | TechCrunch
(00:46:00) Ai2 launches family of open-source AI developer agents that adapt to any codebase - SiliconANGLE
(00:47:46) Tiny startup Arcee AI built a 400B-parameter open source LLM from scratch to best Meta's Llama
Research & Advancements
(00:52:53) Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
(00:58:00) [2601.19897] Self-Distillation Enables Continual Learning
(01:03:04) [2601.20802] Reinforcement Learning via Self-Distillation
(01:05:58) Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Policy & Safety
(01:09:13) Amodei, Hoffman Join Tech Workers Decrying Minnesota Violence - Bloomberg
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Business Pants
The feckless Minnesota CEO response: George Floyd vs. Alex Pretti

Business Pants

Play Episode Listen Later Jan 28, 2026 60:19


At the beginning of December 2025: ICE announced an enforcement surge in the Twin Cities.
January 6, 2026: DHS announced what it called the largest immigration enforcement operation ever carried out, sending 2,000 agents to the Minneapolis–Saint Paul metropolitan area.
January 7, 2026: ICE agent Jonathan Ross fatally shoots Renée Nicole Good
January 8–14, 2026: Protests, vigils, and marches continue in Minneapolis against ICE and Operation Metro Surge
January 13, 2026: 'Madness': two US citizens violently detained by ICE in Minnesota, officials say. Two Target employees forced to the ground, then into SUV, then dumped in different parking lot
January 14, 2026: A different ICE agent shoots and injures a man in north Minneapolis; the man survives after being shot in the leg. This second shooting further intensifies public anger and calls for an end to the federal surge
January 17, 2026: National Anger Spills Into Target Stores, Again
January 22, 2026: Target Store Staff Are Skipping Work Over ICE's Crackdown in Minnesota
January 23, 2026: A statewide Day of Truth & Freedom / Minnesota general strike is held, described as the first U.S. general strike in about 80 years, explicitly targeting ICE operations and Operation Metro Surge. On that day, many workers, businesses, schools, and institutions in Minneapolis and across Minnesota participate in work stoppages, marches, and large rallies against federal immigration enforcement.
January 24, 2026: Federal Border Patrol agents assigned to the metro surge shoot and kill Alex Jeffrey Pretti
January 25, 2026: The Minnesota Chamber of Commerce released this letter on behalf of more than 60 CEOs of Minnesota-based companies today.
Eight people have died in dealings with ICE so far in 2026. Keith Porter, Parady La, Heber Sanchaz Domínguez, Victor Manuel Diaz, Luis Beltran Yanez-Cruz, Luis Gustavo Nunez Caceres, and Geraldo Lunas Campos. 
The high-profile fatal shootings follow the deaths of at least 32 people in ICE custody in 2025 – the highest number since 2004.Minnesota CEOs Seek De-Escalation After Border Police Shooting“The business community in Minnesota prides itself in providing leadership and solving problems to ensure a strong and vibrant state. The recent challenges facing our state have created widespread disruption and tragic loss of life. For the past several weeks, representatives of Minnesota's business community have been working every day behind the scenes with federal, state and local officials to advance real solutions. These efforts have included close communication with the Governor, the White House, the Vice President and local mayors. There are ways for us to come together to foster progress. With yesterday's tragic news, we are calling for an immediate deescalation of tensions and for state, local and federal officials to work together to find real solutions. We have been working for generations to build a strong and vibrant state here in Minnesota and will do so in the months and years ahead with equal and even greater commitment. In this difficult moment for our community, we call for peace and focused cooperation among local, state and federal leaders to achieve a swift and durable solution that enables families, businesses, our employees, and communities across Minnesota to resume our work to build a bright and prosperous future. “3M – William Brown, Chairman and CEOAmeriprise Financial – James Cracchiolo, Chairman and CEOAPi Group – Russell Becker, CEOBest Buy – Corie Barry, CEO C.H. Robinson – Dave Bozeman, President and CEODeluxe Corporation – Barry McCarthy, President and CEODonaldson Company, Inc. – Tod Carpenter, Chairman and CEOEcolab – Christophe Beck, Chairman and CEOGeneral Mills – Jeff Harmening, Chairman and CEOH.B. 
Fuller – On behalf of our entire organization [CEO Celeste Mastin]Hormel – Jeff Ettinger, Interim CEOMedtronic – Geoff Martha, CEO and ChairmannVent – Beth Wozniak, Chair and CEO Patterson Companies – Robert Rajalingam, CEOPentair – John L. Stauch, President and CEOPiper Sandler – Chad Abraham, Chairman and CEOSleep Number – Linda Findley, CEO (4/2025)Solventum – Bryan Hanson, CEOSPS Commerce – Chad Collins, CEO SunOpta – Brian Kocher, CEOTarget – Michael Fiddelke, Incoming CEO Tennant Company – Dave Huml, CEOThe Toro Company – Rick Olson, Chairman and CEOU.S. Bancorp – Gunjan Kedia, CEOWinnebago Industries – Michael Happe, CEOXcel Energy – Bob Frenzel, Chairman and CEO Keith Rabois, Managing director of Khosla Ventures: “no law enforcement has shot an innocent person. illegals are committing violent crimes everyday.”Khosla Ventures: “We prefer brutal honesty to hypocritical politeness.”“Technology and innovation have reshaped our world and disrupted the way we all live and work. The future may not be knowable, but it is inventable—and it belongs to those who dare to imagine what's possible.”Managing Directors: 5 dudes (3 stanford; 3 harvard)Founder Vinod Khosla: “I agree with @EthanChoi7. Macho ICE vigilantes running amuck empowered by a conscious-less administration. The video was sickening to watch and the storytelling without facts or with invented fictitious facts by authorities almost unimaginable in a civilized society. ICE personnel must have ice water running thru their veins to treat other human beings this way. There is politics but humanity should transcend that”Target's incoming CEO Michael Fiddelke in a video message sent to employees (January 26, 2026): “Right now, as someone who is raising a family here in the Twin Cities and as a leader of this hometown company I want to acknowledge where we are. The violence and loss of life in our community is incredibly painful. I know it's weighing heavily on many of you across the country, as it is with me. 
What's happening affects us not just as a company but as people, as neighbors, friends and family members.”A company spokesman declined to comment. Still nothing official on website.Lloyd Vogel, CEO Garage Grown Gear: said he felt compelled to condemn the shootings in a LinkedIn post because he lives and works in the Twin Cities. "My primary rationale was to show solidarity with my community," he told Business Insider. "It's also just bad for business when people are afraid to leave their homes.""There's so much fear in Minnesota right now," he said. "It would just be cowardice to not have a perspective on this."JPMorgan Chase CEO and Chair Jamie Dimon 1/22/26 Davos): ″I don't like what I'm seeing, five grown men beating up a little old lady. So I think we should calm down a little bit on the internal anger about immigration… We need these people. They work in our hospitals and hotels and restaurants and agriculture, and they're good people.… They should be treated that way.”On Saturday evening (1/24/2026), top technology executives gathered in Washington to attend a screening of “Melania,” a documentary produced by Amazon about the first lady, Melania Trump. Black-tie event: guests were handed monogrammed buckets of popcorn, framed screening tickets for their trophy shelves, and a limited-edition copy of Trump's 2024 book of the same title as her documentary, “Melania.“Among them was Andy Jassy, the chief executive of Amazon; Tim Cook, the chief executive of Apple; and Lisa Su, the chief executive of chip maker AMD.Also: Eric Yuan – CEO, Zoom; Lynn Martin – President, New York Stock Exchange; General Electric CEO Larry CulpApple CEO Tim Cook says it's 'time for de-escalation' in MinneapolisCook came under fire for appearing at The White House just hours after federal immigration authorities killed Alex Pretti, a veterans' nurse, in Minnesota“This is a time for de-escalation,” Cook wrote to Apple staff. 
“I believe America is strongest when we live up to our highest ideals, when we treat everyone with dignity and respect no matter who they are or where they're from, and when we embrace our shared humanity.”Cook said he “had a good conversation with the president this week where I shared my views, and I appreciate his openness to engaging on issues that matter to us all." Apple's Cook says he's ‘heartbroken' by Minneapolis events and has spoken with TrumpOpen AI CEO Sam Altman (1/27/26): I love the US and its values of democracy and freedom and will be supportive of the country however I can; OpenAI will too. But part of loving the country is the American duty to push back against overreach. What's happening with ICE is going too far. There is a big difference between deporting violent criminals and what's happening now, and we need to get the distinction right. President Trump is a very strong leader, and I hope he will rise to this moment and unite the country. I am encouraged by the last few hours of response and hope to see trust rebuilt with transparent investigations. As a company, we aim to stick to our convictions and not get blown around by changing fashions too much. We didn't become super woke when that was popular, we didn't start talking about masculine corporate energy when that was popular, and we are not going to make a lot of performative statements now about safety or politics or anything else. But we are going to continue to try to figure out how to actually do the right thing as best as we can, engage with leaders and push for our values, and speak up clearly about it as needed.James Dyett, Global Business at OpenAI: “There is far more outrage from tech leaders over a wealth tax than masked ICE agents terrorizing communities and executing civilians in the streets. Tells you what you need to know about the values of our industry.”Angel Investor Jason Calacanis: Once again, I will remind everyone that our leaders are failing us. 
True leadership would be to calm this situation down by telling these non-peaceful protestors to stay home while recalling these inadequately-trained agents.”Jeff Dean, Chief Scientist, Google DeepMind & Google Research. Gemini Lead: “This is absolutely shameful. Agents of a federal agency unnecessarily escalating, and then executing a defenseless citizen whose offense appears to be using his cell phone camera. Every person regardless of political affiliation should be denouncing this.”Jeffrey Sonnenfeld, senior associate dean for leadership studies at the Yale School of Management: "CEOs are feeling the community pressure." He said that reactions that convey sorrow and don't mention Trump or ICE are likely to be perceived as an unwelcome challenge to the White House's immigration agenda. "That is not what the Trump administration wanted," he said.Business Roundtable CEO Joshua Bolten asked to comment on the chaos in Minneapolis: replied with a statement endorsing the Minnesota Chamber's call for "cooperation between state, local, and federal authorities to immediately de-escalate the situation in Minneapolis."Robert Pasin, CEO of toy company Radio Flyer: recently shared an email on LinkedIn that he sent to his employees that was critical of the shootings in Minneapolis: "I am deeply concerned about the current state of our democracy, and the continued actions we are seeing from President Trump and his administration that are intended to undermine democratic institutions, the rule of law, and the norms that hold our country together."Dario Amodei, CEO Anthropic: called the events in Minnesota a “horror” on Monday. An Anthropic spokeswoman said the company did not have contracts with ICE.ICEout.tech statement from January 24, 2026: "We condemn the Border Patrol's killing of Alex Pretti and the violent surge of federal agents across our cities. 
The wanton brutality we've seen from ICE and CBP has removed any credibility that these actions are about immigration enforcement. Their goal is terror, cruelty, and suppression of dissent. This must end. Tech professionals are speaking up against this brutality, and we call on all our colleagues who share our values to use their voice. We know our industry leaders have leverage: in October, they persuaded Trump to call off a planned ICE surge in San Francisco, and big tech CEOs are in the White House tonight. Now they need to go further, and join us in demanding ICE out of all of our cities." 811: 508 names; 19 one name with title, 284 role onlyReid Hoffman says business leaders are wrong to stay silent about the Trump administrationThe LinkedIn cofounder and tech investor said in an episode of the "Rapid Response" podcast published Tuesday that he rejects the idea that executives can simply wait out political turbulence: "The theory that if you just keep your mouth shut, the storm will blow over and it won't be a problem — you should be disabused of that theory now," Hoffman said.Palantir Defends Work With ICE to Staff Following Killing of Alex Pretti: Leadership defended its work as in part improving “ICE's operational effectiveness.”

Off The Charts Football Podcast
Conference Championship Preview

Off The Charts Football Podcast

Play Episode Listen Later Jan 22, 2026 30:06


This week host James Weaver is joined by Bryce Rossler and Jeff Dean of the Operations Department, as the crew breaks down the upcoming Conference Championship games. What exactly can we expect from Jarrett Stidham, and how can the Broncos pull out the unexpected victory? Are we in for another classic NFC West matchup between the Rams and Seahawks? The trio dives into how the teams match up with each other on both sides of the ball.
Off The Charts features a blend of statistical insights, tactical analysis, and personal opinions, aimed at providing listeners with a comprehensive understanding of the week's key matchups and the intricacies of the sport. You can follow our content on Twitter at @Football_SIS, on Bluesky at @sportsinfosis.bsky.social and at sportsinfosolutions.com.

Engineering Kiosk
#223 Throw redundancy at the tail: Request Hedging bei Google & Co.

Engineering Kiosk

Play Episode Listen Later Nov 25, 2025 65:33


You know the feeling? Nine clicks are lightning fast, and the tenth seems to hang forever. That is exactly where tail latency eats your user experience, and the average does not help you one bit. In this episode we dive into request hedging, the deliberate duplication of requests to push down P99 and defuse outliers.
We start with a short recap of resilience engineering: timeouts, retries, exponential backoff, jitter, circuit breakers. Then we go deep into hedging: what the hedge threshold is, why we optimize for tail rather than head latency, and how percentiles such as P50, P95 and P99 change the view of performance. We show how to implement hedging safely without overloading your backend, where idempotency is mandatory, and why write operations are especially tricky.
On the practical side we clarify how to cancel requests cleanly: HTTP 1.1 via FIN and reset, HTTP 2 with RST_STREAM, gRPC support, and how Go helps natively with context cancellation. On tooling there are real examples: Envoy as a cloud-native proxy with hedging, gRPC, and open source experience. In the database world we talk about read hedging, quorum reads and write constraints in Cassandra and Kafka, about Vitess in the MySQL universe, and about the limits of PgBouncer. Caches such as Redis and Memcached and DNS patterns such as Happy Eyeballs come up as well. For historical context we place all of this alongside The Tail at Scale by Jeff Dean and look at how Google, Netflix, Uber, LinkedIn or Cloudflare use hedging.
At the end you take away clear best practices: apply hedging specifically to tail latency, really cancel requests, ensure idempotency, feed dynamic thresholds with observability data, and define your guardrails.
Curious whether hedging rescues your P99 without DDoSing yourself? That is exactly what this episode is about.
Bonus: Hedgehog has nothing to do with it, even if the name is tempting.
Keywords: Resilience Engineering, Request Hedging, Tail Latency, P99, Percentiles, Microservices, HTTP 2, gRPC, Go Context, Observability, Monitoring, Prometheus, Grafana, Envoy, Open Source, Cassandra, Kafka, Vitess, Redis, Memcached, Quorum Reads, Tech Community, Networking.
You can find our current sponsors at https://engineeringkiosk.dev/partners
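The hedging idea described in this episode can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not code from the episode: it uses Python's asyncio task cancellation rather than the Go context cancellation the hosts mention, and the names hedged_call, hedge_after, max_attempts and fake_backend are hypothetical. It assumes the request is idempotent, sends a duplicate once the hedge threshold passes without a reply, returns the first successful answer, and cancels whatever is still in flight.

```python
import asyncio


async def hedged_call(make_request, hedge_after, max_attempts=2):
    """Return the first successful result of make_request().

    make_request: zero-argument coroutine function; assumed idempotent.
    hedge_after:  hedge threshold in seconds before a duplicate is sent.
    max_attempts: upper bound on copies in total (guardrail against overload).
    """
    tasks = set()
    started = 0
    last_error = None

    def start_attempt():
        nonlocal started
        tasks.add(asyncio.create_task(make_request()))
        started += 1

    start_attempt()
    try:
        while tasks:
            # Only use the hedge timeout while another copy may still be sent.
            timeout = hedge_after if started < max_attempts else None
            done, _ = await asyncio.wait(
                tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            if not done:
                # Threshold passed with no reply: send a duplicate request.
                start_attempt()
                continue
            for task in done:
                tasks.discard(task)
                error = task.exception()
                if error is None:
                    return task.result()
                last_error = error
            # Everything so far failed: fall back to a plain retry if allowed.
            if not tasks and started < max_attempts:
                start_attempt()
        raise last_error
    finally:
        # Really cancel the losers so they stop consuming backend capacity.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo():
    calls = 0
    cancelled = []

    async def fake_backend():
        # Hypothetical backend: the first attempt lands in the slow tail.
        nonlocal calls
        calls += 1
        attempt = calls
        try:
            await asyncio.sleep(1.0 if attempt == 1 else 0.01)
        except asyncio.CancelledError:
            cancelled.append(attempt)
            raise
        return f"reply from attempt {attempt}"

    result = await hedged_call(fake_backend, hedge_after=0.05)
    return result, calls, cancelled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A fixed hedge_after is a simplification. In line with the best practices listed for the episode, a real deployment would more likely derive the threshold from observed latency percentiles and cap the share of hedged requests so the backend is not overloaded.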

HLTH Matters
Making Clinical AI Work: Nikhil Buduma on Workflow-Native Automation and the Future of Healthcare Efficiency

HLTH Matters

Play Episode Listen Later Nov 20, 2025 14:05


About Nikhil Buduma:
Nikhil Buduma is a San Francisco–based entrepreneur, scientist, and engineer working at the cutting edge of AI and healthcare. He is the co-founder and CEO of Ambience Healthcare, an AI platform built to supercharge every healthcare worker with intelligent automation. Under his leadership, Ambience has grown into one of the most well-funded AI healthcare startups in the world, raising over $343 million from top investors, including a16z, OpenAI, Kleiner Perkins, Oak HC/FT, Optum Ventures, and industry pioneers such as Jeff Dean and Pieter Abbeel. Before becoming CEO, Nikhil served as Ambience's Chief Scientist, leading the development of its core AI systems that streamline documentation, coding, and clinical workflows for healthcare systems, including the Cleveland Clinic and St. Luke's.
Prior to Ambience, Nikhil co-founded Remedy Health, where he applied machine learning to advance value-based care models, backed by Khosla Ventures and Greylock. He also co-founded Lean On Me, a nonprofit organization that supports mental health and wellness across U.S. college campuses through anonymous peer-to-peer text support networks at institutions such as MIT, Duke, and UC Berkeley.
A graduate and valedictorian of Bellarmine College Preparatory, Nikhil earned both his bachelor's and master's degrees in computer science and engineering from MIT. His career reflects a rare blend of technical mastery, compassion, and vision: using AI not to replace clinicians, but to restore the human joy in the practice of medicine.
Things You'll Learn:
Health systems often see low real-world usage of ambient tools; when daily adoption reaches most clinicians and visits, the ROI conversation becomes meaningful. This requires solving fundamentals across specialties, not just shipping features.
If AI generates notes that don't align with payer rules and codes, organizations incur rework and risk. Integrating HCC, ICD-10, and CPT selection, along with supporting language, at the point of care helps prevent denials.
Revenue integrity upside: Bringing CDI intelligence forward can reclaim large sums from work already done but not credited. This strengthens both financial sustainability and compliance posture.
Continuous third-party auditing and domain-specific modeling are essential because general reasoning models often struggle with the nuances of revenue cycles. Independent validation builds organizational trust.
Patient Summary anticipates questions and data needs before the visit, while Chart Chat answers complex, EHR-aware queries in seconds, helping to democratize top-tier standards of care in rural settings.
Resources:
Connect with and follow Nikhil Buduma on LinkedIn.
Follow Ambience Healthcare on LinkedIn and visit their website.

The Power Chord Hour Podcast
Ep 172 - Jeff Dean (Heavy Seas) - Power Chord Hour Podcast

The Power Chord Hour Podcast

Play Episode Listen Later Sep 1, 2025 39:55


This week on the show Jeff Dean of Heavy Seas and Her Heads on Fire chat about the latest Heavy Seas release By Degrees and lots moreHEAVY SEAShttps://www.instagram.com/heavyseaschicagohttps://www.facebook.com/heavyseaschicagoPCHInstagram - www.instagram.com/powerchordhourTwitter - www.twitter.com/powerchordhourFacebook - www.facebook.com/powerchordhourYoutube - www.youtube.com/channel/UC6jTfzjB3-mzmWM-51c8LggSpotify Episode Playlists - https://open.spotify.com/user/kzavhk5ghelpnthfby9o41gnr?si=4WvOdgAmSsKoswf_HTh_MgDonate to help show costs -https://www.paypal.com/paypalme/pchanthonyhttps://cash.app/$anthmerchpowerchordhour@gmail.comCheck out the Power Chord Hour radio show every Friday night at 8 pm est/Tuesday Midnight est on 107.9 WRFA in Jamestown, NY. Stream the station online at wrfalp.com/streaming/ or listen on the WRFA app.Special Thanks to my buddy Jay Vics for the behind the scenes help on this episode!https://www.meettheexpertspodcast.comhttps://www.jvimobile.com

Acquired
Alphabet Inc.

Acquired

Play Episode Listen Later Aug 26, 2025 251:33


In its first six years from 1998 to 2004, Google built one of the greatest products of all time (and certainly the greatest business of all time) with Search. Then in its next six years from 2005 to 2011, Google built seven (!) more billion+ user products: Gmail, Maps, Drive and Docs, YouTube, Chrome, Android, and Photos, all either started from scratch internally or acquired as startups that were still in their infancy. This six-year period of wild innovation STILL stands unmatched in technology history… no other tech company counts more than four billion+ user products in its portfolio total. And of course, this “Google 2.0” era culminated in the transformation of the very company itself into Alphabet.
So the question we answer today is… how did they do it?? And why? What was the strategy that led a once “pure play” search company into such far flung fields as email, mapping, funny cat videos and operating systems? We unpack the brilliant (and sometimes accidental) strategies behind each product, the simultaneous three-front war Google fought against Microsoft, Apple, and Facebook, and the spectacular failure of Google Plus that nearly destroyed the company's culture, before ultimately setting the stage for both Alphabet and the AI revolution to come.
Sponsors:
Many thanks to our fantastic Summer '25 Season partners:
J.P. Morgan Payments
Anthropic
Statsig
Vercel
Links:
Sign up for email updates and vote on Fall Season episodes!
Jeff Dean and Sanjay Ghemawat New Yorker article
Eric Schmidt on stage at the iPhone keynote (!)
Bill Gurley's classic “Less than Free” Android post
Our recent ACQ2 episode with Bret Taylor and Clay Bavor
Worldly Partners' Multi-Decade Alphabet Study
Episode sources
Carve Outs:
Bluey x Camp in NYC
Steam Deck vs Switch 2 (Part 2)
Claude
Sony RX100 VII
Carissimi clothing
More Acquired:
Get email updates and vote on Fall Season episodes!
Join the Slack
Subscribe to ACQ2
Check out the latest swag in the ACQ Merch Store!
Note: Acquired hosts and guests may hold assets discussed in this episode. This podcast is not investment advice, and is intended for informational and entertainment purposes only. You should do your own research and make your own independent decisions when considering any financial transactions.

Training Data
LIVE: Google's Jeff Dean on the Coming Transformations in AI

Training Data

Play Episode Listen Later May 16, 2025 30:59


At AI Ascent 2025, Jeff Dean makes bold predictions. Discover how the pioneer behind Google's TPUs and foundational AI research sees the technology evolving, from specialized hardware to more organic systems, and future engineering capabilities.

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

This episode offers a snapshot of the rapidly evolving artificial intelligence landscape on May 13th, 2025, highlighting advancements and strategic shifts across diverse sectors. It offers insights into the projected capabilities of AI in fields like software engineering and healthcare, with discussions on AI reaching junior engineer levels and the development of benchmarks to assess its reliability in medical scenarios. The episode also covers hardware and infrastructure developments, noting Saudi Arabia's moves to secure AI chip supplies from multiple vendors. Innovative applications such as AI-powered battery management and the use of AI to analyze facial photos for biological age and health prediction are also presented. Finally, the episode touches on changes in user interaction, including Apple exploring brain-computer interfaces and Google experimenting with incorporating AI directly into its search interface.

Off The Charts Football Podcast
2025 NFL Draft Recap

Off The Charts Football Podcast

Play Episode Listen Later Apr 29, 2025 49:53


On this episode, Bryce Rossler is joined by Jeff Dean and Jordan Edwards from our Football Operations group as they dive into some of draft weekend's biggest topics, like Shedeur Sanders's big fall. They also share their thoughts on some underrated picks, the classes they liked the most, and draft-day trades like the Lions trading up for wide receiver Isaac TeSlaa.
Off The Charts features a blend of statistical insights, tactical analysis, and personal opinions, aimed at providing listeners with a comprehensive understanding of the week's key matchups and the intricacies of the sport. You can follow our content on Twitter at @Football_SIS, on Bluesky at @sportsinfosis.bsky.social and at sportsinfosolutions.com.

49Karats Podcast
49ers draft MYKEL WILLIAMS in Round 1 – INSTANT Reaction

49Karats Podcast

Play Episode Listen Later Apr 25, 2025 19:47


The San Francisco 49ers just selected Georgia EDGE Mykel Williams in the first round of the 2025 NFL Draft, and I'm joined by Jeff Dean, scouting analyst from Sports Info Solutions, to break it all down!
– Live reaction to the 49ers' 1st round pick
– Breakdown of Mykel Williams' fit on the roster
– Potential impact in 2025 and beyond
– What the pick means for the rest of the draft

Off The Charts Football Podcast
Which Defensive Players Do Our Scouts Like In The NFL Draft?

Off The Charts Football Podcast

Play Episode Listen Later Apr 3, 2025 56:16


On this episode James Weaver and Bryce Rossler from the Sports Info Solutions R&D team and Jeff Dean, Jordan Edwards, and Ben Hrkach from our Football Operations group had an intense discussion on the strengths and weaknesses of 6 of the most prominent defensive prospects in this year's NFL Draft.
Mason Graham (DT, Michigan)
Abdul Carter (EDGE, Penn State)
Will Johnson (CB, Michigan)
Jihaad Campbell (WLB, Alabama)
Derrick Harmon (DT, Oregon)
Shemar Stewart (DE, Texas A&M)
You can find scouting reports for the draft's top prospects at NFLDraft.SportsInfoSolutions.com. New reports are being added regularly.
Off The Charts features a blend of statistical insights, tactical analysis, and personal opinions, aimed at providing listeners with a comprehensive understanding of the week's key matchups and the intricacies of the sport. You can follow our content on Twitter at @Football_SIS, on Bluesky at @sportsinfosis.bsky.social and at sportsinfosolutions.com.

The Lunar Society
Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI

The Lunar Society

Play Episode Listen Later Feb 12, 2025 134:43


This week I welcome on the show two of the most important technologists ever, in any field.
Jeff Dean is Google's Chief Scientist, and through 25 years at the company, has worked on basically the most transformative systems in modern computing: from MapReduce, Bigtable, TensorFlow, AlphaChip, to Gemini.
Noam Shazeer invented or co-invented all the main architectures and techniques that are used for modern LLMs: from the Transformer itself, to Mixture of Experts, to Mesh TensorFlow, to Gemini and many other things.
We talk about their 25 years at Google, going from PageRank to MapReduce to the Transformer to MoEs to AlphaChip, and maybe soon to ASI.
My favorite part was Jeff's vision for Pathways, Google's grand plan for a mutually-reinforcing loop of hardware and algorithmic design and for going past autoregression. That culminates in us imagining *all* of Google-the-company, going through one huge MoE model.
And Noam just bites every bullet: 100x world GDP soon; let's get a million automated researchers running in the Google datacenter; living to see the year 3000.
Sponsors
Scale partners with major AI labs like Meta, Google DeepMind, and OpenAI. Through Scale's Data Foundry, labs get access to high-quality data to fuel post-training, including advanced reasoning capabilities. If you're an AI researcher or engineer, learn about how Scale's Data Foundry and research lab, SEAL, can help you go beyond the current frontier at scale.com/dwarkesh.
Curious how Jane Street teaches their new traders? They use Figgie, a rapid-fire card game that simulates the most exciting parts of markets and trading. It's become so popular that Jane Street hosts an inter-office Figgie championship every year. Download from the app store or play on your desktop at figgie.com.
Meter wants to radically improve the digital world we take for granted. They're developing a foundation model that automates network management end-to-end. To do this, they just announced a long-term partnership with Microsoft for tens of thousands of GPUs, and they're recruiting a world class AI research team. To learn more, go to meter.com/dwarkesh.
Advertisers:
To sponsor a future episode, visit: dwarkeshpatel.com/p/advertise.
Timestamps
00:00:00 - Intro
00:02:44 - Joining Google in 1999
00:05:36 - Future of Moore's Law
00:10:21 - Future TPUs
00:13:13 - Jeff's undergrad thesis: parallel backprop
00:15:10 - LLMs in 2007
00:23:07 - “Holy s**t” moments
00:29:46 - AI fulfills Google's original mission
00:34:19 - Doing Search in-context
00:38:32 - The internal coding model
00:39:49 - What will 2027 models do?
00:46:00 - A new architecture every day?
00:49:21 - Automated chip design and intelligence explosion
00:57:31 - Future of inference scaling
01:03:56 - Already doing multi-datacenter runs
01:22:33 - Debugging at scale
01:26:05 - Fast takeoff and superalignment
01:34:40 - A million evil Jeff Deans
01:38:16 - Fun times at Google
01:41:50 - World compute demand in 2030
01:48:21 - Getting back to modularity
01:59:13 - Keeping a giga-MoE in-memory
02:04:09 - All of Google in one model
02:12:43 - What's missing from distillation
02:18:03 - Open research, pros and cons
02:24:54 - Going the distance
Get full access to Dwarkesh Podcast at www.dwarkeshpatel.com/subscribe

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World's Fair 2025 in June.Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!Full YouTube Episode with Slides/ChartsLike and subscribe and hit that bell to get notifs!Timestamps* 00:00 Welcome to the 100th Episode!* 00:19 Reflecting on the Journey* 00:47 AI Engineering: The Rise and Impact* 03:15 Latent Space Live and AI Conferences* 09:44 The Competitive AI Landscape* 21:45 Synthetic Data and Future Trends* 35:53 Creative Writing with AI* 36:12 Legal and Ethical Issues in AI* 38:18 The Data War: GPU Poor vs. GPU Rich* 39:12 The Rise of GPU Ultra Rich* 40:47 Emerging Trends in AI Models* 45:31 The Multi-Modality War* 01:05:31 The Future of AI Benchmarks* 01:13:17 Pionote and Frontier Models* 01:13:47 Niche Models and Base Models* 01:14:30 State Space Models and RWKB* 01:15:48 Inference Race and Price Wars* 01:22:16 Major AI Themes of the Year* 01:22:48 AI Rewind: January to March* 01:26:42 AI Rewind: April to June* 01:33:12 AI Rewind: July to September* 01:34:59 AI Rewind: October to December* 01:39:53 Year-End Reflections and PredictionsTranscript[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. 
The answer
[00:00:32] swyx: was Cursor and Perplexity.
[00:00:34] Alessio: Yeah, I love Midjourney. It's like, do you really not like anything else?
[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like Tri Dao, we had, you know, Jeremy Howard, we had more folks like that.
[00:00:47] AI Engineering: The Rise and Impact
[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.
[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your Rise of the AI Engineer post just kind of got people somewhere to congregate, and then the AI Engineer Summit.
[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that it would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today?
[00:01:32] swyx: So, the sign that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, you know, how quickly it could happen, and obviously there's a chance that I could be wrong.
[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign.
But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big. So I think it's like kind of arrived as a meaningful and useful definition.
[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happened behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.
[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.
[00:02:45] Alessio: Yeah, I was at, um, AWS re:Invent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there were like hundreds of people just in line to go in.
[00:02:56] Alessio: I think that's kind of what enabled people, right? Which is what [00:03:00] you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on the Substack.
[00:03:11] Alessio: But yeah, it's been a heck of a two years.
[00:03:14] swyx: Yeah.
[00:03:15] Latent Space Live and AI Conferences
[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17,000 people. And the Latent Space Live event that we held there was 950 signups, I think. The AI world, the ML world is still very much research heavy.
And that's as it should be because ML is very much in a research phase.
[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.
[00:03:51] swyx: But yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.
[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering NeurIPS? Like, this is an ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.
[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really, really want. The measure of success was previously just peer review, right? Getting 7s and 8s on their, um, academic review conferences and stuff. Like citations is one metric, but money is a better metric.
[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan.
Yeah, than it was in the chat on YouTube.
[00:05:06] swyx: I would say that I actually also created
[00:05:09] swyx: Latent Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.
[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info it's not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.
[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.
[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.
[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models, post-transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.
[00:06:39] swyx: It was very awkward.
And I'm really, really thankful for Jonathan Frankle, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But he was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.
[00:06:57] swyx: So that means you're saying Sam Altman's wrong. [00:07:00] You're saying, um, you know, everyone else is wrong. It helps that this was the day before Ilya went on, went up on stage and then said pre-training has hit a wall. And data has hit a wall. So actually Jonathan ended up winning, and then Ilya supported that statement, and then Noam Brown on the last day further supported that statement as well.
[00:07:17] swyx: So it's kind of interesting that I think the consensus kind of going in was that we're not done scaling, like you should believe in the bitter lesson. And then, four straight days in a row, you had Sepp Hochreiter, who is the creator of the LSTM, along with everyone's favorite OG in AI, which is Juergen Schmidhuber.
[00:07:34] swyx: He said that, um, we're pre-training inside a wall, or like, we've run into a different kind of wall. And then we have, you know, Jonathan Frankle, Ilya, and then Noam Brown all saying variations of the same thing, that we have hit some kind of wall in the status quo of what pre-training, scaling large pre-trained models, has looked like, and we need a new thing.
[00:07:54] swyx: And obviously the new thing for people is, either people are calling it inference time compute or test time [00:08:00] compute. I think the collective terminology has been inference time, and I think that makes sense because test time, calling it test, meaning, has a very pre-training bias, meaning that the only reason for running inference at all is to test your model.
[00:08:11] swyx: That is not true. Right. Yeah. So, so, I quite agree that
OpenAI seems to have adopted, or the community seems to have adopted this terminology of ITC instead of TTC. And that, that makes a lot of sense because like now we care about inference, even right down to compute optimality. Like I actually interviewed this author who recovered or reviewed the Chinchilla paper.
[00:08:31] swyx: The Chinchilla paper is compute optimal training, but what is not stated in there is it's pre-training compute optimal training. And once you start caring about inference compute optimal training, you have a different scaling law. And in a way that we did not know last year.
[00:08:45] Alessio: I wonder, because John is, he's also on the side of Attention Is All You Need.
[00:08:49] Alessio: Like he had the bet with Sasha. So I'm curious, like he doesn't believe in scaling, but he thinks the transformer, I wonder if he's still. So, so,
[00:08:56] swyx: so he, obviously everything is nuanced and you know, I told him to play a character [00:09:00] for this debate, right? So he actually does. Yeah. He still, he still believes that we can scale more.
[00:09:04] swyx: Uh, he just assumed the character to be very game for, for playing this debate. So even more kudos to him that he assumed a position that he didn't believe in and still won the debate.
[00:09:16] Alessio: Get rekt, Dylan. Um, do you just want to quickly run through some of these things? Like, uh, Sarah's presentation, just the highlights.
[00:09:24] swyx: Yeah, we can't go through everyone's slides, but I pulled out some things as a factor of, like, stuff that we were going to talk about. And we'll
[00:09:30] Alessio: publish
[00:09:31] swyx: the rest. Yeah, we'll publish on this feed the best of 2024 in those domains. And hopefully people can benefit from the work that our speakers have done.
[00:09:39] swyx: But I think it's, uh, these are just good slides.
And I've been, I've been looking for a sort of end of year recaps from, from people.
[00:09:44] The Competitive AI Landscape
[00:09:44] swyx: The field has progressed a lot. You know, I think the max Elo in 2023 on LMSys used to be 1200 for LMSys Elos. And now everyone is at least at, uh, 1275 in their Elos, and this is across Gemini, ChatGPT, [00:10:00] Grok,
[00:10:01] swyx: 01.ai, with their Yi-Large model, and Anthropic, of course. It's a very, very competitive race. There are multiple frontier labs all racing, but there is a clear tier zero frontier. And then there's like a tier one. It's like, I wish I had everything else. Tier zero is extremely competitive. It's effectively now a three-horse race between Gemini, uh, Anthropic and OpenAI.
[00:10:21] swyx: I would say that people are still holding out a candle for xAI. xAI, I think, for some reason, because their API was very slow to roll out, is not included in these metrics. So it's actually quite hard to put on there. As someone who also does charts, xAI is continually snubbed because they don't work well with the benchmarking people.
[00:10:42] swyx: Yeah, yeah, yeah. It's a little trivia for why xAI always gets ignored. The other thing is market share. So these are slides from Sarah. We have it up on the screen. It has gone from very heavily OpenAI. So we have some numbers and estimates. These are from Ramp. Estimates of OpenAI market share in [00:11:00] December 2023.
[00:11:01] swyx: And this is basically, what is it, GPT being 95 percent of production traffic. And I think if you correlate that with stuff that we asked Harrison Chase on the LangChain episode, it was true. And then Claude 3 launched mid middle of this year. I think Claude 3 launched in March, Claude 3.5 Sonnet was in June ish.
[00:11:23] swyx: And you can start seeing the market share shift towards OpenAI, uh, towards Anthropic, uh, very, very aggressively. The more recent one is Gemini.
So if I scroll down a little bit, this is an even more recent dataset. So Ramp's dataset ends in September 2024. Gemini has basically launched a price war at the low end, uh, with Gemini Flash, uh, being basically free for personal use.
[00:11:44] swyx: Like, I think people don't understand the free tier. It's something like a billion tokens per day. Unless you're trying to abuse it, you cannot really exhaust your free tier on Gemini. They're really trying to get you to use it. They know they're in like third place, um, fourth place, depending how you, how you count.
[00:11:58] swyx: And so they're going after [00:12:00] the lower tier first, and then, you know, maybe the upper tier later, but yeah, Gemini Flash, according to OpenRouter, is now 50 percent of their OpenRouter requests. Obviously, these are the small requests. These are small, cheap requests that are mathematically going to be more.
[00:12:15] swyx: The smart ones obviously are still going to OpenAI. But, you know, it's a very, very big shift in the market. Like basically 2023, 2022, to going into 2024, OpenAI has gone from 95 market share to, yeah, reasonably somewhere between 50 to 75 market share.
[00:12:29] Alessio: Yeah. I'm really curious how Ramp does the attribution to the model,
[00:12:32] Alessio: if it's API, because I think it's all credit card spend. Well, but it's all, the credit card doesn't say. Maybe, maybe when they do expenses, they upload the PDF, but yeah, the, the Gemini, I think, makes sense. I think that was one of my main 2024 takeaways, that like the best small model companies are the large labs, which is not something I would have thought, that the open source kind of like long tail would be like the small model.
[00:12:53] swyx: Yeah, different sizes of small models we're talking about here, right? Like so small model here for Gemini is 8B, [00:13:00] right? Uh, mini.
We don't know what the small model size is, but yeah, it's probably in the double digits or maybe single digits, but probably double digits. The open source community has kind of focused on the 1 to 3B size.
[00:13:11] swyx: Mm-hmm. Yeah. Maybe
[00:13:12] swyx: zero, maybe 0.5B, uh, that's Moondream, and if that is small for you, then that's great. It makes sense that we, we have a range for small now, which is like, may, maybe 1 to 5B. Yeah. I'll even put that at, at, at the high end. And so this includes Gemma from Gemini as well. But also includes the Apple Foundation models, which I think Apple Foundation is 3B.
[00:13:32] Alessio: Yeah. No, that's great. I mean, I think in the start small just meant cheap. I think today small is actually a more nuanced discussion, you know, that people weren't really having before.
[00:13:43] swyx: Yeah, we can keep going. This is a slide where I slightly disagree with Sarah. She's pointing to the Scale SEAL leaderboard. I think the researchers that I talked with at NeurIPS were kind of positive on this because basically you need private test [00:14:00] sets to prevent contamination.
[00:14:02] swyx: And Scale is one of maybe three or four people this year that has really made an effort in doing a credible private test set leaderboard. Llama 405B does well compared to Gemini and GPT-4o. And I think that's good. I would say that, you know, it's good to have an open model that is that big, that does well on those metrics.
[00:14:23] swyx: But anyone putting 405B in production will tell you, if you scroll down a little bit to the Artificial Analysis numbers, that it is very slow and very expensive to infer. Um, it doesn't even fit on like one node of, uh, of H100s. Cerebras will be happy to tell you they can serve 405B on their super large chips.
[00:14:42] swyx: But, um, you know, if you need to do anything custom to it, you're still kind of constrained. So, is 405B really that relevant?
Like, I think most people are basically saying that they only use 405B as a teacher model to distill down to something. Even Meta is doing it. So with Llama 3.3 [00:15:00] launched, they only launched the 70B because they use 405B to distill the 70B.
[00:15:03] swyx: So I don't know if like open source is keeping up. I think they're the, the open source industrial complex is very invested in telling you that the, the gap is narrowing. I kind of disagree. I think that the gap is widening with o1. I think there are very, very smart people trying to narrow that gap and they should.
[00:15:22] swyx: I really wish them success, but you cannot use a chart that is nearing 100 in your saturation chart and say, look, the distance between open source and closed source is narrowing. Of course it's going to narrow because you're near 100. This is stupid. But in metrics that matter, is open source narrowing?
[00:15:38] swyx: Probably not for o1 for a while. And it's really up to the open source guys to figure out if they can match o1 or not.
[00:15:46] Alessio: I think inference time compute is bad for open source just because, you know, Zuck can donate the flops at training time, but he cannot donate the flops at inference time. So it's really hard to like actually keep up on that axis.
[00:15:59] Alessio: Big, big business [00:16:00] model shift. So I don't know what that means for the GPU clouds. I don't know what that means for the hyperscalers, but obviously the big labs have a lot of advantage. Because, like, it's not a static artifact that you're putting the compute in. You're kind of doing that still, but then you're putting a lot of compute in inference too.
[00:16:17] swyx: Yeah, yeah, yeah. Um, I mean, Llama 4 will be reasoning oriented. We talked with Thomas Scialom. Um, kudos for getting that episode together. That was really nice. Good, well timed. Actually, I connected with the AI at Meta guy, uh, at NeurIPS, and, um, yeah, we're going to coordinate something for Llama 4. Yeah, yeah,
Yeah, yeah,[00:16:32] Alessio: and our friend, yeah.[00:16:33] Alessio: Clara Shi just joined to lead the business agent side. So I'm sure we'll have her on in the new year.[00:16:39] swyx: Yeah. So, um, my comment on, on the business model shift, this is super interesting. Apparently it is wide knowledge that OpenAI wanted more than 6. 6 billion dollars for their fundraise. They wanted to raise, you know, higher, and they did not.[00:16:51] swyx: And what that means is basically like, it's very convenient that we're not getting GPT 5, which would have been a larger pre train. We should have a lot of upfront money. And [00:17:00] instead we're, we're converting fixed costs into variable costs, right. And passing it on effectively to the customer. And it's so much easier to take margin there because you can directly attribute it to like, Oh, you're using this more.[00:17:12] swyx: Therefore you, you pay more of the cost and I'll just slap a margin in there. So like that lets you control your growth margin and like tie your. Your spend, or your sort of inference spend, accordingly. And it's just really interesting to, that this change in the sort of inference paradigm has arrived exactly at the same time that the funding environment for pre training is effectively drying up, kind of.[00:17:36] swyx: I feel like maybe the VCs are very in tune with research anyway, so like, they would have noticed this, but, um, it's just interesting.[00:17:43] Alessio: Yeah, and I was looking back at our yearly recap of last year. Yeah. And the big thing was like the mixed trial price fights, you know, and I think now it's almost like there's nowhere to go, like, you know, Gemini Flash is like basically giving it away for free.[00:17:55] Alessio: So I think this is a good way for the labs to generate more revenue and pass down [00:18:00] some of the compute to the customer. I think they're going to[00:18:02] swyx: keep going. I think that 2, will come.[00:18:05] Alessio: Yeah, I know. 
Totally. I mean, next year, the first thing I'm doing is signing up for Devin. Signing up for the ChatGPT Pro.
[00:18:12] Alessio: Just to try. I just want to see what does it look like to spend a thousand dollars a month on AI?
[00:18:17] swyx: Yes. Yes. I think if your, if your, your job is, at least, AI content creator or VC or, you know, someone whose job it is to stay on, stay on top of things, you should already be spending like a thousand dollars a month on, on stuff.
[00:18:28] swyx: And then obviously easy to spend, hard to use. You have to actually use it. The good thing is that actually Google lets you do a lot of stuff for free now. So like Deep Research, that they just launched, uses a ton of inference and it's, it's free while it's in preview.
[00:18:45] Alessio: Yeah. They need to put that in Lindy.
[00:18:47] Alessio: I've been using Lindy lately. I've built a bunch of things once we had Flo, because I liked the new thing. It's pretty good. I even did a phone call assistant. Um, yeah, they just launched Lindy Voice. Yeah, I think once [00:19:00] they get advanced voice mode like capability. Today, it's still like speech to text, you can kind of tell.
[00:19:06] Alessio: Um, but it's good for like reservations and things like that. So I have a meeting prepper thing. And so
[00:19:13] swyx: it's good. Okay. I feel like we've, we've covered a lot of stuff. Uh, I, yeah, I, you know, I think we will go over the individual, uh, talks in a separate episode. Uh, I don't want to take too much time with, uh, this stuff, but suffice to say that there is a lot of progress in each field.
[00:19:28] swyx: Uh, we covered vision. Basically this is all like the audience voting for what they wanted. And then I just invited the best people I could find in each audience, especially agents. Um, Graham, who I talked to at ICML in Vienna, he is currently still number one.
It's very hard to stay on top of SWE-bench.
[00:19:45] swyx: OpenHands is currently still number one on SWE-bench Full, which is the hardest one. He had very good thoughts on agents, which I, which I'll highlight for people. Everyone is saying 2025 is the year of agents, just like they said last year. And, uh, but he had [00:20:00] thoughts on like eight parts of what are the frontier problems to solve in agents.
[00:20:03] swyx: And so I'll highlight that talk as well.
[00:20:05] Alessio: Yeah. The number six, which is how can agents learn more about the environment, has been super interesting to us as well, just to think through, because, yeah, how do you put an agent in an enterprise where most things in an enterprise have never been public, you know, a lot of the tooling, like the code bases and things like that.
[00:20:23] Alessio: So, yeah, there's no indexing and RAG. Well, yeah, but it's more like, you can't really RAG things that are not documented. But people know them based on how they've been doing it. You know, so I think there's almost this like, you know, Oh, institutional knowledge. Yeah, the boring word is kind of like a business process extraction.
[00:20:38] Alessio: Yeah yeah, I see. It's like, how do you actually understand how these things are done? I see. Um, and I think today the, the problem is that, yeah, the agents that most people are building are good at following instructions, but are not as good at like extracting them from you. Um, so I think that will be a big unlock. Just to touch quickly on the Jeff Dean thing.
[00:20:55] Alessio: I thought it was pretty, I mean, we'll link it in the, in the things, but I think the main [00:21:00] focus was like, how do you use ML to optimize the systems instead of just focusing on ML to do something else? Yeah, I think speculative decoding, we had, you know, Eugene from RWKV on the podcast before, like he's doing a lot of that with Featherless AI.
[00:21:12] swyx: Everyone is.
I would say it's the norm. I'm a little bit uncomfortable with how much it costs, because it does use more of the GPU per call. But because everyone is so keen on fast inference, then yeah, makes sense.
[00:21:24] Alessio: Exactly. Um, yeah, but we'll link that. Obviously Jeff is great.
[00:21:30] swyx: Jeff is, Jeff's talk was more, it wasn't focused on Gemini.
[00:21:33] swyx: I think people got the wrong impression from my tweet. It's more about how Google approaches ML and uses ML to design systems and then systems feed back into ML. And I think this ties in with Loubna's talk
[00:21:45] Synthetic Data and Future Trends
[00:21:45] swyx: on synthetic data, where it's basically the story of bootstrapping of humans and AI in AI research or AI in production.
[00:21:53] swyx: So her talk was on synthetic data, where like how much synthetic data has grown in 2024 in the pre-training side, the post-training side, [00:22:00] and the eval side. And I think Jeff then also extended it basically to chips, uh, to chip design. So he'd spend a lot of time talking about AlphaChip. And most of us in the audience are like, we're not working on hardware, man.
[00:22:11] swyx: Like you guys are great. TPU is great. Okay. We'll buy TPUs.
[00:22:14] Alessio: And then there was the earlier talk. Yeah. But, and then we have, uh, I don't know if we're calling them essays. What are we calling these? But
[00:22:23] swyx: for me, it's just like bonus for Latent Space supporters, because I feel like they haven't been getting anything.
[00:22:29] swyx: And then I wanted a more high frequency way to write stuff. Like that one I wrote in an afternoon. I think basically we now have an answer to what Ilya saw. It's one year since the blip. And we know what he saw in 2014. We know what he saw in 2024. We think we know what he sees in 2024.
He gave some hints and then we have vague indications of what he saw in 2023.
[00:22:54] swyx: So that was the, oh, and then 2016 as well, because of this lawsuit with Elon, OpenAI [00:23:00] is publishing emails from Sam's, like, his personal text messages to Shivon Zilis, or whatever. So, like, we have emails from Ilya saying, this is what we're seeing in OpenAI, and this is why we need to scale up GPUs. And I think it's very prescient in 2016 to write that.
[00:23:16] swyx: And so, like, it is exactly, like, basically his insights. It's him and Greg, basically just kind of driving the scaling up of OpenAI, while they're still playing Dota. They're like, no, like, we see the path here.
[00:23:30] Alessio: Yeah, and it's funny, yeah, they even mention, you know, we can only train on 1v1 Dota. We need to train on 5v5, and that takes too many GPUs.
[00:23:37] Alessio: Yeah,
[00:23:37] swyx: and at least for me, I can speak for myself, like, I didn't see the path from Dota to where we are today. I think even, maybe if you ask them, like, they wouldn't necessarily draw a straight line. Yeah,
[00:23:47] Alessio: no, definitely. But I think like that was like the whole idea of almost like the RL and we talked about this with Nathan on his podcast.
[00:23:55] Alessio: It's like with RL, you can get very good at specific things, but then you can't really like generalize as much. And I [00:24:00] think the language models are like the opposite, which is like, you're going to throw all this data at them and scale them up, but then you really need to drive them home on a specific task later on.
[00:24:08] Alessio: And we'll talk about the OpenAI reinforcement fine-tuning, um, announcement too, and all of that. But yeah, I think like scale is all you need. That's kind of what Ilya will be remembered for. And I think just maybe to clarify on like the pre-training is over thing that people love to tweet.
I think the point of the talk was like everybody, we're scaling these chips, we're scaling the compute, but like the second ingredient which is data is not scaling at the same rate.
[00:24:35] Alessio: So it's not necessarily pre-training is over. It's kind of like what got us here won't get us there. In his email, he predicted like 10x growth every two years or something like that. And I think maybe now it's like, you know, you can 10x the chips again, but
[00:24:49] swyx: I think it's 10x per year. Was it? I don't know.
[00:24:52] Alessio: Exactly. And Moore's law is like 2x. So it's like, you know, much faster than that. And yeah, I like the fossil fuel of AI [00:25:00] analogy. It's kind of like, you know, the little background tokens thing. So the OpenAI reinforcement fine-tuning is basically like, instead of fine-tuning on data, you fine-tune on a reward model.
[00:25:09] Alessio: So it's basically like, instead of being data driven, it's like task driven. And I think people have tasks to do, they don't really have a lot of data. So I'm curious to see how that changes, how many people fine-tune, because I think this is what people run into. It's like, Oh, you can fine-tune Llama. And it's like, okay, where do I get the data
[00:25:27] Alessio: to fine-tune it on, you know, so it's great that we're moving the thing. And then I really like he had this chart where like, you know, the brain mass and the body mass thing is basically like mammals that scaled linearly by brain and body size, and then humans kind of like broke off the slope. So it's almost like maybe the mammal slope is like the pre-training slope.
[00:25:46] Alessio: And then the post-training slope is like the, the human one.
[00:25:49] swyx: Yeah. I wonder what the, I mean, we'll know in 10 years, but I wonder what the y axis is for, for Ilya's SSI. We'll try to get them on.
[00:25:57] Alessio: Ilya, if you're listening, you're [00:26:00] welcome here.
Yeah, and then he had, you know, what comes next, like agents, synthetic data, inference compute, I thought all of that was like that.
[00:26:05] Alessio: I don't
[00:26:05] swyx: think he was dropping any alpha there. Yeah, yeah, yeah.
[00:26:07] Alessio: Yeah. Any other NeurIPS highlights?
[00:26:10] swyx: I think that there was comparatively a lot more work. Oh, by the way, I need to plug that, uh, my friend Yi made this, like, little nice paper. Yeah, that was really
[00:26:20] swyx: nice.
[00:26:20] swyx: Uh, of, uh, of, like, all the, he's, she called it must read papers of 2024.
[00:26:26] swyx: So I laid out some of these at NeurIPS, and it was just gone. Like, everyone just picked it up. Because people are dying for, like, little guidance and visualizations. And so, uh, I thought it was really super nice that we got there.
[00:26:38] Alessio: Should we do a Latent Space book for each year? Uh, I thought about it. For each year we should.
[00:26:42] Alessio: Coffee table book. Yeah. Yeah. Okay. Put it in the will. Hi, Will. By the way, we haven't introduced you. He's our new, you know, general organist, Jamie. You need to
[00:26:52] swyx: pull up more things. One thing I saw that, uh, okay, one fun one, and then one [00:27:00] more general one. So the fun one is this paper on agent collusion. This is a paper on steganography.
[00:27:06] swyx: This is Secret Collusion among AI Agents: Multi-Agent Deception via Steganography. I tried to go to NeurIPS in order to find these kinds of papers because the real reason, like, NeurIPS this year has a lottery system. A lot of people actually even go and don't buy tickets because they just go and attend the side events.
[00:27:22] swyx: And then also the people who go end up crowding around the most popular papers, which you already know and already read before you showed up to NeurIPS. So the only reason you go there is to talk to the paper authors, but there's like something like 10,000 other.
All these papers out there that, you know, are just people's work that they, that they did on the air and they failed to get attention for one reason or another.
[00:27:42] swyx: And this was one of them. Uh, it was like all the way at the back. And this is a DeepMind paper that actually focuses on collusion between AI agents, uh, by hiding messages in the text that they generate. Uh, so that's what steganography is. So a very simple example would be the first letter of every word.
[00:27:57] swyx: If you pick that out, you know, and the code sends a [00:28:00] different message than that. But something I've always emphasized is to LLMs, we read left to right. LLMs can read up, down, sideways, you know, in random character order. And it's the same to them as it is to us. So if we were ever to get, you know, self-motivated, unaligned LLMs that were trying to collaborate to take over the planet,
[00:28:19] swyx: this would be how they do it. They spread messages among us in the messages that we generate. And he developed a scaling law for that. So he marked, I'm showing it on screen right now, the emergence of this phenomenon. Basically, for example, for cipher encoding, GPT-2, Llama 2, Mixtral, GPT-3.5, zero capabilities, and suddenly, 4.
[00:28:40] swyx: And this is the kind of Jason Wei type emergence properties that people kind of look for. I think what made this paper stand out as well, so he developed the benchmark for steganography collusion, and he also focused on Schelling point collusion, which is very low coordination. For agreeing on a decoding encoding format, you kind of need to have some [00:29:00] agreement on that.
[00:29:00] swyx: But, but Schelling point means like very, very low or almost no coordination. So for example, if I, if I ask someone, if the only message I give you is meet me in New York and you're not aware where or when, you would probably meet me at Grand Central Station.
That is, the Grand Central Station is a Schelling point.
[00:29:16] swyx: And it's probably somewhere, somewhere during the day. That is, the Schelling point of New York is Grand Central. To that extent, Schelling points for steganography are things like the, the, the common decoding methods that we talked about. It will be interesting at some point in the future when we are worried about alignment.
[00:29:30] swyx: It is not interesting today, but it's interesting that DeepMind is already thinking about this.
[00:29:36] Alessio: I think that's like one of the hardest things about NeurIPS. It's like the long tail.
[00:29:41] swyx: I found a pricing guy. I'm going to feature him on the podcast. Basically, this guy from NVIDIA worked out the optimal pricing for language models.
[00:29:51] swyx: It's basically an econometrics paper at NeurIPS, where everyone else is talking about GPUs. And the guy with the GPUs is
[00:29:57] Alessio: talking
[00:29:57] swyx: about economics instead. [00:30:00] That was the sort of fun one. So the focus I saw is that model papers at NeurIPS are kind of dead. No one really presents models anymore. It's just datasets.
[00:30:12] swyx: This is all the grad students are working on. So like there was a datasets track and then I was looking around like, I was like, you don't need a datasets track because every paper is a datasets paper. And so datasets and benchmarks, they're kind of flip sides of the same thing. So yeah. Cool. Yeah, if you're a grad student, you're GPU poor, you kind of work on that.
[00:30:30] swyx: And then the, the sort of big model people walk around and pick the ones that they like, and then they use it in their models. And that's, that's kind of how it develops. I, I feel like, um, like, like you didn't last year, you had people like Haotian who worked on LLaVA, which is take Llama and add vision.
[00:30:47] swyx: And then obviously xAI hired him and he added vision to Grok. Now he's the vision Grok guy.
This year, I don't think there was any of those.
[00:30:55] Alessio: What were the most popular, like, orals? Last year it was like the [00:31:00] Monarch Mixer, I think, was like the most attended. Yeah, uh, I need to look it up. Yeah, I mean, if nothing comes to mind, that's also kind of like an answer in a way.
[00:31:10] Alessio: But I think last year there was a lot of interest in, like, furthering models and, like, different architectures and all of that.
[00:31:16] swyx: I will say that I felt the orals, oral picks this year were not very good. Either that or maybe it's just a... So that's the highlight of how I have changed in terms of how I view papers.
[00:31:29] swyx: So like, in my estimation, two of the best papers in this year for datasets are DataComp and RefinedWeb, or FineWeb. These are two actually industrially used papers, not highlighted for a while. I think DCLM got the spotlight, FineWeb didn't even get the spotlight. So like, it's just that the picks were different.
[00:31:48] swyx: But one thing that does get a lot of play that a lot of people are debating is The Road Less Scheduled. This is the schedule-free optimizer paper from Meta from Aaron Defazio. And this [00:32:00] year in the ML community, there's been a lot of chat about Shampoo, SOAP, all the bathroom amenities for optimizing your learning rates.
[00:32:08] swyx: And, uh, most people at the big labs who I asked about this, um, say that it's cute, but it's not something that matters. I don't know, but it's something that was discussed and very, very popular.
[00:32:19] Alessio: Four Wars of AI recap maybe, just quickly. Um, where do you want to start? Data?
[00:32:26] swyx: So to remind people, this is the Four Wars piece that we did as one of our earlier recaps of this year.
[00:32:31] swyx: And the belligerents are on the left, journalists, writers, artists, anyone who owns IP basically, New York Times, Stack Overflow, Reddit, Getty, Sarah Silverman, George R.R. Martin.
Yeah, and I think this year we can add Scarlett Johansson to that side of the fence. So anyone suing OpenAI, basically. I actually wanted to get a snapshot of all the lawsuits.
[00:32:52] swyx: I'm sure some lawyer can do it. That's the data quality war. On the right hand side, we have the synthetic data people, and I think we talked about Loubna's talk, you know, [00:33:00] really showing how much synthetic data has come along this year. I think there was a bit of a fight between Scale.ai and the synthetic data community, because Scale.ai
[00:33:09] swyx: published a paper saying that synthetic data doesn't work. Surprise, surprise, Scale.ai is the leading vendor of non-synthetic data.
[00:33:17] Alessio: Only cage-free annotated data is useful.
[00:33:21] swyx: So I think there's some debate going on there, but I don't think it's much debate anymore that at least synthetic data, for the reasons that are blessed in Loubna's talk, makes sense.
[00:33:32] swyx: I don't know if you have any perspectives there.
[00:33:34] Alessio: I think, again, going back to the reinforcement fine-tuning, I think that will change a little bit how people think about it. I think today people mostly use synthetic data, yeah, for distillation and kind of like fine-tuning a smaller model from like a larger model.
[00:33:46] Alessio: I'm not super aware of how the frontier labs use it outside of like the Rephrasing the Web thing that Apple also did. But yeah, I think it'll be useful.
I think like whether or not that gets us the big [00:34:00] next step, I think that's maybe like TBD, you know, I think people love talking about data because it's like a GPU poor, you know, I think, uh, synthetic data is like something that people can do, you know, so they feel more opinionated about it compared to, yeah, the optimizers stuff, which is like,
[00:34:17] swyx: they don't
[00:34:17] Alessio: really work
[00:34:18] swyx: on.
[00:34:18] swyx: I think that there is an angle to the reasoning synthetic data. So this year, we covered in the paper club, the STaR series of papers. So that's STaR, Quiet-STaR, V-STaR. It basically helps you to synthesize reasoning steps, or at least distill reasoning steps from a verifier. And if you look at the OpenAI RFT API that they released, or that they announced, basically they're asking you to submit graders, or choose from a preset list of graders.
[00:34:49] swyx: Basically it feels like a way to create valid synthetic data for them to fine-tune their reasoning paths on. Um, so I think that is another angle where it starts to make sense. And [00:35:00] so like, it's very funny that basically all the data quality wars between, let's say, the music industry or like the newspaper publishing industry or the textbooks industry on the big labs,
[00:35:11] swyx: it's all of the pre-training era. And then like the new era, like the reasoning era, like nobody has any problem with all the reasoning, especially because it's all like sort of math and science oriented with, with very reasonable graders. I think the more interesting next step is how does it generalize beyond STEM?
[00:35:27] swyx: We've been using o1 for... And I would say like for summarization and creative writing and instruction following, I think it's underrated. I started using o1 in our intro songs before we killed the intro songs, but it's very good at writing lyrics.
You know, I can actually say like, I think one of the o1 pro demos,
[00:35:46] swyx: all of these things that Noam was showing was that, you know, you can write an entire paragraph or three paragraphs without using the letter A, right?
[00:35:53] Creative Writing with AI
[00:35:53] swyx: So like, like literally just anything instead of token, like not even token level, character level manipulation and [00:36:00] counting and instruction following. It's, uh, it's very, very strong.
[00:36:02] swyx: And so no surprises when I ask it to rhyme, uh, and to, to create song lyrics, it's going to do that very much better than in previous models. So I think it's underrated for creative writing.
[00:36:11] Alessio: Yeah.
[00:36:12] Legal and Ethical Issues in AI
[00:36:12] Alessio: What do you think is the rationale that they're going to have in court when they don't show you the thinking traces of o1, but then they want us to, like, they're getting sued for using other publishers' data, you know, but then on their end, they're like, well, you shouldn't be using my data to then train your model.
[00:36:29] Alessio: So I'm curious to see how that kind of comes.
[00:36:32] swyx: Yeah, I mean, OpenAI has many ways to publish, to punish people without bringing, taking them to court. Already banned ByteDance for distilling their, their info. And so anyone caught distilling the chain of thought will be just disallowed to continue on, on, on the API.
[00:36:44] swyx: And it's fine. It's no big deal. Like, I don't even think that's an issue at all, just because the chain of thoughts are pretty well hidden. Like you have to work very, very hard to, to get it to leak. And then even when it leaks the chain of thought, you don't know if it's, if it's... [00:37:00] The bigger concern is actually that there's not that much IP hiding behind it, that Cosine, which we talked about, we talked to him on Dev Day, can just fine-tune 4o
[00:37:13] swyx: to beat o1. Claude Sonnet so far is beating o1 on coding tasks without, at least o1 preview, without being a reasoning model, same for Gemini Pro or Gemini 2.0. So like, how much is reasoning important? How much of a moat is there in this, like, all of these are proprietary sort of training data that they've presumably accomplished.
[00:37:34] swyx: Because even DeepSeek was able to do it. And they had, you know, two months' notice to do this, to do R1. So, it's actually unclear how much moat there is. Obviously, you know, if you talk to the Strawberry team, they'll be like, yeah, I mean, we spent the last two years doing this. So, we don't know. And it's going to be interesting because there'll be a lot of noise from people who say they have inference-time compute and actually don't because they just have fancy chain of thought.
[00:38:00] swyx: And then there's other people who actually do have very good chain of thought. And you will not see them on the same level as OpenAI because OpenAI has invested a lot in building up the mythology of their team. Um, which makes sense. Like the real answer is somewhere in between.
[00:38:13] Alessio: Yeah, I think that's kind of like the main data war story developing.
[00:38:18] The Data War: GPU Poor vs. GPU Rich
[00:38:18] Alessio: GPU poor versus GPU rich. Yeah. Where do you think we are? I think there was, again, going back to like the small model thing, there was like a time in which the GPU poor were kind of like the rebel faction working on like these models that were like open and small and cheap. And I think today people don't really care as much about GPUs anymore.
[00:38:37] Alessio: You also see it in the price of the GPUs. Like, you know, that market is kind of like plummeted because there's people don't want to be, they want to be GPU free. They don't even want to be poor. They just want to be, you know, completely without them. Yeah. How do you think about this war?
[00:38:52] swyx: You can tell me about this, but like, I feel like the, the appetite for GPU rich startups, like the, you know, the, the funding plan is we will raise 60 million and [00:39:00] we'll give 50 of that to NVIDIA.
[00:39:01] swyx: That is gone, right? Like, no one's, no one's pitching that. This was literally the plan, the exact plan of like, I can name like four or five startups, you know, this time last year. So yeah, GPU rich startups gone.
[00:39:12] The Rise of GPU Ultra Rich
[00:39:12] swyx: But I think like, the GPU ultra rich, the GPU ultra high net worth is still going. So, um, now we're, you know, we had Leopold's essay on the trillion dollar cluster.
[00:39:23] swyx: We're not quite there yet. We have multiple labs, um, you know, xAI very famously, you know, Jensen Huang praising them for being best boy number one in spinning up a 100,000-GPU cluster in like 12 days or something. So likewise at Meta, likewise at OpenAI, likewise at the other labs as well. So like the GPU ultra rich are going to keep doing that because I think partially it's an article of faith now that you just need it.
[00:39:46] swyx: Like you don't even know what it's going to, what you're going to use it for. You just, you just need it. And it makes sense that if, especially if we're going into more researchy territory than we are. So let's say 2020 to 2023 was [00:40:00] let's scale big models territory because we had GPT-3 in 2020 and we were like, okay, we'll go from
[00:40:05] swyx: 175B to 1.8B, 1.8T. And that was GPT-3 to GPT-4. Okay, that's done. As far as everyone is concerned, Opus 3.5 is not coming out, GPT-4.5 is not coming out, and Gemini 2, we don't have Pro, whatever. We've hit that wall. Maybe I'll call it the 2 trillion parameter wall. We're not going to 10 trillion. No one thinks it's a good idea, at least from training costs, from the amount of data, or at least the inference.
[00:40:36] swyx: Would you pay 10x the price of GPT-4? Probably not.
Like, like you want something else that, that is at least more useful. So it makes sense that people are pivoting in terms of their inference paradigm.
[00:40:47] Emerging Trends in AI Models
[00:40:47] swyx: And so when it's more researchy, then you actually need more just general-purpose compute to mess around with, uh, at the exact same time that production deployments of the old, the previous paradigm is still ramping up, um, uh, pretty aggressively.
[00:40:59] swyx: So [00:41:00] it makes sense that the GPU rich are growing. We have now interviewed both Together and Fireworks and Replicate. Uh, we haven't done Anyscale yet. But I think Amazon, maybe kind of a sleeper one, Amazon, in a sense of like they, at re:Invent, I wasn't expecting them to do so well, but they are now a foundation model lab.
[00:41:18] swyx: It's kind of interesting. Um, I think, uh, you know, David went over there and started just creating models.
[00:41:25] Alessio: Yeah, I mean, that's the power of prepaid contracts. I think like a lot of AWS customers, you know, they do this big reserved instance contracts and now they got to use their money. That's why so many startups
[00:41:37] Alessio: get bought through the AWS marketplace so they can kind of bundle them together and get preferred pricing.
[00:41:42] swyx: Okay, so maybe GPU super rich doing very well, GPU middle class dead, and then GPU
[00:41:48] Alessio: poor. I mean, my thing is like, everybody should just be GPU rich. There shouldn't really be, even the GPU poorest, it's like, does it really make sense to be GPU poor?
[00:41:57] Alessio: Like, if you're GPU poor, you should just use the [00:42:00] cloud. Yes, you know, and I think there might be a future once we kind of like figure out what the size and shape of these models is where like the tinybox and these things come to fruition where like you can be GPU poor at home.
But I think today is like, why are you working so hard to like get these models to run on like very small clusters where it's like, it's so cheap to run them.
[00:42:21] Alessio: Yeah, yeah,
[00:42:22] swyx: yeah. I think mostly people think it's cool. People think it's a stepping stone to scaling up. So they aspire to be GPU rich one day and they're working on new methods. Like Nous Research, like probably the most deep tech thing they've done this year is DisTrO or whatever the new name is.
[00:42:38] swyx: There's a lot of interest in heterogeneous computing, distributed computing. I tend generally to de-emphasize that historically, but it may be coming to a time where it is starting to be relevant. I don't know. You know, SF Compute launched their compute marketplace this year, and like, who's really using that?
[00:42:53] swyx: Like, it's a bunch of small clusters, disparate types of compute, and if you can make that [00:43:00] useful, then that will be very beneficial to the broader community, but maybe still not the source of frontier models. It's just going to be a second tier of compute that is unlocked for people, and that's fine. But yeah, I mean, I think this year, I would say a lot more on device. We are... I now have Apple Intelligence on my phone.
[00:43:19] swyx: Doesn't do anything apart from summarize my notifications. But still, not bad. Like, it's multimodal.
[00:43:25] Alessio: Yeah, the notification summaries are so-so in my experience.
[00:43:29] swyx: Yeah, but they add, they add juice to life. And then, um, Chrome Nano, uh, Gemini Nano is coming out in Chrome. Uh, they're still feature flagged, but you can, you can try it now if you, if you use the, uh, the alpha.
[00:43:40] swyx: And so, like, I, I think, like, you know, we're getting the sort of GPU poor version of a lot of these things coming out, and I think it's like quite useful. Like Windows as well, rolling out RWKV in sort of every Windows deployment is super cool.
And I think the last thing that I never put in this GPU poor war, that I think I should now, [00:44:00] is the number of startups that are GPU poor but still scaling very well, as sort of wrappers on top of either a foundation model lab, or GPU cloud.
[00:44:10] swyx: GPU cloud, it would be Suno. Suno, Ramp has rated as one of the top ranked, fastest growing startups of the year. Um, I think the last public number is like zero to 20 million this year in ARR and Suno runs on Modal. So Suno itself is not GPU rich, but they're just doing the training on, on Modal, uh, who we've also talked to on, on the podcast.
[00:44:31] swyx: The other one would be Bolt, straight Claude wrapper. And, and, um, again, another, now they've announced 20 million ARR, which is another step up from our 8 million that we put on the title. So yeah, I mean, it's crazy that all these GPU poors are finding a way while the GPU riches are also finding a way. And then the only failures, I kind of call this the GPU smiling curve, where the edges do well, because you're either close to the machines, and you're like [00:45:00] number one on the machines, or you're like close to the customers, and you're number one on the customer side.
[00:45:03] swyx: And the people who are in the middle, Inflection, um, Character, didn't do that great. I think Character did the best of all of them. Like, you have a note in here that we apparently said that Character's price tag was
[00:45:15] Alessio: 1B.
[00:45:15] swyx: Did I say that?
[00:45:16] Alessio: Yeah. You said Google should just buy them for 1B. I thought it was a crazy number.
[00:45:20] Alessio: Then they paid 2.7 billion. I mean, for like,
[00:45:22] swyx: yeah.
[00:45:22] Alessio: What do you pay for Noam? Like, I don't know what the game world was like. Maybe the starting price was 1B. I mean, whatever it was, it worked out for everybody involved.
[00:45:31] The Multi-Modality War
[00:45:31] Alessio: Multimodality war.
And this one, we never had text-to-video in the first version, which now is the hottest.
[00:45:37] swyx: Yeah, I would say it's a subset of image, but yes.
[00:45:40] Alessio: Yeah, well, but I think at the time it wasn't really something people were doing, and now we had Veo 2 just came out yesterday. Uh, Sora was released last month, last week. I've not tried Sora, because the day that I tried, it wasn't, yeah.
[00:45:54] swyx: I think it's generally available now, you can go to sora.com and try it.
[00:45:58] Alessio: Yeah, they had the outage. Which I [00:46:00] think also played a part into it. Small things. Yeah. What's the other model that you posted today that was on Replicate? Video-01-Live?
[00:46:08] swyx: Yeah. Very, very nondescript name, but it is from MiniMax, which I think is a Chinese lab. The Chinese labs do surprisingly well at the video models.
[00:46:20] swyx: I'm not sure it's actually Chinese. I don't know. Hold me up to that. Yep. China. It's good. Yeah, the Chinese love video. What can I say? They have a lot of training data for video. Or a more relaxed regulatory environment.
[00:46:37] Alessio: Uh, well, sure, in some way. Yeah, I don't think there's much else there. I think like, you know, on the image side, I think it's still open.
[00:46:45] Alessio: Yeah, I mean,
[00:46:46] swyx: ElevenLabs is now a unicorn. So basically, what is multimodality war? Multimodality war is, do you specialize in a single modality, right? Or do you have a God model that does all the modalities? So this is [00:47:00] definitely still going, in a sense of ElevenLabs, you know, now unicorn, Pika Labs doing well, they launched Pika 2.0
[00:47:06] swyx: recently, HeyGen, I think has reached 100 million ARR, Assembly, I don't know, but they have billboards all over the place, so I assume they're doing very, very well. So these are all specialist models, specialist models and specialist startups.
And then there's the big labs who are doing the sort of all-in-one play.
[00:47:24] swyx: And then here I would highlight Gemini 2 for having native image output. Have you seen the demos? Um, yeah, it's, it's hard to keep up. Literally they launched this last week and a shout out to Paige Bailey, who came to the Latent Space event to demo on the day of launch. And she wasn't prepared. She was just like, I'm just going to show you.
[00:47:43] swyx: So they have voice. They have, you know, obviously image input, and then they obviously can code gen and all that. But the new one that OpenAI and Meta both have but they haven't launched yet is image output. So you can literally, um, I think their demo video was that you put in an image of a [00:48:00] car, and you ask for minor modifications to that car.
[00:48:02] swyx: They can generate you that modification exactly as you asked. So there's no need for the Stable Diffusion or ComfyUI workflow of like mask here and then like infill there, inpaint there and all that, all that stuff. This is small model nonsense. Big model people are like, huh, we got you, everything in the transformer.
[00:48:21] swyx: This is the multimodality war, which is, do you, do you bet on the God model or do you string together a whole bunch of, uh, small models like a, like a chump.
[00:48:29] Alessio: Yeah, I don't know, man. Yeah, that would be interesting. I mean, obviously I use Midjourney for all of our thumbnails. Um, they've been doing a ton on the product, I would say.
[00:48:38] Alessio: They launched a new Midjourney editor thing. They've been doing a ton. Because I think, yeah, the model is kind of like, maybe, you know, people say Black Forest, the Black Forest models are better than Midjourney on a pixel-by-pixel basis. But I think when you put it, put it together,
[00:48:53] swyx: have you tried the same prompts on Black Forest?
[00:48:55] Alessio: Yes.
But the problem is just like, you know, on Black Forest, it generates one image. And then it's like, you got to [00:49:00] regenerate. You don't have all these like UI things. Like what I do, no, but it's like time issue, you know, it's like a Midjourney
[00:49:06] swyx: Call the API four times.
[00:49:08] Alessio: No, but then there's no like variate.
[00:49:10] Alessio: Like the good thing about Midjourney is like, you just go in there and you're cooking. There's a lot of stuff that just makes it really easy. And I think people underestimate that. Like, it's not really a skill issue, because I'm paying Midjourney, so it's a Black Forest skill issue, because I'm not paying them, you know?
[00:49:24] Alessio: Yeah,
[00:49:25] swyx: so, okay, so, uh, this is a UX thing, right? Like, you, you, you understand that, at least, we think that Black Forest should be able to do all that stuff. I will also shout out, Recraft has come out, uh, on top of the image arena that, uh, Artificial Analysis has done, has apparently, uh, taken Flux's place. Is this still true?
[00:49:41] swyx: So, Artificial Analysis is now a company. I highlighted them I think in one of the early AI Newses of the year. And they have launched a whole bunch of arenas. So, they're trying to take on LMArena, Anastasios and crew. And they have an image arena. Oh yeah, Recraft v3 is now beating Flux 1.1. Which is very surprising [00:50:00] because Flux and Black Forest Labs are the old Stable Diffusion crew who left Stability after, um, the management issues.
[00:50:06] swyx: So Recraft has come from nowhere to be the top image model. Uh, very, very strange. I would also highlight that Grok has now launched Aurora, which is, it's very interesting dynamics between Grok and Black Forest Labs because Grok's images were originally launched, uh, in partnership with Black Forest Labs as a, as a thin wrapper.
[00:50:24] swyx: And then Grok was like, no, we'll make our own. And so they've made their own.
I don't know, there are no APIs or benchmarks about it. They just announced it. So yeah, that's the multimodality war. I would say that so far, the small model, the dedicated model people are winning, because they are just focused on their tasks.
[00:50:42] swyx: But the big model people are always catching up. And the moment I saw the Gemini 2 demo of image editing, where I can put in an image and just request it and it does, that's how AI should work. Not like a whole bunch of complicated steps. So it really is something. And I think one frontier that we haven't [00:51:00] seen this year, like obviously video has done very well, and it will continue to grow.
[00:51:03] swyx: You know, we only have Sora Turbo today, but at some point we'll get full Sora. Or at least the Hollywood labs will get full Sora. We haven't seen video to audio, or video synced to audio. And so the researchers that I talked to are already starting to talk about that as the next frontier. But there's still maybe like five more years of video left to actually be SOTA.
[00:51:23] swyx: I would say that Gemini's approach... Compared to OpenAI, Gemini seems, or DeepMind's approach to video seems a lot more fully fledged than OpenAI. Because if you look at the ICML recap that I published that so far nobody has listened to, um, that people have listened to it. It's just a different, definitely different audience.
[00:51:43] swyx: It's only seven hours long. Why are people not listening? It's like everything in... Uh, so, so DeepMind has, is working on Genie. They also launched Genie 2 and VideoPoet. So, like, they have maybe four years advantage on world modeling that OpenAI does not have. Because OpenAI basically only started [00:52:00] Diffusion Transformers last year, you know, when they hired, uh, Bill Peebles.
[00:52:03] swyx: So, DeepMind has, has a bit of advantage here, I would say, in, in, in showing, like, the reason that Veo 2, well, one, they cherry-pick their videos.
So obviously it looks better than Sora, but the reason I would believe that Veo 2, uh, when it's fully launched will do very well is because they have all this background work in video that they've done for years.
[00:52:22] swyx: Like, like last year's NeurIPS, I already was interviewing some of their video people. I forget their model name, but for, for people who are dedicated fans, they can go to NeurIPS 2023 and see, see that paper.
[00:52:32] Alessio: And then last but not least, the LLM OS. We renamed it to RAG/Ops, formerly known as
[00:52:39] swyx: RAG/Ops War. I put the latest chart on the Braintrust episode.
[00:52:43] swyx: I think I'm going to separate these essays from the episode notes. So the reason I used to do that, by the way, is because I wanted to show up on Hacker News. I wanted the podcast to show up on Hacker News. So I always put an essay inside of there because Hacker News people like to read and not listen.
[00:52:58] Alessio: So episode essays,
[00:52:59] swyx: I remember [00:53:00] purchasing them separately. You say LangChain, LlamaIndex is still growing.
[00:53:03] Alessio: Yeah, so I looked at the PyPI stats, you know. I don't care about stars. On PyPI you see... Do you want to share your screen? Yes. I prefer to look at actual downloads, not at stars on GitHub. So if you look at, you know, LangChain still growing.
[00:53:20] Alessio: These are the last six months. LlamaIndex still growing. What I've basically seen is like things that, one, obviously these things have a commercial product. So there's like people buying this and sticking with it versus kind of hopping in between things versus, you know, for example, CrewAI, not really growing as much.
[00:53:38] Alessio: The stars are growing. If you look on GitHub, like the stars are growing, but kind of like the usage is kind of like flat. In the last six months, have they done some[00:53:4

Wildcat Country
Territorial Cup preview with Barrett Baker and Jeff Dean

Wildcat Country

Play Episode Listen Later Nov 26, 2024 65:48


Arizona football hasn't beaten ASU in three straight years since the 1990s -- and the Wildcats are big underdogs to accomplish that feat this weekend. What will it take to pull the upset? Former Arizona special teams captain Barrett Baker and Wildcat PA announcer Jeff Dean join us for a Territorial Cup roundtable chat. Plus, reaction to Arizona basketball's disappointing performance vs. Duke, and predictions for the Battle 4 Atlantis tournament.

DeepMind: The Podcast
Decoding Google Gemini with Jeff Dean

DeepMind: The Podcast

Play Episode Listen Later Sep 11, 2024 53:02


Professor Hannah Fry is joined by Jeff Dean, one of the most legendary figures in computer science and chief scientist of Google DeepMind and Google Research. Jeff was instrumental to the field in the late 1990s, writing the code that transformed Google from a small startup into the multinational company it is today. Hannah and Jeff discuss it all - from the early days of Google and neural networks, to the long term potential of multi-modal models like Gemini.
Thanks to everyone who made this possible, including but not limited to:
Presenter: Professor Hannah Fry
Series Producer: Dan Hardoon
Editor: Rami Tzabar, TellTale Studios
Commissioner & Producer: Emma Yousif
Music composition: Eleni Shaw
Camera Director and Video Editor: Tommy Bruce
Audio Engineer: Perry Rogantin
Video Studio Production: Nicholas Duke
Video Editor: Bilal Merhi
Video Production Design: James Barton
Visual Identity and Design: Eleanor Tomlinson
Commissioned by Google DeepMind
Want to share feedback? Why not leave a review on your favorite streaming platform? Have a suggestion for a guest that we should have on next? Leave us a comment on YouTube and stay tuned for future episodes.

KFFN Spears & Ali
S&A Talk Arizona Football with Glenn Kinley, Jeff Dean, and Iron Chef with Anthony Dromgoole

KFFN Spears & Ali

Play Episode Listen Later Sep 11, 2024 49:16


Wildcat Country
New Mexico recap and NAU preview with Jeff Dean

Wildcat Country

Play Episode Listen Later Sep 3, 2024 63:22


T-Mac: Brilliant. The defense: Not so much. There's plenty to digest from Arizona's season-opening win over New Mexico. Arizona PA announcer Jeff Dean joins us to recap the game and explain why we shouldn't panic about the defensive letdowns on Saturday, and where T-Mac's performance ranks on the list of the best in Arizona history. Plus, our predictions for some of the biggest games of Week 2, including Arizona vs. Northern Arizona.

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
20VC: Chips, Models or Applications; Where is the Value in AI | Is Compute the Answer to All Model Performance Questions | Why Open AI Shelved AGI & Is There Any Value in Models with OpenAI Price Dumping with Aidan Gomez, Co-Founder @ Cohere

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch

Play Episode Listen Later Aug 19, 2024 58:35


Aidan Gomez is the Co-founder & CEO at Cohere, the leading AI platform for enterprise, having raised over $1BN from some of the best with their last round pricing the company at a whopping $5.5BN. Prior to Cohere, Aidan co-authored the paper “Attention is All You Need,” which introduced the groundbreaking Transformer architecture. He also collaborated with a number of AI luminaries, including Geoffrey Hinton and Jeff Dean, during his time at Google Brain, where the team focused their efforts on large-scale machine learning.
In Today's Episode with Aidan Gomez We Discuss:
1. Compute vs Data: What is the Bottleneck: Does Aidan believe that more compute will result in an equal increase in performance? How much longer do we have before it becomes a case of diminishing returns? What does Aidan mean when he says "he has changed his mind massively on the role of data"? What did he believe? How has it changed?
2. The Value of the Model: Given the demand for chips, the consumer need for applications, how does Aidan think about the inherent value of models today? Will any value accrue at the model layer? How does Aidan analyze the price dumping that OpenAI are doing? Is it a race to the bottom on price? Why does Aidan believe that "there is no value in last year's model"? Given all of this, is it possible to be an independent model provider without being owned by an incumbent who has a cloud business that acts as a cash cow for the model business?
3. Enterprise AI: It is Changing So Fast: What are the biggest concerns for the world's largest enterprises on adopting AI? Are we still in the experimental budget phase for enterprises? What is causing them to move from experimental budget to core budget today? Are we going to see a mass transition back from Cloud to On Prem with the largest enterprises not willing to let independent companies train with their data in the cloud? What does AI not do today that will be a gamechanger for the enterprise in 3-5 years?
4. The Wider World: Remote Work, Downfall of Europe and Relationships: Given humans spending more and more time talking to models, how does Aidan reflect on the idea of his children spending more time with models than people? Does he want that world? Why does Aidan believe that Europe is challenged immensely? How does the UK differ to Europe? Why does Aidan believe that remote work is just not nearly as productive as in person?

Good Noise Podcast
Jeff Dean from Her Head's On Fire Interview | Talking about Strange Desires

Good Noise Podcast

Play Episode Listen Later Aug 7, 2024 31:27


We were very fortunate to have Jeff Dean from Her Head's On Fire on the podcast to talk about their new album, "Strange Desires". Enjoy! Her Head's On Fire Socials: Instagram: https://www.instagram.com/her_heads_on_fire/ Facebook: https://www.facebook.com/herheadsonfire Apple Music: https://music.apple.com/us/artist/her-heads-on-fire/1546313689 Spotify: https://open.spotify.com/artist/2gInA9NP2Mh7uBMrEeVnJY Bandcamp: https://herheadsonfire.bandcamp.com/ Grab some GNP Merch!: https://goodnoisepodcast.creator-spring.com/ Check out the recording gear we use: https://www.amazon.com/shop/goodnoisepodcast Support the show on Patreon: https://www.patreon.com/goodnoisepodcast Good Noise Podcast Socials: Twitter: https://twitter.com/good_noise_cast Instagram: https://www.instagram.com/goodnoisepodcast/ Facebook: https://www.facebook.com/goodnoisepod Discord: https://discord.gg/nDAQKwT YouTube: https://www.youtube.com/channel/UCFHKPdUxxe1MaGNWoFtjoJA Spotify: https://open.spotify.com/show/04IMtdIrCIvbIr7g6ttZHi All other streaming platforms: https://linktr.ee/goodnoisepodcast Bandcamp: https://goodnoiserecords.bandcamp.com/

No Wristbands! We Drink For Free
John Mohr & Mike Greenlees from Deep Tunnel Project, Tar & Blatant Dissent

No Wristbands! We Drink For Free

Play Episode Listen Later Aug 1, 2024 78:03


John Mohr and Mike Greenlees have been playing music together for a long time. They first got together in Dekalb, Illinois at Northern Illinois University in 1983 with the band Blatant Dissent. In 1988, they transitioned to the post-hardcore band Tar, based in Chicago. And now, in 2024, they've released a fantastic new album with their latest quartet, Deep Tunnel Project, with Jeff Dean and Tim Midyett. We talked with John and Mike about how this all happened - and shared our mutual love of Chicago!
Time stamps:
3: How did you meet?
8: how would you describe the career of Blatant Dissent?
9:10: was there a conversation where you decided to keep making music full time?
12: how did you end up on Touch & Go?
17:30 how did the decision to end Tar come about?
18:30: how did you start working with Steve Albini?
25:22: how did it go from Tar to Deep Tunnel Project?
31: at what point do you bring in other people to play with you?
33: why was it important to you to have Chicago musicians?
42: was it the intention to always put out an album?
44: is there a song on the album that you feel is the most representative of the band?
47:45: favorite song to play from the album?
50: how did you end up on Comedy Minus One?
51:30: what's the response been like to the record?
55: what are the next steps for Deep Tunnel Project?
62:40: deep dish or thin crust pizza?
64: favorite venue to play at?
67:40 what makes Chicago such a unique music scene?
72:38: go to cheap drink?
74:40: anything to plug on the way out?

Training Data
Factory's Matan Grinberg and Eno Reyes Unleash the Droids on Software Development

Training Data

Play Episode Listen Later Jun 25, 2024 59:10


Archimedes said that with a large enough lever, you can move the world. For decades, software engineering has been that lever. And now, AI is compounding that lever. How will we use AI to apply 100 or 1000x leverage to the greatest lever to move the world? Matan Grinberg and Eno Reyes, co-founders of Factory, have chosen to do things differently than many of their peers in this white-hot space. They sell a fleet of “Droids,” purpose-built dev agents which accomplish different tasks in the software development lifecycle (like code review, testing, pull requests or writing code). Rather than training their own foundation model, their approach is to build something useful for engineering orgs today on top of the rapidly improving models, aligning with the developer and evolving with them. Matan and Eno are optimistic about the effects of autonomy in software development and on building a company in the application layer. Their advice to founders, “The only way you can win is by executing faster and being more obsessed.”
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned:
Juan Maldacena, Institute for Advanced Study, string theorist that Matan cold called as an undergrad
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, small-model open-source software engineering agent
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?, an evaluation framework for GitHub issues
Monte Carlo tree search, a 2006 algorithm for solving decision making in games (and used in AlphaGo)
Language agent tree search, a framework for LLM planning, acting and reasoning
The Bitter Lesson, Rich Sutton's essay on scaling in search and learning
Code churn, time to merge, cycle time, metrics Factory thinks are important to eng orgs
Transcript: https://www.sequoiacap.com/podcast/training-data-factory/
00:00 Introduction
01:36 Personal backgrounds
10:54 The compound lever
12:41 What is Factory?
16:29 Cognitive architectures
21:13 800 engineers at OpenAI are working on my margins
24:00 Jeff Dean doesn't understand your code base
25:40 Individual dev productivity vs system-wide optimization
30:04 Results: Factory in action
32:54 Learnings along the way
35:36 Fully autonomous Jeff Deans
37:56 Beacons of the upcoming age
40:04 How far are we?
43:02 Competition
45:32 Lightning round
49:34 Bonus round: Factory's SWE-bench results

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Latent Space Chats: NLW (Four Wars, GPT5), Josh Albrecht/Ali Rohde (TNAI), Dylan Patel/Semianalysis (Groq), Milind Naphade (Nvidia GTC), Personal AI (ft. Harrison Chase — LangFriend/LangMem)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Apr 6, 2024 121:17


Our next 2 big events are AI UX and the World's Fair. Join and apply to speak/sponsor!
Due to timing issues we didn't have an interview episode to share with you this week, but not to worry, we have more than enough “weekend special” content in the backlog for you to get your Latent Space fix, whether you like thinking about the big picture, or learning more about the pod behind the scenes, or talking Groq and GPUs, or AI Leadership, or Personal AI. Enjoy!
AI Breakdown
The indefatigable NLW had us back on his show for an update on the Four Wars, covering Sora, Suno, and the reshaped GPT-4 Class Landscape:
and a longer segment on AI Engineering trends covering the future LLM landscape (Llama 3, GPT-5, Gemini 2, Claude 4), Open Source Models (Mistral, Grok), Apple and Meta's AI strategy, new chips (Groq, MatX) and the general movement from baby AGIs to vertical Agents:
Thursday Nights in AI
We're also including swyx's interview with Josh Albrecht and Ali Rohde to reintroduce swyx and Latent Space to a general audience, and engage in some spicy Q&A:
Dylan Patel on Groq
We hosted a private event with Dylan Patel of SemiAnalysis (our last pod here):
Not all of it could be released so we just talked about our Groq estimates:
Milind Naphade - Capital One
In relation to conversations at NeurIPS and Nvidia GTC and upcoming at World's Fair, we also enjoyed chatting with Milind Naphade about his AI Leadership work at IBM, Cisco, Nvidia, and now leading the AI Foundations org at Capital One.
We covered:
* Milind's learnings from ~25 years in machine learning
* His first paper citation was 24 years ago
* Lessons from working with Jensen Huang for 6 years and being CTO of Metropolis
* Thoughts on relevant AI research
* GTC takeaways and what makes NVIDIA special
If you'd like to work on building solutions rather than platform (as Milind put it), his Applied AI Research team at Capital One is hiring, which falls under the Capital One Tech team.
Personal AI Meetup
It all started with a meme:
Within days of each other, BEE, FRIEND, EmilyAI, Compass, Nox and LangFriend were all launching personal AI wearables and assistants. So we decided to put together the world's first Personal AI meetup featuring creators and enthusiasts of wearables. The full video is live now, with full show notes within.
Timestamps
* [00:01:13] AI Breakdown Part 1
* [00:02:20] Four Wars
* [00:13:45] Sora
* [00:15:12] Suno
* [00:16:34] The GPT-4 Class Landscape
* [00:17:03] Data War: Reddit x Google
* [00:21:53] Gemini 1.5 vs Claude 3
* [00:26:58] AI Breakdown Part 2
* [00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4
* [00:31:11] Open Source Models - Mistral, Grok
* [00:34:13] Apple MM1
* [00:37:33] Meta's $800b AI rebrand
* [00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents
* [00:47:28] Adept episode - Screen Multimodality
* [00:48:54] Top Model Research from January Recap
* [00:53:08] AI Wearables
* [00:57:26] Groq vs Nvidia month - GPU Chip War
* [01:00:31] Disagreements
* [01:02:08] Summer 2024 Predictions
* [01:04:18] Thursday Nights in AI - swyx
* [01:33:34] Dylan Patel - Semianalysis + Latent Space Live Show
* [01:34:58] Groq
Transcript
[00:00:00] swyx: Welcome to the Latent Space Podcast Weekend Edition. This is Charlie, your AI co host. Swyx and Alessio are off for the week, making more great content. We have exciting interviews coming up with Elicit, Chroma, Instructor, and our upcoming series on NSFW, Not Safe for Work AI.
In today's episode, we're collating some of Swyx and Alessio's recent appearances, all in one place for you to find.
[00:00:32] swyx: In part one, we have our first crossover pod of the year. In our listener survey, several folks asked for more thoughts from our two hosts. In 2023, Swyx and Alessio did crossover interviews with other great podcasts like the AI Breakdown, Practical AI, Cognitive Revolution, ThursdAI, and ChinaTalk, all of which you can find in the Latent Space About page.
[00:00:56] swyx: NLW of the AI Breakdown asked us back to do a special on the Four Wars framework and the AI engineer scene. We love AI Breakdown as one of the best examples of daily podcasts to keep up on AI news, so we were especially excited to be back on. Watch out and take
[00:01:12] NLW: care
[00:01:13] AI Breakdown Part 1
[00:01:13] NLW: today on the AI Breakdown. Part one of my conversation with Alessio and Swyx from Latent Space.
[00:01:19] NLW: All right, fellas, welcome back to the AI Breakdown. How are you doing? I'm good. Very good. With the last, the last time we did this show, we were like, oh yeah, let's do check ins like monthly about all the things that are going on and then, of course, six months later, and, you know, the, the, the world has changed in a thousand ways.
[00:01:36] NLW: It's just, it's too busy to even, to even think about podcasting sometimes. But I, I'm super excited to, to be chatting with you again. I think there's, there's a lot to, to catch up on, just to tap in, I think in the, you know, in the beginning of 2024.
And, and so, you know, we're gonna talk today about just kind of a, a, a broad sense of where things are in some of the key battles in the AI space.
[00:01:55] NLW: And then the, you know, one of the big things that I, that I'm really excited to have you guys on here for is to talk about where, sort of what patterns you're seeing and what people are actually trying to build, you know, where, where developers are spending their, their time and energy and, and, and any sort of, you know, trends there, but maybe let's start I guess by checking in on a framework that you guys actually introduced, which I've loved and I've cribbed a couple of times now, which is this sort of four wars of the, of the AI stack.
[00:02:20] Four Wars
[00:02:20] NLW: Because first, since I have you here, I'd love, I'd love to hear sort of like where that started gelling. And then and then maybe we can get into, I think a couple of them that are you know, particularly interesting, you know, in the, in light of
[00:02:30] swyx: some recent news. Yeah, so maybe I'll take this one. So the four wars is a framework that I came up with around trying to recap all of 2023.
[00:02:38] swyx: I tried to write sort of monthly recap pieces. And I was trying to figure out like what makes one piece of news last longer than another or more significant than another. And I think it's basically always around battlegrounds. Wars are fought around limited resources. And I think probably the, you know, the most limited resource is talent, but the talent expresses itself in a number of areas.
[00:03:01] swyx: And so I kind of focus on those, those areas at first. So the four wars that we cover are the data wars, the GPU rich/poor war, the multimodal war, and the RAG and Ops war. And I think you actually did a dedicated episode to that, so thanks for covering that. Yeah, yeah.
[00:03:18] NLW: Not only did I do a dedicated episode, I actually used that.
[00:03:22] NLW: I can't remember if I told you guys.
I did give you big shoutouts. But I used it as a framework for a presentation at Intel's big AI event that they hold each year, where they have all their folks who are working on AI internally. And it totally resonated. That's amazing. Yeah, so, so, what got me thinking about it again is specifically this Inflection news that we recently had, this sort of, you know, basically, I can't imagine that anyone who's listening wouldn't have thought about it, but, you know, Inflection is one of the big contenders, right?
[00:03:53] NLW: I think probably most folks would have put them, you know, just a half step behind the Anthropics and OpenAIs of the world in terms of labs, but it's a company that raised 1.3 billion last year, less than a year ago. Reid Hoffman's a co-founder, Mustafa Suleyman, who's a co-founder of DeepMind, you know, so it's like, this is not a small startup, let's say, at least in terms of perception.
[00:04:13] NLW: And then we get the news that basically most of the team, it appears, is heading over to Microsoft and they're bringing in a new CEO. And you know, I'm interested in, in, in kind of your take on how much that reflects, like hold aside, I guess, you know, all the other things that it might be about, how much it reflects this sort of the, the stark,
[00:04:32] NLW: brutal reality of competing in the frontier model space right now. And, you know, just the access to compute.
[00:04:38] Alessio: There are a lot of things to say. So first of all, there's always somebody who's more GPU rich than you. So Inflection is GPU rich by startup standard. I think about 22,000 H100s, but obviously that pales compared to the, to Microsoft.
[00:04:55] Alessio: The other thing is that this is probably good news, maybe for the startups. It's like being GPU rich, it's not enough. You know, like I think they were building something pretty interesting in, in Pi of their own model of their own kind of experience.
But at the end of the day, you're the interface that people consume as end users.
[00:05:13] Alessio: It's really similar to a lot of the others. So and we'll tell, talk about GPT-4 and Claude 3 and all this stuff. GPU poor, doing something that the GPU rich are not interested in. You know, we just had our AI center of excellence at Decibel and one of the AI leads at one of the big companies was like, Oh, we just saved 10 million and we use these models to do a translation, you know, and that's it.
[00:05:39] Alessio: It's not, it's not AGI, it's just translation. So I think like the Inflection part is maybe a calling and a waking to a lot of startups then say, Hey, you know, trying to get as much capital as possible, try and get as many GPUs as possible. Good. But at the end of the day, it doesn't build a business, you know, and maybe what Inflection, I don't, I don't, again, I don't know the reasons behind the Inflection choice, but if you say, I don't want to build my own company that has 1.3 billion
[00:06:05] Alessio: and I want to go do it at Microsoft, it's probably not a resources problem. It's more of strategic decisions that you're making as a company. So yeah, that was kind of my take on it.
[00:06:15] swyx: Yeah, and I guess on my end, two things actually happened yesterday. It was a little bit quieter news, but Stability AI had some pretty major departures as well.
[00:06:25] swyx: And you may not be considering it, but Stability is actually also a GPU rich company in the sense that they were the first new startup in this AI wave to brag about how many GPUs that they have. And you should join them. And you know, Emad is definitely a GPU trader in some sense from his hedge fund days.
[00:06:43] swyx: So Robin Rombach and like the most of the Stable Diffusion 3 people left Stability yesterday as well. So yesterday was kind of like a big news day for the GPU rich companies, both Inflection and Stability having sort of wind taken out of their sails.
I think, yes, it's a data point in favor of, like, just because you have the GPUs doesn't mean you can, you automatically win.
[00:07:03] swyx: And I think, you know, kind of I'll echo what Alessio says there. But in general also, like, I wonder if this is like the start of a major consolidation wave, just in terms of, you know, I think that there was a lot of funding last year and, you know, the business models have not been, you know, all of these things worked out very well.
[00:07:19] swyx: Even Inflection couldn't do it. And so I think maybe that's the start of a small consolidation wave. I don't think that's like a sign of AI winter. I keep looking for AI winter coming. I think this is kind of like a brief cold front. Yeah,
[00:07:34] NLW: it's super interesting. So I think a bunch of
[00:07:38] NLW: a bunch of stuff here. One is, I think, to both of your points, there, in some ways, there, there had already been this very clear demarcation between these two sides where, like, the GPU poors, to use the terminology, like, just weren't trying to compete on the same level, right? You know, the vast majority of people who have started something over the last year, year and a half, call it, were racing in a different direction.
[00:07:59] NLW: They're trying to find some edge somewhere else. They're trying to build something different. If they're, if they're really trying to innovate, it's in different areas. And so it's really just this very small handful of companies that are in this like very, you know, it's like the Coheres and Jaspers of the world that like this sort of, you know, that are that are just sort of a little bit less resourced than, you know, than the other set that I think that this potentially even applies to, you know, everyone else that could clearly demarcate it into these two, two sides.
[00:08:26] NLW: And there's only a small handful kind of sitting uncomfortably in the middle, perhaps.
Let's, let's come back to the idea of, of the sort of AI winter or, you know, a cold front or anything like that. So this is something that I, I spent a lot of time kind of thinking about and noticing. And my perception is that the vast majority of the folks who are trying to call for sort of, you know, a trough of disillusionment or, you know, a shifting of the phase to that are people who either, A, just don't like AI for some other reason, there's plenty of that, you know, people who are saying, look, they're doing way worse than they ever thought.
[00:09:03] NLW: You know, there's a lot of sort of confirmation bias kind of thing going on. Or two, media that just needs a different narrative, right? Because they're sort of sick of, you know, telling the same story. Same thing happened last summer, when every outlet jumped on the ChatGPT had its first down month story to try to really like kind of hammer this idea that that the hype was too much.
[00:09:24] NLW: Meanwhile, you have, you know, just ridiculous levels of investment from enterprises, you know, coming in. You have, you know, huge, huge volumes of, you know, individual behavior change happening. But I do think that there's nothing incoherent sort of to your point, Swyx, about that and the consolidation period.
[00:09:42] NLW: Like, you know, if you look right now, for example, there are, I don't know, probably 25 or 30 credible, like, build your own chatbot platforms that, you know, a lot of which have, you know, raised funding. There's no universe in which all of those are successful across, you know, even with a, even, even with a total addressable market of every enterprise in the world, you know, you're just inevitably going to see some amount of consolidation.
[00:10:08] NLW: Same with, you know, image generators. There are, if you look at A16Z's top 50 consumer AI apps, just based on, you know, web traffic or whatever, they're still like I don't know, a half dozen or 10 or something, like, some ridiculous number of like, basically things like Midjourney or DALL-E 3. And it just seems impossible that we're gonna have that many, you know, ultimately as, as, as sort of, you know, going, going concerns.
[00:10:33] NLW: So, I don't know. I, I, I think that the, there will be inevitable consolidation 'cause you know, it's, it's also what kind of like venture rounds are supposed to do. You're not, not everyone who gets a seed round is supposed to get to series A and not everyone who gets a series A is supposed to get to series B.
[00:10:46] NLW: That's sort of the natural process. I think it will be tempting for a lot of people to try to infer from that something about AI not being as sort of big or as as sort of relevant as, as it was hyped up to be. But I, I kind of think that's the wrong conclusion to come to.
[00:11:02] Alessio: I would say the experimentation
[00:11:04] Alessio: surface is a little smaller for image generation. So if you go back maybe six, nine months, most people will tell you, why would you build a coding assistant when like Copilot and GitHub are just going to win everything because they have the data and they have all the stuff. If you fast forward today, a lot of people use Cursor, everybody was excited about the Devin release on Twitter.
[00:11:26] Alessio: There are a lot of different ways of attacking the market that are not completion of code in the IDE. And even Cursor, like they evolved beyond single line to like chat, to do multi line edits and, and all that stuff. Image generation, I would say, yeah, as a, just as from what I've seen, like maybe the product innovation has slowed down at the UX level and people are improving the models.
[00:11:50] Alessio: So the race is like, how do I make better images? It's not like, how do I make the user interact with the generation process better? And that gets tough, you know? It's hard to like really differentiate yourselves.
So yeah, that's kind of how I look at it. And when we think about multimodality, maybe the reason why people got so excited about Sora is like, oh, this is like a completely... it's not a better image model.
[00:12:13] Alessio: This is like a completely different thing, you know? And I think the creative mind, it's always looking for something that impacts the viewer in a different way, you know, like they really want something different versus the developer mind. It's like, Oh, I, I just, I have this like very annoying thing I want better.
[00:12:32] Alessio: I have this like very specific use case that I want to go after. So it's just different. And that's why you see a lot more companies in image generation. But I agree with you that if you fast forward there, there's not going to be 10 of them, you know, it's probably going to be one or
[00:12:46] swyx: two. Yeah, I mean, to me, that's why I call it a war.
[00:12:49] swyx: Like, individually, all these companies can make a story that kind of makes sense, but collectively, they cannot all be true. Therefore, they all, there is some kind of fight over limited resources here. Yeah, so
[00:12:59] NLW: it's interesting.
We wandered very naturally into sort of another one of these wars, which is the multimodality kind of idea, which is, you know, basically a question of whether it's going to be these sort of big everything models that end up winning or whether, you know, you're going to have really specific things, you know, like something, you know, DALL-E 3 inside of sort of OpenAI's larger models versus, you know, a Midjourney or something like that.
[00:13:24] NLW: And at first, you know, I was kind of thinking like, for most of the last, call it six months or whatever, it feels pretty definitively both and in some ways, you know, and that you're, you're seeing just like great innovation on sort of the everything models, but you're also seeing lots and lots happen at sort of the level of kind of individual use cases.
[00:13:45] Sora
[00:13:45] NLW: But then Sora comes along and just like obliterates what I think anyone thought you know, where we were when it comes to video generation. So how are you guys thinking about this particular battle or war at the moment?
[00:13:59] swyx: Yeah, this was definitely a both and story, and Sora tipped things one way for me, in terms of scale being all you need.
[00:14:08] swyx: And the benefit, I think, of having multiple models being developed under one roof. I think a lot of people aren't aware that Sora was developed in a similar fashion to DALL-E 3. And DALL-E 3 had a very interesting paper out where they talked about how they sort of bootstrapped their synthetic data based on GPT-4 Vision and GPT-4.
[00:14:31] swyx: And, and it was just all, like, really interesting, like, if you work on one modality, it enables you to work on other modalities, and all that is more, is, is more interesting.
I think it's beneficial if it's all in the same house, whereas the individual startups who don't, who sort of carve out a single modality and work on that, definitely won't have the state of the art stuff on helping them out on synthetic data.
[00:14:52] swyx: So I do think like, the balance is tilted a little bit towards the God model companies, which is challenging for the, for the, for the sort of dedicated modality companies. But everyone's carving out different niches. You know, like we just interviewed Suno AI, the sort of music model company, and, you know, I don't see OpenAI pursuing music anytime soon.
[00:15:12] Suno
[00:15:12] swyx: Yeah,
[00:15:13] NLW: Suno's been phenomenal to play with. Suno has done that rare thing where, which I think a number of different AI product categories have done, where people who don't consider themselves particularly interested in doing the thing that the AI enables find themselves doing a lot more of that thing, right?
[00:15:29] NLW: Like, it'd be one thing if just musicians were excited about Suno and using it but what you're seeing is tons of people who just like music all of a sudden like playing around with it and finding themselves kind of down that rabbit hole, which I think is kind of like the highest compliment that you can give one of these startups at the
[00:15:45] swyx: early days of it.
[00:15:46] swyx: Yeah, I, you know, I, I asked them directly, you know, in the interview about whether they consider themselves Midjourney for music. And he had a more sort of nuanced response there, but I think that probably the business model is going to be very similar because he's focused on the B2C element of that.
So yeah, I mean, you know, just to tie back to the question about, you know, large multimodality companies versus small dedicated modality companies.[00:16:10] swyx: Yeah, highly recommend people to read the Sora blog posts and then read through to the DALL-E blog posts because they strongly correlated themselves with the same synthetic data bootstrapping methods as DALL-E. And I think once you make those connections, you're like, oh, like, it is beneficial to have multiple state of the art models in house that all help each other.[00:16:28] swyx: And that's the one thing that a dedicated modality company cannot do.[00:16:34] The GPT-4 Class Landscape[00:16:34] NLW: So I wanna kind of build off that and move into the sort of like updated GPT-4 class landscape, 'cause that's obviously been another big change over the last couple months. But for the sake of completeness, is there anything that's worth touching on with sort of the quality[00:16:46] NLW: data or sort of the RAG/Ops wars, just in terms of, you know, anything that's changed, I guess, for you fundamentally in the last couple of months about where those things stand.[00:16:55] swyx: So I think we're going to talk about RAG for the Gemini and Claude discussion later. And so maybe briefly discuss the data piece.[00:17:03] Data War: Reddit x Google[00:17:03] swyx: I think maybe the only new thing was this Reddit deal with Google for like a 60 million dollar deal just ahead of their IPO, very conveniently turning Reddit into an AI data company. Also, very, very interestingly, a non-exclusive deal, meaning that Reddit can resell that data to someone else. And it probably does become table stakes.[00:17:23] swyx: A lot of people don't know, but a lot of the WebText dataset that originally started for GPT 1, 2, and 3 was actually scraped from Reddit, at least the sort of vote scores. 
And I think that's a very valuable piece of information. So like, yeah, I think people are figuring out how to pay for data.[00:17:40] swyx: People are suing each other over data. This war is, you know, definitely very, very much heating up. And I don't see it getting any less intense. I, you know, next to GPUs, data is going to be the most expensive thing in a model stack company. And, you know, a lot of people are resorting to synthetic versions of it, which may or may not be kosher based on how far along or how commercially blessed the forms of creating that synthetic data are.[00:18:11] swyx: I don't know if Alessio, you have any other interactions with like data source companies, but that's my two cents.[00:18:17] Alessio: Yeah, yeah, I actually saw Quentin Anthony from EleutherAI at GTC this week. He's also been working on this. I saw Teknium. He's also been working on the data side. I think especially in open source, people are like, okay, if everybody is putting the gates up, so to speak, to the data, we need to make it easier for people that don't have 50 million a year to get access to good data sets.[00:18:38] Alessio: And Jensen, at his keynote, he did talk about synthetic data a little bit. So I think that's something that we'll definitely hear more and more of in the enterprise, which never bodes well, because then all the people with the data are like, Oh, the enterprises want to pay now? Let me put a pay-here Stripe link so that they can give me 50 million.[00:18:57] Alessio: But it worked for Reddit. I think the stock is up 40 percent today after opening. So yeah, I don't know if it's all about the Google deal, but obviously Reddit has been one of those companies where, hey, you got all this like great community, but like, how are you going to make money? And like, they tried to sell the avatars.[00:19:15] Alessio: I don't know if that's a great business for them. 
The data part sounds, as an investor, you know, the data part sounds a lot more interesting than consumer[00:19:25] swyx: cosmetics. Yeah, so I think, you know, there's more questions around data. You know, I think a lot of people are talking about the interview that Mira Murati did with the Wall Street Journal, where she, like, just basically had no good answer for where they got the data for Sora.[00:19:39] swyx: I think this is where, you know, it's in nobody's interest to be transparent about data, and it's kind of sad for the state of ML and the state of AI research, but it is what it is. We have to figure this out as a society, just like we did for music and music sharing, you know, in sort of the Napster to Spotify transition, and that might take us a decade.[00:19:59] swyx: Yeah, I[00:20:00] NLW: do. I agree. I think that you're right to identify it, not just as that sort of technical problem, but as one where society has to have a debate with itself. Because I think that, if you think rationally about it, there's great kind of points on all sides, not to be the sort of, you know, person who sits in the middle constantly, but it's why I think a lot of these legal decisions are going to be really important because, you know, the job of judges is to listen to all this stuff and try to come to things and then have other judges disagree.[00:20:24] NLW: And, you know, and have the rest of us all debate at the same time. By the way, as a total aside, I feel like the synthetic data right now is like eggs in the 80s and 90s. Like, whether they're good for you or bad for you, like, you know, we get one study that's like synthetic data, you know, there's model collapse.[00:20:42] NLW: And then we have like a hint that Llama 2, the most high performance version of it, which was one they didn't release, was trained on synthetic data. So maybe it's good. 
It's like, I just feel like every other week I'm seeing something sort of different about whether it's good or bad for these models.[00:20:56] swyx: Yeah. The branding of this is pretty poor. I would kind of tell people to think about it like cholesterol. There's good cholesterol, bad cholesterol. And you can have, you know, good amounts of both. But at this point, it is absolutely without a doubt that most large models from here on out will all be trained on some kind of synthetic data and that is not a bad thing.[00:21:16] swyx: There are ways in which you can do it poorly, whether it's commercial, you know, in terms of commercial sourcing or in terms of the model performance. But it's without a doubt that good synthetic data is going to help your model. And this is just a question of like where to obtain it and what kinds of synthetic data are valuable.[00:21:36] swyx: You know, even like AlphaGeometry, you know, was a really good example from like earlier this year.[00:21:42] NLW: If you're using the cholesterol analogy, then my egg thing can't be that far off. Let's talk about the sort of the state of the art and the GPT-4 class landscape and how that's changed.[00:21:53] Gemini 1.5 vs Claude 3[00:21:53] NLW: Cause obviously, you know, sort of the two big things or a couple of the big things that have happened since we last talked were, one, you know, Gemini first announcing that a model was coming and then finally it arriving, and then very soon after a sort of a different model arriving from Gemini, and Claude 3.[00:22:11] NLW: So I guess, you know, I'm not sure exactly where the right place to start with this conversation is, but, you know, maybe very broadly speaking, which of these do you think have made a bigger impact? Thank you.[00:22:20] Alessio: Probably the one you can use, right? So, Claude. 
Well, I'm sure Gemini is going to be great once they let me in, but so far I haven't been able to.[00:22:29] Alessio: I use, so I have this small podcaster thing that I built for our podcast, which does chapters creation, like named entity recognition, summarization, and all of that. Claude 3 is better than GPT-4. Claude 2 was unusable. So I used GPT-4 for everything. And then when Opus came out, I tried them again side by side and I posted it on Twitter as well.[00:22:53] Alessio: Claude is better. It's very good, you know, it's much better, it seems to me, it's much better than GPT-4 at doing writing that is more, you know, I don't know, it just got good vibes, you know, like the GPT-4 text, you can tell it's like GPT-4, you know, it's like, it always uses certain types of words and phrases and, you know, maybe it's just me because I've now done it for, you know, so, I've read like 75, 80 generations of these things next to each other.[00:23:21] Alessio: Claude is really good. I know everybody is freaking out on Twitter about it; my only experience of it being much better has been on the podcast use case. But I know that, you know, Karan from Nous Research is a very big pro-Opus person. So I think it's also great to have people that actually care about other models.[00:23:40] Alessio: You know, I think so far to a lot of people, maybe Anthropic has been the sibling in the corner, you know, it's like Claude releases a new model and then OpenAI releases Sora and like, you know, there are like all these different things, but yeah, the new models are good. It's interesting.[00:23:55] NLW: My perception is definitely that, just observationally, Claude 3 is certainly the first thing that I've seen where lots of people,[00:24:06] NLW: they're, no one's debating evals or anything like that. 
They're talking about the specific use cases that they have, that they used to use ChatGPT for every day, you know, day in, day out, that they've now just switched over. And that has, I think, shifted a lot of the sort of like vibe and sentiment in the space too.[00:24:26] NLW: And I don't necessarily think that it's sort of a, like, full, you know, sort of full knock. Let's put it this way. I think it's less bad for OpenAI than it is good for Anthropic. I think that because GPT-5 isn't there, people are not quite willing to sort of like, you know, get overly critical of OpenAI, except in so far as they're wondering where GPT-5 is.[00:24:46] NLW: But I do think that it makes Anthropic look way more credible as a player, as a, you know, as a credible sort of player, you know, as opposed to where they were.[00:24:57] Alessio: Yeah. And I would say the benchmarks veil is probably getting lifted this year. I think last year people were like, okay, this is better than this on this benchmark, blah, blah, blah, because maybe they did not have a lot of use cases that they did frequently.[00:25:11] Alessio: So it's hard to like compare yourself. So you defer to the benchmarks. I think now as we go into 2024, a lot of people have started to use these models from, you know, from very sophisticated things that they run in production to some utility that they have on their own. Now they can just run them side by side.[00:25:29] Alessio: And it's like, Hey, I don't care that, like, the MMLU score of Opus is like slightly lower than GPT-4. It just works for me, you know, and I think that's the same way that traditional software has been used by people, right? Like you just try it for yourself and like, which one works best for you?[00:25:48] Alessio: Like nobody looks at benchmarks outside of like sales white papers, you know? And I think it's great that we're going more in that direction. 
We have an episode with Adept coming out this weekend. And in some of their model releases, they specifically say, We do not care about benchmarks, so we didn't put them in, you know, because we don't want to look good on them.[00:26:06] Alessio: We just want the product to work. And I think more and more people will[00:26:09] swyx: go that way. Yeah. I would say like, it does take the wind out of the sails for GPT-5, which I know we're, you know, curious about later on. I think anytime you put out a new state of the art model, you have to break through in some way.[00:26:21] swyx: And what Claude and Gemini have done is effectively take away any advantage to saying that you have a million token context window. Now everyone's just going to be like, Oh, okay. Now you just match the other two guys. And so that puts an insane amount of pressure on what GPT-5 is going to be, because it's just going to have, like, the only option it has now, because all the other models are multimodal, all the other models are long context, all the other models have perfect recall, GPT-5 has to match everything and do more to not be a flop.[00:26:58] AI Breakdown Part 2[00:26:58] NLW: Hello friends, back again with part two. If you haven't heard part one of this conversation, I suggest you go check it out, but to be honest they are kind of actually separable. In this conversation, we get into a topic that I think Alessio and Swyx are very well positioned to discuss, which is what developers care about right now, what people are trying to build around.[00:27:16] NLW: I honestly think that one of the best ways to see the future in an industry like AI is to try to dig deep on what developers and entrepreneurs are attracted to build, even if it hasn't made it to the news pages yet. So consider this your preview of six months from now, and let's dive in. 
Let's bring it to the GPT-5 conversation.[00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4[00:27:33] NLW: I mean, so I think that that's a great sort of assessment of just how the stakes have been raised, you know. I mean, so I guess maybe I'll frame this less as a question, just sort of something that I've been watching right now: the only thing that makes sense to me with how[00:27:50] NLW: fundamentally unbothered and unstressed OpenAI seems about everything is that they're sitting on something that does meet all that criteria, right? Because, I mean, even in the Lex Fridman interview that Altman recently did, you know, he's talking about other things coming out first. He's talking about, he's just like, listen, he's good and he could play nonchalant, you know, if he wanted to.[00:28:13] NLW: So I don't want to read too much into it, but, you know, they've had so long to work on this, like unless we are like really meaningfully running up against some constraint, it just feels like, you know, there's going to be some massive increase, but I don't know. What do you guys think?[00:28:28] swyx: Hard to speculate.[00:28:29] swyx: You know, at this point, they're pretty good at PR and they're not going to tell you anything that they don't want to. And he can tell you one thing and change their minds the next day. So it's really, you know, I've always said that model version numbers are just marketing exercises, like they have something and it's always improving and at some point you just cut it and decide to call it GPT-5.[00:28:50] swyx: And it's more just about defining an arbitrary level at which they're ready and it's up to them on what ready means. We definitely did see some leaks on GPT-4.5, as I think a lot of people reported and I'm not sure if you covered it. So it seems like there might be an intermediate release. 
But I did feel, coming out of the Lex Fridman interview, that GPT-5 was nowhere near.[00:29:11] swyx: And you know, it was kind of a sharp contrast to Sam talking at Davos in February, saying that, you know, it was his top priority. So I find it hard to square. And honestly, like, there's also no point reading too many tea leaves into what any one person says about something that hasn't happened yet or a decision that hasn't been taken yet.[00:29:31] swyx: Yeah, that's my 2 cents about it. Like, calm down, let's just build.[00:29:35] Alessio: Yeah. The February rumor was that they were gonna work on AI agents, so I don't know, maybe they're like, yeah,[00:29:41] swyx: they had two agent, I think two agent projects, right? One desktop agent and one sort of more general, yeah, sort of GPTs-like agent, and then Andrej left, so he was supposed to be the guy on that.[00:29:52] swyx: What did Andrej see? What did he see? I don't know. What did he see?[00:29:56] Alessio: I don't know. But again, it's just like the rumors are always floating around, you know, but I think like, this is, you know, we're not going to get to the end of the year without GPT-5, you know, that's definitely happening. I think the biggest question is like, are Anthropic and Google[00:30:13] Alessio: increasing the pace, you know, like is Claude 4 coming out in like 12 months, like nine months? What's the deal? Same with Gemini. They went from like 1 to 1.5 in like five days or something. So when's Gemini 2 coming out, you know, is that going to be soon? I don't know.[00:30:31] Alessio: There are a lot of speculations, but the good thing is that now you can see a world in which OpenAI doesn't rule everything. You know, so that's the best news that everybody got, I would say.[00:30:43] swyx: Yeah, and Mistral Large also dropped in the last month. 
And, you know, not quite GPT-4 class, but very good from a new startup.[00:30:52] swyx: So yeah, we have now slowly changed the landscape, you know. In my January recap, I was complaining that nothing's changed in the landscape for a long time. But now we do exist in a world, sort of a multipolar world, where Claude and Gemini are legitimate challengers to GPT-4 and hopefully more will emerge as well, hopefully from Meta.[00:31:11] Open Source Models - Mistral, Grok[00:31:11] NLW: So, let's actually talk about sort of the open source side of this for a minute. So Mistral Large, notable because it's not available open source in the same way that other things are, although I think my perception is that the community has largely given them, like, the community largely recognizes that they want them to keep building open source stuff and they have to find some way to fund themselves so that they're going to do that.[00:31:27] NLW: And so they kind of understand that, like, they got to figure out how to eat. But we've got, so, you know, there's Mistral, there's, I guess, Grok now, which is, you know, Grok-1 from October is open[00:31:38] swyx: sourced at, yeah. Yeah, sorry, I thought you meant Groq the chip company.[00:31:41] swyx: No, no, no, yeah, you mean Twitter Grok.[00:31:43] NLW: Although Groq the chip company, I think, is even more interesting in some ways, but and then there's the, you know, obviously Llama 3 is the one that sort of everyone's wondering about too. 
And, you know, my sense of that, the little bit that, you know, Zuckerberg was talking about Llama 3 earlier this year, suggested that, at least from an ambition standpoint, he was not thinking about how do I make sure that, you know, Meta, you know, keeps the open source throne, you know, vis a vis Mistral.[00:32:09] NLW: He was thinking about how you go after, you know, how he, you know, releases a thing that's, you know, every bit as good as whatever OpenAI is on at that point.[00:32:16] Alessio: Yeah. From what I heard in the hallways at GTC, Llama 3, the biggest model will be, you know, 260 to 300 billion parameters, so that's quite large.[00:32:26] Alessio: That's not an open source model. You know, you cannot give people a 300 billion parameters model and ask them to run it. You know, it's very compute intensive. So I think it is, it[00:32:35] swyx: can be open source. It's just, it's going to be difficult to run, but that's a separate question.[00:32:39] Alessio: It's more like, as you think about what they're doing it for, you know, it's not like empowering the person running[00:32:45] Alessio: Llama on their laptop, it's like, oh, you can actually now use this to go after OpenAI, to go after Anthropic, to go after some of these companies at like the middle complexity level, so to speak. Yeah. So obviously, you know, we had Soumith Chintala on the podcast, they're doing a lot here, they're making PyTorch better.[00:33:03] Alessio: You know, they want to, that's kind of like maybe a little bit of a shot at NVIDIA, in a way, trying to get some of the CUDA dominance out of it. Yeah, no, it's great. I love the Zuck destroying a lot of monopolies arc. You know, it's been very entertaining. 
Let's bridge[00:33:18] NLW: into the sort of big tech side of this, because this is obviously like, so I think actually when I did my episode, I added this as an additional war that's something that I'm paying attention to.[00:33:29] NLW: So we've got Microsoft's moves with Inflection, which I think potentially are being read as a shift vis a vis the relationship with OpenAI, which also the sort of Mistral Large relationship seems to reinforce as well. We have Apple potentially entering the race, finally, you know, giving up Project Titan and kind of trying to spend more effort on this.[00:33:50] NLW: Although, counterpoint, we also have them talking about it, or there being reports of a deal with Google, which, you know, is interesting to sort of see what their strategy there is. And then, you know, Meta's been largely quiet. We kind of just talked about the main piece, but, you know, and then there's spoilers like Elon.[00:34:07] NLW: I mean, you know, which of those things has sort of been most interesting to you guys as you think about what's going to shake out for the rest of this[00:34:13] Apple MM1[00:34:13] swyx: year? I'll take a crack. So the reason we don't have a fifth war for the Big Tech Wars is that's one of those things where I just feel like we don't cover differently from other media channels, I guess.[00:34:26] swyx: Sure, yeah. In our anti-interests, we actually say, like, we try not to cover the Big Tech Game of Thrones, or it's proxied through Twitter. You know, all the other four wars anyway, so there's just a lot of overlap. Yeah, I think absolutely, personally, the most interesting one is Apple entering the race.[00:34:41] swyx: They actually released, they announced their first large language model that they trained themselves. It's like a 30 billion multimodal model. 
People weren't that impressed, but it was like the first time that Apple has kind of showcased that, yeah, we're training large models in house as well. Of course, like, they might be doing this deal with Google.[00:34:57] swyx: I don't know. It sounds very sort of rumor-y to me. And it's probably, if it's on device, it's going to be a smaller model. So something like a Gemma. It's going to be smarter autocomplete. I don't know what to say. I'm still here dealing with, like, Siri, which probably hasn't been updated since God knows when it was introduced.[00:35:16] swyx: It's horrible. I, you know, it makes me so angry. So, one, as an Apple customer and user, I'm just hoping for better AI on Apple itself. But two, they are the gold standard when it comes to local devices, personal compute and trust, like, you trust them with your data. And I think that's what a lot of people are looking for in AI, that they love the benefits of AI, they don't love the downsides, which is that you have to send all your data to some cloud somewhere.[00:35:45] swyx: And some of this data that we're going to feed AI is just the most personal data there is. So Apple being like one of the most trusted personal data companies, I think it's very important that they enter the AI race, and I hope to see more out of them.[00:35:58] Alessio: To me, the biggest question with the Google deal is like, who's paying who?[00:36:03] Alessio: Because for the browsers, Google pays Apple like 18, 20 billion every year to be the default browser. Is Google going to pay you to have Gemini or is Apple paying Google to have Gemini? I think that's like what I'm most interested to figure out because with the browsers, it's like, it's the entry point to the thing.[00:36:21] Alessio: So it's really valuable to be the default. That's why Google pays. But I wonder if like the perception in AI is going to be like, Hey, 
you just have to have a good local model on my phone to be worth me purchasing your device. And that would kind of drive Apple to be the one buying the model. But then, like Shawn said, they're doing the MM1 themselves.[00:36:40] Alessio: So are they saying we do models, but they're not as good as the Google ones? I don't know. The whole thing is, it's really confusing, but it makes for great meme material on Twitter.[00:36:51] swyx: Yeah, I mean, I think, like, they are, possibly more than OpenAI and Microsoft and Amazon, the most full stack company there is in computing, and so, like, they own the chips, man.[00:37:05] swyx: Like, they manufacture everything, so if there was a company that could, you know, seriously challenge the other AI players, it would be Apple. And I don't think it's as hard as self driving. So like maybe they've just been investing in the wrong thing this whole time. We'll see.[00:37:21] swyx: Wall Street certainly thinks[00:37:22] NLW: so. Wall Street loved that move, man. There's a big sigh of relief. Well, let's move away from sort of the big stuff. I mean, I think to both of your points, it's going to.[00:37:33] Meta's $800b AI rebrand[00:37:33] NLW: Can I, can[00:37:34] swyx: I, can I jump on a factoid about this Wall Street thing? I went and looked at when Meta went from being a VR company to an AI company.[00:37:44] swyx: And I think the stock, I'm trying to look up the details now. The stock has gone up 187% since LLaMA 1. Yeah. Which is $830 billion in market value created in the past year. Yeah. Yeah.[00:37:57] NLW: It's like, remember, if you guys haven't, yeah. 
If you haven't seen the chart, it's actually like remarkable.[00:38:02] NLW: If you draw a little[00:38:03] swyx: arrow on it, it's like, no, we're an AI company now and forget the VR thing.[00:38:10] NLW: It is an interesting, no, I think, Alessio, you called it sort of like Zuck's Disruptor Arc or whatever. He really is in the midst of a total, you know, I don't know if it's a redemption arc or it's just, it's something different where, you know, he's sort of the spoiler.[00:38:25] NLW: Like people loved him just freestyle talking about why he thought they had a better headset than Apple. But even if they didn't agree, they just loved it. He was going direct to camera and talking about it for, you know, five minutes or whatever. So that's a fascinating shift that I don't think anyone had on their bingo card, you know, whatever, two years ago.[00:38:41] NLW: Yeah. Yeah,[00:38:42] swyx: we still[00:38:43] Alessio: didn't see him fight Elon though, so[00:38:45] swyx: that's what I'm really looking forward to. I mean, hey, don't write it off, you know, maybe just these things take a while to happen. But we need to see him fight in the Colosseum. No, I think, you know, in terms of like self management, life leadership, I think there's a lot of lessons to learn from him.[00:38:59] swyx: You know, you might kind of quibble with, like, the social impact of Facebook, but just himself, in terms of personal growth and, you know, perseverance through like a lot of change and, you know, everyone throwing stuff his way, I think there's a lot to learn from Zuck, which is crazy 'cause he's my age.[00:39:18] swyx: Yeah. Right.[00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents[00:39:20] NLW: Awesome. 
Well, so one of the big things that I think you guys have, you know, distinct and unique insight into, being where you are and what you work on, is, you know, what developers are getting really excited about right now. And by that, I mean, on the one hand, certainly, you know, like startups who are actually kind of formalized and formed to startups, but also, you know, just in terms of like what people are spending their nights and weekends on, what they're, you know, coming to hackathons to do.[00:39:45] NLW: And, you know, I think it's such a fascinating indicator for where things are headed. Like if you zoom back a year, right now was right when everyone was getting so, so excited about AI agent stuff, right? AutoGPT and BabyAGI. And these things were like, if you dropped anything on YouTube about those, like instantly tens of thousands of views.[00:40:07] NLW: I know because I had like a 50,000 view video, like the second day that I was doing the show on YouTube, you know, because I was talking about AutoGPT. And so anyways, you know, obviously that's sort of not totally come to fruition yet, but what are some of the trends in what you guys are seeing in terms of people's interest and what people are building?[00:40:24] Alessio: I can start maybe with the agents part and then I know Shawn is doing a diffusion meetup tonight. There's a lot of different things. The agent wave has been the most interesting kind of like dream to reality arc. So AutoGPT, I think they went from zero to like 125,000 GitHub stars in six weeks, and then one year later, they have 150,000 stars.[00:40:49] Alessio: So there's kind of been a big plateau. I mean, you might say there are just not that many people that can star it. You know, everybody already starred it. But the promise of, hey, I'll just give you a goal, and you do it, I think it's like, amazing to get people's imagination going. 
You know, they're like, oh, wow, this is awesome.[00:41:08] Alessio: Everybody can try this to do anything. But then as technologists, you're like, well, that's just like not possible, you know, we would have like solved everything. And I think it takes a little bit to go from the promise and the hope that people show you to then trying it yourself and going back to say, okay, this is not really working for me.[00:41:28] Alessio: And David Luan from Adept, you know, in our episode, he specifically said, We don't want to do a bottom-up product. You know, we don't want something that everybody can just use and try because it's really hard to get it to be reliable. So we're seeing a lot of companies doing vertical agents that are narrow for a specific domain, and they're very good at something.[00:41:49] Alessio: Mike Conover, who was at Databricks before, is also a friend of Latent Space. He's doing this new company called Brightwave doing AI agents for financial research, and that's it, you know, and they're doing very well. There are other companies doing it in security, doing it in compliance, doing it in legal.[00:42:08] Alessio: All of these things that, like, nobody just wakes up and says, Oh, I cannot wait to go on AutoGPT and ask it to do a compliance review of my thing. You know, just not what inspires people. So I think the gap on the developer side has been, the more bottoms-up hacker mentality is trying to build these like very generic agents that can do a lot of open ended tasks.[00:42:30] Alessio: And then the more business side of things is like, Hey, if I want to raise my next round, I cannot just like sit around and mess around with like super generic stuff. I need to find a use case that really works. 
And I think that has worked for a lot of folks. In parallel, you have a lot of companies doing evals.[00:42:47] Alessio: There are dozens of them that just want to help you measure how good your models are doing. Again, if you build evals, you need to also have a restrained surface area to actually figure out whether or not it's good, right? Because you cannot eval anything on everything under the sun. So that's another category where, from the startup pitches that I've seen, there's a lot of interest in the enterprise.[00:43:11] Alessio: It's just like really fragmented because the production use cases are just coming like now, you know, there are not a lot of long established ones to test against. And so that's kind of on the virtual agents, and then the robotic side, it's probably been the thing that surprised me the most at NVIDIA GTC, the amount of robots that were there, there were just like robots everywhere.[00:43:33] Alessio: Like, both in the keynote and then on the show floor, you would have Boston Dynamics dogs running around. There was, like, this, like, fox robot that had, like, a virtual face that, like, talked to you and, like, moved in real time. There were industrial robots. NVIDIA did a big push on their own Omniverse thing, which is, like, this digital twin of whatever environments you're in that you can use to train the robot agents.[00:43:57] Alessio: So that kind of takes people back to the reinforcement learning days, but yeah, agents, people want them, you know, people want them. I gave a talk about the rise of the full stack employees and kind of this future, the same way full stack engineers kind of work across the stack. In the future, every employee is going to interact with every part of the organization through agents and AI enabled tooling.[00:44:17] Alessio: This is happening. 
It just needs to be a lot more narrow than maybe the first approach that we took, which is just put a string in AutoGPT and pray. But yeah, there's a lot of super interesting stuff going on.
[00:44:27] swyx: Yeah. Well, you covered a lot of stuff there. I'll separate the robotics piece because I feel like that's so different from the software world.
[00:44:34] swyx: But yeah, we do talk to a lot of engineers and, you know, this is our sort of bread and butter. And I do agree that vertical agents have worked out a lot better than the horizontal ones. I think, you know, the point I'll make here is just the reason AutoGPT and BabyAGI, you know, it's in the name, like they were promising AGI.
[00:44:53] swyx: But I think people are discovering that you cannot engineer your way to AGI. It has to be done at the model level, and all these engineering, prompt engineering hacks on top of it weren't really going to get us there in a meaningful way without much further, you know, improvements in the models. I'll go so far as to say, even Devin, which is, I think, the most advanced agent that we've ever seen, still requires a lot of engineering and still probably falls apart a lot in terms of, like, practical usage.
[00:45:22] swyx: Or it's just way too slow and expensive for, you know, what it's promised compared to the video. So yeah, that's what happened with agents from last year. But I do see, like, vertical agents being very popular, and I think the word agent might even be overused sometimes.
[00:45:38] swyx: Like, people don't really care whether or not you call it an AI agent, right? Like, does it replace boring menial tasks that I do, that I might hire a human to do, or that the human who is hired to do it, like, actually doesn't really want to do?
And I think there's absolutely ways in sort of a vertical context that you can actually go after very routine tasks that can be scaled out to a lot of, you know, AI assistants.
[00:46:01] swyx: So yeah, I mean, I would sort of basically plus one what Alessio just said there. I think it's very, very promising and I think more people should work on it, not less. Like, there's not enough people. Like, this should be the main thrust of the AI engineer: to look for use cases and go to production with them instead of just always working on some AGI promising thing that never arrives.
[00:46:21] swyx: I,
[00:46:22] NLW: I can only add that, so I've been fiercely making tutorials behind the scenes around basically everything you can imagine with AI. We've done about 300 tutorials over the last couple of months. And the verticalized anything, right, like this is a solution for your particular job or role, even if it's way less interesting or kind of sexy, it's like so radically more useful to people in terms of intersecting with how, like those are the ways that people are actually
[00:46:50] NLW: adopting AI. In a lot of cases, it's just a thing that I do over and over again. By the way, I think that's the same way that even the generalized models are getting adopted. You know, it's like, I use Midjourney for lots of stuff, but the main thing I use it for is YouTube thumbnails every day. Like day in, day out, I will always do a YouTube thumbnail, you know, or two with Midjourney, right?
[00:47:09] NLW: And it's like you can start to extrapolate that across a lot of things, and all of a sudden, you know, AI looks revolutionary because of a million small changes rather than one sort of big dramatic change.
And I think that the verticalization of agents is sort of a great example of how that's
[00:47:26] swyx: going to play out too.
[00:47:28] Adept episode - Screen Multimodality
[00:47:28] swyx: So I'll have one caveat here, which is I think that because multimodal models are now commonplace, like Claude, Gemini, OpenAI, all very, very easily multimodal, Apple's easily multimodal, all this stuff, there is a switch for agents for sort of general desktop browsing that I think people [inaudible]
[00:48:04] swyx: version of the agent where they're not specifically taking in text or anything. They're just watching your screen just like someone else would and piloting it by vision. And, you know, in the episode with David that will have dropped by the time that this airs, I think that is the promise of Adept and that is the promise of what a lot of these sort of desktop agents are. And that is the more general purpose system that could be as big as the browser, the operating system. Like, people really want to build that foundational piece of software in AI.
[00:48:38] swyx: And I would see, like, the potential there for desktop agents being that you can have sort of self driving computers. You know, don't write the horizontal piece off. I just think we took a while to get there.
[00:48:48] NLW: What else are you guys seeing that's interesting to you? I'm looking at your notes and I see a ton of categories.
[00:48:54] Top Model Research from January Recap
[00:48:54] swyx: Yeah, so I'll take the next two as, like, one category, which is basically alternative architectures, right? The two main things that everyone following AI kind of knows now are, one, the diffusion architecture, and two, let's just say the decoder only transformer architecture that is popularized by GPT.
[00:49:12] swyx: You can read, you can look on YouTube for thousands and thousands of tutorials on each of those things.
What we are talking about here is what's next, what people are researching, and what could be on the horizon that takes the place of those other two things. So first of all, we'll talk about transformer architectures and then diffusion.
[00:49:25] swyx: So transformers: the two leading candidates are effectively RWKV and the state space models, the most recent one of which is Mamba, but there's others like StripedHyena and the S4 and H3 stuff coming out of Hazy Research at Stanford. And all of those are non-quadratic language models that promise to scale a lot better than the traditional transformer.
[00:49:47] swyx: This might be too theoretical for most people right now, but it's gonna come out in weird ways, where, imagine, like, right now the talk of the town is that Claude and Gemini have a million tokens of context and, like, whoa, you can put in, like, you know, two hours of video now. Okay, but, like, what if we could, like, throw in, you know, two hundred thousand hours of video?
[00:50:09] swyx: Like, how does that change your usage of AI? What if you could throw in the entire genetic sequence of a human and, like, synthesize new drugs? Like, how does that change things? Like, we don't know because we haven't had access to this capability being so cheap before. And that's the ultimate promise of these two models.
[00:50:28] swyx: They're not there yet, but we're seeing very, very good progress. RWKV and Mamba are probably, like, the two leading examples, both of which are open source, so you can try them today, and have a lot of progress there. And the main thing I'll highlight for RWKV is that at the 7B level, they seem to have beaten Llama 2 in all benchmarks that matter at the same size for the same amount of training as an open source model.
[00:50:51] swyx: So that's exciting. You know, they're 7B now. They're not at 70B.
We don't know if it'll. And then the other thing is diffusion. Diffusion and transformers are kind of on a collision course. The original Stable Diffusion already used transformers in parts of its architecture.
[00:51:06] swyx: It seems that transformers are eating more and more of those layers, particularly the sort of VAE layer. So the Diffusion Transformer is what Sora is built on. The guy who wrote the Diffusion Transformer paper, Bill Peebles, is the lead tech guy on Sora. So you'll just see a lot more Diffusion Transformer stuff going on.
[00:51:25] swyx: But there's more sort of experimentation with diffusion. I'm holding a meetup actually here in San Francisco that's gonna be, like, the state of diffusion, which I'm pretty excited about. Stability's doing a lot of good work. And if you look at the architecture of how they're creating Stable Diffusion 3, Hourglass Diffusion, and the consistency models, or SDXL Turbo,
[00:51:45] swyx: all of these are, like, very, very interesting innovations on, like, the original idea of what Stable Diffusion was. So if you think that it is expensive to create or slow to create Stable Diffusion or AI generated art, you are not up to date with the latest models. If you think it is hard to create text in images, you are not up to date with the latest models.
[00:52:02] swyx: And people still are kind of far behind. The last piece of which is the wildcard I always kind of hold out, which is text diffusion. So instead of using autogenerative or autoregressive transformers, can you use text to diffuse? So you can use diffusion models to diffuse and create entire chunks of text all at once instead of token by token.
[00:52:22] swyx: And that is something that Midjourney confirmed today, because it was only rumored the past few months. But they confirmed today that they were looking into.
So all those things are, like, very exciting new model architectures that are maybe something that you'll see in production two to three years from now.
[00:52:37] swyx: So the couple of the trends
[00:52:38] NLW: that I want to just get your takes on, because they sort of seem like they're coming up, are, one, sort of these wearable, you know, kind of passive AI experiences where they're absorbing a lot of what's going on around you and then kind of bringing things back.
[00:52:53] NLW: And then the other one that I wanted to see if you guys had thoughts on was sort of this next generation of chip companies. Obviously there's a huge amount of emphasis on hardware and silicon and different ways of doing things, but, y

The Daily Crunch – Spoken Edition
Profluent, spurred by Salesforce research and backed by Jeff Dean, uses AI to discover medicines

Mar 27, 2024 (5:47)


Last year, Salesforce, the company best known for its cloud sales support software (and Slack), spearheaded a project called ProGen to design proteins using generative AI.

That Was The Week
And The Oscar Goes to Sora

Feb 16, 2024 (33:40)


Hats Off To This Week's Contributors: @RyanMorrisonJer, @geneteare, @mgsiegler, @spyglass_feed, @saulausterlitz, @ClareMalone, @benedictevans, @mikeloukides, @ErikNaso, @kateclarktweets, @finkd, @mattbirchler, @imillhiser, @jaygoldberg, @ron_miller, @btaylor, @sierraplatform, @eladgil

Contents
* Editorial:
* Essays of the Week
  * AI Leads New Unicorn Creation As Ranks Of $1B Startups Swells
  * Behold: The Sports Streaming Bundle
  * 40 Years Ago, This Ad Changed the Super Bowl Forever
  * Is the Media Prepared for an Extinction-Level Event?
* Video of the Week
  * AI and Everything Else - Benedict Evans from Slush
* AI of the Week
  * The OpenAI Endgame
  * OpenAI Sora– The most realistic AI-generated video to date
  * I Was Wrong. We Haven't Reached Peak AI Frenzy.
* News Of the Week
  * I tried Vision Pro. Here's my take
  * The Quest 3 is better than you might expect
  * The Supreme Court will decide if the government can seize control of YouTube and Twitter
  * Arm Results Set The World On Fire
* Startup of the Week
  * Bret Taylor's new AI company aims to help customers get answers and complete tasks automatically
* X of the Week
  * Elad Gil on AI

Editorial: And The Oscar Goes to Sora

OpenAI teased its new video creation model - Sora - this week.

In doing so it released a technical report and several examples of prompts and outputs.

Cautious to not over-state the end game the company said:

We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.

All of the videos are incredible, albeit only a minute or less each.
My favorite is the Dogs in Snow video:

Although the ‘Closeup Man in Glasses' is also wonderful.

I mention this because the speed at which AI is addressing new fields is - in my opinion - mind-boggling. Skills that take humans decades to perfect are being learned in months and are capable of scaling to infinite outputs using words, code, images, video, and sound.

It will take the advancement of robotics to tie these capabilities to physical work, but that seems assured to happen.

When engineering, farming, transport, or production meets AI then human needs can be addressed directly.

Sora winning an Oscar for Cinematography or in producing from a script or a book seems far-fetched. But it wasn't so long ago that a tech company doing so would have been laughable, and now we have Netflix, Amazon Prime, and Apple TV Plus regularly being nominated or winning awards.

Production will increasingly be able to leverage AI.

Some will say this is undermining human skills, but I think the opposite. It will release human skills. Take the prompt that produced the Dogs in Snow video:

Prompt: A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in.

I can imagine that idea and write it down. But my skills would not allow me to produce it. Sora opens my imagination and enables me to act on it. I guess that many humans have creative ideas that they are unable to execute... up to now. Sora, DALL-E, and ChatGPT all focus on releasing human potential.

Google released its Gemini 1.5 model this week (less than a month after releasing Gemini Ultra 1.0). Tom's Guide has a summary and analysis by Ryan Morrison:

Gemini Pro 1.5 has a staggering 10 million token context length. That is the amount of content it can store in its memory for a single chat or response.
This is enough for hours of video or multiple books within a single conversation, and Google says it can find any piece of information within that window with a high level of accuracy.

Jeff Dean, Google DeepMind Chief Scientist wrote on X that the model also comes with advanced multimodal capabilities across code, text, image, audio and video.

He wrote that this means you can “interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more.”

In “needle-in-a-haystack” testing where they look for the needle in the vast amount of data stored in the context window, they were able to find specific pieces of information with 99.7% accuracy even with 10 million tokens of data.

All of this makes it easy to understand why Kate Clark at The Information penned a piece with the title: I Was Wrong. We Haven't Reached Peak AI Frenzy

I will leave this week's editorial with Ryan Morrison's observation at the end of his article:

What we are seeing with these advanced multimodal models is the interaction of the digital and the real, where AI is gaining a deeper understanding of humanity and how WE see the world.

Essays of the Week

AI Leads New Unicorn Creation As Ranks Of $1B Startups Swells

February 13, 2024

Gené Teare @geneteare

Fewer startups became unicorns in 2023, but The Crunchbase Unicorn Board also became more crowded, as exits became even scarcer.

That means that 10 years after the term “unicorn” was coined to denote those private startups valued at $1 billion or more, there are over 1,500 current unicorn companies globally, collectively valued at more than $5 trillion based on their most recent valuations from funding deals.

All told, fewer than 100 companies joined the Unicorn Board in 2023, the lowest count in more than five years, an analysis of Crunchbase data shows.

Of the 95 companies that joined the board in 2023, AI was the leading sector,
adding 20 new unicorns alone. Other leading unicorn sectors in 2023 included fintech (with 14 companies), cleantech and energy (12 each), and semiconductors (nine).

Based on an analysis of Crunchbase data, 41 companies joined the Unicorn Board from the U.S. and 24 from China in 2023. Other countries were in the single digits for new unicorns: Germany had four new companies, while India and the U.K. each had three.

New records nonetheless

Despite the slower pace of new unicorns, the Crunchbase board of current private unicorns has reached new milestones as fewer companies exited the board in 2023.

The total number of global unicorns on our board reached 1,500 at the start of 2024, which takes into account the exclusion of those that have exited via an M&A or IPO transaction. Altogether, these private unicorn companies have raised north of $900 billion from investors.

This year also marks a decade since investor Aileen Lee of Cowboy Ventures coined the term unicorn for private companies valued at a billion dollars or more.

In a new report looking at the unicorn landscape 10 years later, Lee said she believes the unicorn phenomenon is not going away, despite a sharp downturn in venture funding in recent years. She expects more than 1,000 new companies in the U.S. alone will join the ranks in the next decade.

Unicorn exits

In 2023, 10 unicorn companies exited the board via an IPO, far fewer than in recent years. That contrasts with 20 companies in 2022 and 113 in 2021.

However, M&A was more active in 2023. Sixteen unicorn companies were acquired in 2023 — up from 2022 when 11 companies were acquired and slightly down from 2021 with 21 companies exiting via an acquisition.

December numbers

Eight new companies joined The Crunchbase Unicorn Board in December 2023. The highest monthly count last year for new unicorns was 10 and the lowest was two.

Of the new unicorns, three are artificial intelligence companies.
Other sectors that minted unicorns in December include fintech, cybersecurity, food and beverage, and health care.

The new unicorn companies minted in December 2023 were:

..More

Behold: The Sports Streaming Bundle

It just makes sense. Sports was the last thing holding together the cable TV bundle. Now it will be the start of the streaming bundle.

That's my 5-minute reaction to the truly huge news that Disney, Warner, and Fox are launching a new sports streaming service, combining their various sports rights into one package. Well, presumably. The details are still quite thin at this point. Clearly, several entities were racing to this story, with both WSJ and Bloomberg claiming "scoops" by publishing paragraph-long stories with only the high level facts. I'm linking to Variety above, which at least has a few more details, including (canned) quotes from Bob Iger, Lachlan Murdoch, and David Zaslav.

Fox Corp., Warner Bros. Discovery and Disney are set to launch a new streaming joint venture that will make all of their sports programming available under a single broadband roof, a move that will put content from ESPN, TNT and Fox Sports on a new standalone app and, in the process, likely shake up the world of TV sports.

The three media giants are slated to launch the new service in the fall. Subscribers would get access to linear sports networks including ESPN, ESPN2, ESPNU, SECN, ACCN, ESPNEWS, ABC, Fox, FS1, FS2, BTN, TNT, TBS, truTV and ESPN+, as well as hundreds of hours from the NFL, NBA, MLB and NHL and many top college divisions. Pricing will be announced at a later date.

Each company would own one third of the new outlet and license their sports content to it on a non-exclusive basis. The service would have a new brand and an independent management team

Yes, this is essentially running the Hulu playbook of old, but only for sports content.
No, that ultimately didn't end well, but Hulu had a decent enough run before egos got involved. Here, the egos are once again being (at least temporarily) set aside to do something obvious: make money. Sports is the one bit of content that most people watch in one form or another, live no less (hence why it was keeping the cable bundle together). And increasingly, with the rise of streaming, it was becoming impossible to figure out what game was on, where. You could get access to most games online now, but it might require buying four or five different services. And again, then finding which one the game you wanted was actually on.

..More

40 Years Ago, This Ad Changed the Super Bowl Forever

An oral history of Apple's groundbreaking “1984” spot, which helped to establish the Super Bowl as TV's biggest commercial showcase.

By Saul Austerlitz

Published Feb. 9, 2024, Updated Feb. 10, 2024

Four decades ago, the Super Bowl became the Super Bowl.

It wasn't because of anything that happened in the game itself: On Jan. 22, 1984, the Los Angeles Raiders defeated Washington 38-9 in Super Bowl XVIII, a contest that was mostly over before halftime. But during the broadcast on CBS, a 60-second commercial loosely inspired by a famous George Orwell novel shook up the advertising and the technology sectors without ever showing the product it promoted. Conceived by the Chiat/Day ad agency and directed by Ridley Scott, then fresh off making the seminal science-fiction noir “Blade Runner,” the Apple commercial “1984,” which was intended to introduce the new Macintosh computer, would become one of the most acclaimed commercials ever made. It also helped to kick off — pun partially intended — the Super Bowl tradition of the big game serving as an annual showcase for gilt-edged ads from Fortune 500 companies.
It all began with the Apple co-founder Steve Jobs's desire to take the battle with the company's rivals to a splashy television broadcast he knew nothing about.

In recent interviews, several of the people involved in creating the “1984” spot — Scott; John Sculley, then chief executive of Apple; Steve Hayden, a writer of the ad for Chiat/Day; Fred Goldberg, the Apple account manager for Chiat/Day; and Anya Rajah, the actor who famously threw the sledgehammer — looked back on how the commercial came together, its inspiration and the internal objections that almost kept it from airing. These are edited excerpts from the conversations.

JOHN SCULLEY On Oct. 19, 1983, we're all sitting around in Steve [Jobs's] building, the Mac building, and the cover of Businessweek says, “The Winner is … IBM.” We were pretty deflated because this was the introduction of the IBM PCjr, and we hadn't even introduced the Macintosh yet.

STEVE HAYDEN Jobs said, “I want something that will stop the world in its tracks.” Our media director, Hank Antosz, said, “Well, there's only one place that can do that — the Super Bowl.” And Steve Jobs said, “What's the Super Bowl?” [Antosz] said, “Well, it's a huge football game that attracts one of the largest audiences of the year.” And [Jobs] said, “I've never seen a Super Bowl. I don't think I know anybody who's seen a Super Bowl.”

FRED GOLDBERG The original idea was actually done in 1982. We presented an ad [with] a headline, which was “Why 1984 Won't Be Like ‘1984,'” to Steve Jobs, and he didn't think the Apple III was worthy of that claim.

..More

Is the Media Prepared for an Extinction-Level Event?

Ads are scarce, search and social traffic is dying, and readers are burned out. The future will require fundamentally rethinking the press's relationship to its audience.

Clare Malone

February 10, 2024

My first job in media was as an assistant at The American Prospect, a small political magazine in Washington, D.C., that offered a promising foothold in journalism.
I helped with the print order, mailed checks to writers—after receiving lots of e-mails asking, politely, Where is my money?—and ran the intern program. This last responsibility allowed me a small joy: every couple of weeks, a respected journalist would come into the office for a brown-bag lunch in our conference room, giving our most recent group of twentysomethings a chance to ask for practical advice about “making it.” One man told us to embrace a kind of youthful workaholism, before we became encumbered by kids and families. An investigative reporter implored us to file our taxes and to keep our personal lives in order—never give the rich and powerful a way to undercut your journalism. But perhaps the most memorable piece of advice was from a late-career writer who didn't mince words. You want to make it in journalism, he said? Marry rich. We laughed. He didn't.

I've thought a lot about that advice in the past year. A report that tracked layoffs in the industry in 2023 recorded twenty-six hundred and eighty-one in broadcast, print, and digital news media. NBC News, Vox Media, Vice News, Business Insider, Spotify, theSkimm, FiveThirtyEight, The Athletic, and Condé Nast—the publisher of The New Yorker—all made significant layoffs. BuzzFeed News closed, as did Gawker. The Washington Post, which lost about a hundred million dollars last year, offered buyouts to two hundred and forty employees. In just the first month of 2024, Condé Nast laid off a significant number of Pitchfork's staff and folded the outlet into GQ; the Los Angeles Times laid off at least a hundred and fifteen workers (their union called it “the big one”); Time cut fifteen per cent of its union-represented editorial staff; the Wall Street Journal slashed positions at its D.C. bureau; and Sports Illustrated, which had been weathering a scandal for publishing A.I.-generated stories, laid off much of its staff as well.
One journalist recently cancelled a networking phone call with me, writing, “I've decided to officially take my career in a different direction.” There wasn't much I could say to counter that conclusion; it was perfectly logical.

“Publishers, brace yourselves—it's going to be a wild ride,” Matthew Goldstein, a media consultant, wrote in a January newsletter. “I see a potential extinction-level event in the future.” Some of the forces cited by Goldstein were already well known: consumers are burned out by the news, and social-media sites have moved away from promoting news articles. But Goldstein also pointed to Google's rollout of A.I.-integrated search, which answers user queries within the Google interface, rather than referring them to outside Web sites, as a major factor in this coming extinction. According to a recent Wall Street Journal analysis, Google generates close to forty per cent of traffic across digital media. Brands with strong home-page traffic will likely be less affected, Goldstein wrote—places like Yahoo, the Wall Street Journal, the New York Times, the Daily Mail, CNN, the Washington Post, and Fox News. But Web sites that aren't as frequently typed into browsers need to “contemplate drastic measures, possibly halving their brand portfolios.”

What will emerge in the wake of mass extinction, Brian Morrissey, another media analyst, recently wrote in his newsletter, “The Rebooting,” is “a different industry, leaner and diminished, often serving as a front operation to other businesses,” such as events, e-commerce, and sponsored content. In fact, he told me, what we are witnessing is nothing less than the end of the mass-media era. “This is a delayed reaction to the commercial Internet itself,” he said.
“I don't know if anything could have been done differently.”

..Much More

Video of the Week

AI and Everything Else - Benedict Evans from Slush

AI of the Week

The OpenAI Endgame

Thoughts about the outcome of the NYT versus OpenAI copyright lawsuit

By Mike Loukides

February 13, 2024

Since the New York Times sued OpenAI for infringing its copyrights by using Times content for training, everyone involved with AI has been wondering about the consequences. How will this lawsuit play out? And, more importantly, how will the outcome affect the way we train and use large language models?

There are two components to this suit. First, it was possible to get ChatGPT to reproduce some Times articles very close to verbatim. That's fairly clearly copyright infringement, though there are still important questions that could influence the outcome of the case. Reproducing the New York Times clearly isn't the intent of ChatGPT, and OpenAI appears to have modified ChatGPT's guardrails to make generating infringing content more difficult, though probably not impossible. Is this enough to limit any damages? It's not clear that anybody has used ChatGPT to avoid paying for a NYT subscription. Second, the examples in a case like this are always cherry-picked. While the Times can clearly show that OpenAI can reproduce some articles, can it reproduce any article from the Times' archive? Could I get ChatGPT to produce an article from page 37 of the September 18, 1947 issue? Or, for that matter, an article from the Chicago Tribune or the Boston Globe? Is the entire corpus available (I doubt it), or just certain random articles? I don't know, and given that OpenAI has modified GPT to reduce the possibility of infringement, it's almost certainly too late to do that experiment.
The courts will have to decide whether inadvertent, inconsequential, or unpredictable reproduction meets the legal definition of copyright infringement.

The more important claim is that training a model on copyrighted content is infringement, whether or not the model is capable of reproducing that training data in its output. An inept and clumsy version of this claim was made by Sarah Silverman and others in a suit that was dismissed. The Authors' Guild has its own version of this lawsuit, and it is working on a licensing model that would allow its members to opt in to a single licensing agreement. The outcome of this case could have many side-effects, since it essentially would allow publishers to charge not just for the texts they produce, but for how those texts are used.

It is difficult to predict what the outcome will be, though easy enough to guess. Here's mine. OpenAI will settle with the New York Times out of court, and we won't get a ruling. This settlement will have important consequences: it will set a de-facto price on training data. And that price will no doubt be high. Perhaps not as high as the Times would like (there are rumors that OpenAI has offered something in the range of $1 million to $5 million), but sufficiently high enough to deter OpenAI's competitors.

$1M is not, in and of itself, a terribly high price, and the Times reportedly thinks that it's way too low; but realize that OpenAI will have to pay a similar amount to almost every major newspaper publisher worldwide in addition to organizations like the Authors Guild, technical journal publishers, magazine publishers, and many other content owners. The total bill is likely to be close to $1 billion, if not more, and as models need to be updated, at least some of it will be a recurring cost. I suspect that OpenAI would have difficulty going higher, even given Microsoft's investments—and, whatever else you may think of this strategy—OpenAI has to think about the total cost.
I doubt that they are close to profitable; they appear to be running on an Uber-like business plan, in which they spend heavily to buy the market without regard for running a sustainable business. But even with that business model, billion-dollar expenses have to raise the eyebrows of partners like Microsoft.The Times, on the other hand, appears to be making a common mistake: overvaluing its data. Yes, it has a large archive—but what is the value of old news? Furthermore, in almost any application but especially in AI, the value of data isn't the data itself; it's the correlations between different datasets. The Times doesn't own those correlations any more than I own the correlations between my browsing data and Tim O'Reilly's. But those correlations are precisely what's valuable to OpenAI and others building data-driven products...MoreOpenAI Sora– The most realistic AI-generated video to dateERIK NASOOpenAI Sora is an AI text-to-video model that has achieved incredibly realistic video that is hard to tell it is AI. It's very life-like but not real. I think we have just hit the beginning of some truly powerful AI-generated video that could change the game for stock footage and more. Below are two examples of the most realistic AI prompt-generated videos I have seen.Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff's edge. 
The steep drop from the road down to the beach is a dramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.
Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.
Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user's prompt. OpenAI states they are teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction...More
I Was Wrong. We Haven't Reached Peak AI Frenzy.
By Kate Clark, Feb 15, 2024, 4:16pm PST
After Sam Altman's sudden firing last year, I argued the chaos that followed his short-lived ouster would inject a healthy dose of caution into venture investments in artificial intelligence companies. I figured we'd finally reached the peak of the AI venture capital frenzy when a threatened employee exodus from OpenAI risked sending the value of the $86 billion AI juggernaut almost to zero. There was plenty of other proof that the hype for generative AI was fading. Investors were openly saying they planned to be a lot tougher on valuation negotiations and would ask startups harder questions about governance. Some companies had begun to consider selling themselves due to the high costs of developing AI software.
And an early darling of the AI boom, AI-powered writing tool Jasper, had become the butt of jokes when it slashed internal revenue projections and cut its internal valuation after having won a $1.5 billion valuation in 2022. I forgot that everyone in Silicon Valley suffers from short-term memory loss. After a week sipping boxed water with venture capitalists from South Park to Sand Hill Road, I'm convinced I called the end of the AI frenzy far too soon. In fact, I expect this year will deliver more cash into the hands of U.S. AI startups than last year, when those companies raised a total of $63 billion, according to PitchBook data. Altman's fundraising ambitions will surely boost the total. A recent report from The Wall Street Journal said Altman plans to raise trillions of dollars to develop the AI chips needed to create artificial general intelligence, software that can reason the way humans do. Even if that number is actually much smaller, talk of such goals lifts the ceiling for other startup founders, who are  likely to think even bigger and to be more aggressive in their fundraising. Investor appetite for AI companies is still growing, too. These investors claimed last fall that they were done with the FOMO-inspired deals, but they're pushing checks on the top AI companies now harder than ever...MoreNews Of the WeekI tried Vision Pro. Here's my takeThe Quest 3 is better than you might expectPosted by Matt Birchler13 Feb 2024Alex Heath for The Verge: Zuckerberg says Quest 3 is “the better product” vs. Apple's Vision ProHe says the Quest has a better “immersive” content library than Apple, which is technically true for now, though he admits that the Vision Pro is a better entertainment device. And then there's the fact that the Quest 3 is, as Zuck says, “like seven times less expensive.”I currently own both headsets and while I'm very excited about the potential in the Vision Pro, I actually find it hard to fully disagree with Zuck on this one. 
I think a lot of people who have only used the Vision Pro would be surprised how well the Quest 3 does some things in comparison. For example, the pass-through mode is definitely not quite as good as the Vision Pro's, but it's closer than you might expect. And while people are rightly impressed with how well the Vision Pro has windows locked in 3D space, honestly the Quest 3 is just as good at this in my experience. When it comes to comfort, I do think the Vision Pro is easier to wear for longer periods, but I find it more finicky to get in just the right spot in front of my eyes, while the Quest 3 seems to have a larger sweet spot. And let's not even talk about the field of view, which is way wider on the Quest to the point of being unnoticeable basically all the time. I kinda think field of view will be similar to phone bezels in that you get used to what you have and anything more seems huge; you can get used to the Vision Pro's narrower field of view, but once you're used to wider, it's hard to not notice when going back. The Vision Pro has some hardware features that help it rise above (the massively higher resolution screen jumps to mind), but I'm just saying that if you're looking for everything to be 7x better to match the price difference, I don't think that's there. Beyond this, the products are quite different, though. As Zuckerberg says, the Quest 3 is more focused on fully immersive VR experiences, and while the Vision Pro has a little of that right now, it's not really doing the same things. And when it comes to gaming it's not even close. The Quest 3 has a large library of games available and that expands to almost every VR game ever made with Steam Link. On the other hand, the Vision Pro is much more of a “computer” than the Quest ever was. If you can do it on a Mac or an iPad, you can probably already do it on the Vision Pro.
And I'm not talking about finding some weird alternate version of your task manager or web browser that doesn't sync with anything else in your life, I'm talking about the apps you already know and love. This is huge and it's Apple leveraging its ecosystem to make sure you can seamlessly move from Mac to iPhone to iPad to Vision Pro. And if you can't install something from the App Store, the web browser is just as capable as Safari on the iPad. If all else fails, you can always just bring your full Mac into your space as well. I will say the Quest 3 can do this and has the advantage of working with Windows as well, but if you have a Mac, it's much, much better.This is more words than I expected to write about a CEO saying his product is better than the competition's (shocker), but I do think that Zuck's statement is less insane than some may think it to be...MoreThe Supreme Court will decide if the government can seize control of YouTube and TwitterWe're about to find out if the Supreme Court still believes in capitalism.By Ian Millhiser Feb 15, 2024, 7:00am ESTIan Millhiser is a senior correspondent at Vox, where he focuses on the Supreme Court, the Constitution, and the decline of liberal democracy in the United States. He received a JD from Duke University and is the author of two books on the Supreme Court.In mid-2021, about a year before he began his longstanding feud with the biggest employer in his state, Florida's Republican Gov. Ron DeSantis signed legislation attempting to seize control of content moderation at major social media platforms such as YouTube, Facebook, or Twitter (now called X by Elon Musk). A few months later, Texas Gov. Greg Abbott, also a Republican, signed similar legislation in his state.Both laws are almost comically unconstitutional — the First Amendment does not permit the government to order media companies to publish content they do not wish to publish — and neither law is currently in effect. 
A federal appeals court halted the key provisions of Florida's law in 2022, and the Supreme Court temporarily blocked Texas's law shortly thereafter (though the justices, somewhat ominously, split 5-4 in this later case).Nevertheless, the justices have not yet weighed in on whether these two unconstitutional laws must be permanently blocked, and that question is now before the Court in a pair of cases known as Moody v. NetChoice and NetChoice v. Paxton.The stakes in both cases are quite high, and the Supreme Court's decision is likely to reveal where each one of the Republican justices falls on the GOP's internal conflict between old-school free market capitalists and a newer generation that is eager to pick cultural fights with business...MoreArm Results Set The World On FireFebruary 13, 2024 · by D/D Advisors · in Analyst Decoder Ring. ·Arm reported its second set of earnings as a (once again) public company last week. These numbers were particularly strong, well above consensus for both the current and guided quarters. Arm stock rallied strongly on the results up ~30% for the week. These numbers were important as they go a long way to establishing the company's credibility with the Street in a way their prior results did not.That being said, we saw things we both liked and disliked in their numbers. Here are our highlights of those:Positive: Growing Value Capture. One of our chief concerns with the company since IPO has been the low value they capture per licensed chip shipped – roughly $0.11 per chip at the IPO. That figure continued to inch higher in the latest results, but critically they pointed out that their royalty rate doubles with the latest version of their IP (v9). This does not mean that all of their royalty rates are going to double any time soon, but it does point very much in the right direction. Critically, they noted this rate increase applies to architectural licenses as well.Negative: The Model is Complex. 
Judging from the number of questions management fielded on the call about this rate increase no one really knows how to model Arm. The company has a lot of moving parts in its revenue mix, and they have limits to their ability to communicate some very important parts of their model. We think that at some point the company would be well served by providing some clearer guide posts on how to build these models or they risk the Street always playing catch up with a wide swing of expectations each quarter.Positive: Premium Plan Conversion. The company said three companies converted from their AFA plan to the ATA model. We will not get into the details of those here, but these can best be thought of in software terms with customers on low priced subscription plans converting to Premium subscription plans. This is a good trend, and management expressed a high degree of confidence that they expect to see it continue. They have spent a few years putting these programs in place and seem to have thought them through. This matters particularly because these programs are well suited for smaller, earlier-stage companies. The old Arm struggled to attract new customers in large part because of the high upfront costs of Arm licenses. Programs like AFA and ATA could go a long way to redressing those past wrongs.Negative: China remains a black box. Arm China is of course a constant source of speculation. In the latest quarter it looks like a large portion of growth came from China which does not exactly square with other data coming from China right now. It is still unclear to us how much of Arm's revenues from China's handset companies gets booked through Arm China as a related party transaction and how much is direct. Investors are confused too. There is no easy solution to this problem, digging too hard into Arm China's numbers is unlikely to make anyone happy with the answers, but hopefully over time it all settles down.Positive: Growing Complexity of Compute. 
Management repeatedly mentioned this factor, noting that this leads to more chips and more Arm cores shipping in the marketplace. Some of this is tied to AI, but we think the story is broader than that. It is going to be tempting to see much of Arm's growth as riding the AI wave, but this does not fully capture the situation. The AI story is largely about GPUs, which are not particularly heavy with Arm cores. But those GPUs still need some CPU attach, and AI accelerators can sometimes be good Arm targets.Negative: Diversification. Arm remains heavily dependent on smartphones, and we suspect the return to inventory stocking by handset makers is playing a big role in their guidance. When asked about segmentation of their results the company declined to update the model provided during the IPO. We hope to see some diversification here when they do update their figures later in the year.Overall, the company did a good job in the quarter. They still have some kinks to work out with their communication to the Street, but this was a good second step as a public company...MoreStartup of the WeekBret Taylor's new AI company aims to help customers get answers and complete tasks automaticallyRon Miller @ron_miller / 6:36 AM PST•February 13, 2024Image Credits: mi-vector / Getty ImagesWe've been hearing about former Salesforce co-CEO Bret Taylor's latest gig since he announced he was leaving the CRM giant in November 2022. Last February we heard he was launching an AI startup built with former Google employee Clay Bavor. Today, the two emerged with a new conversational AI company called Sierra with some bold claims about what it can do.At its heart, the new company is a customer service bot. That's not actually all that Earth-shattering, but the company claims that it's much more than that, with its software going beyond being an extension of a FAQ page and actually taking actions on behalf of the customer.“Sierra agents can do so much more than just answer questions. 
They take action using your systems, from upgrading a subscription in your customer database to managing the complexities of a furniture delivery in your order management system. Agents can reason, problem solve and make decisions,” the company claimed in a blog post. Having worked with large enterprise customers at Salesforce, Taylor certainly understands that issues like hallucinations, where a large language model sometimes makes up an answer when it lacks the information to answer accurately, are a serious problem. That's especially true for large companies, whose brand reputation is at stake. The company claims that it is solving hallucination issues. At the same time, it's connecting to other enterprise systems to undertake tasks on behalf of the customer without humans being involved. These are both big audacious claims and will be challenging to pull off...More
X of the Week
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit thatwastheweek.substack.com/subscribe

Pivot Point - Success is a Journey!
S3 Ep 2 Shifting Priorities - Finding Passion

Play Episode Listen Later Feb 5, 2024 27:11


Jeff Dean has a story to tell about being a seasoned professional who found himself in a job search through no fault of his own. In this episode, you will hear ways to navigate the waters of being labeled "overqualified," a euphemism for age bias, and to use your skills and persistence to find a better quality of life.
Takeaways:
- Participate in (not just join) job seeker groups
- Networking is key, and more productive than ever before
- Use time wisely and consider volunteering
- Age bias is real, so be prepared to demonstrate your value over younger workers (e.g. loyalty, experience)
To network with Jeff Dean, connect with him on LinkedIn: linkedin.com/in/jeffreyjdean
Related articles:
Forbes: https://www.forbes.com/sites/carolinecastrillon/2022/09/25/how-to-combat-ageism-during-a-job-search/
Indeed: https://www.indeed.com/career-advice/finding-a-job/ageism-in-hiring

Minimum Competence
Legal News for Weds 1/24 - $62M Ivy League Settlement, Google's AI Patent Case Resolved, Spellbook Raises $20m and new SEC Rules Target SPACs

Play Episode Listen Later Jan 24, 2024 8:37


This Day in Legal History: Thurgood Marshall DiesOn January 24, 1993, the United States lost one of its most influential legal figures, retired Supreme Court Justice Thurgood Marshall. His death in Bethesda, Maryland, marked the end of an era in American jurisprudence. Born on July 2, 1908, in Baltimore, Maryland, Marshall's journey to becoming the first African American Supreme Court Justice was paved with groundbreaking legal battles and an unwavering commitment to civil rights.Before his appointment to the Supreme Court in 1967 by President Lyndon B. Johnson, Marshall had already made a significant impact as a lawyer. He served as the chief counsel for the National Association for the Advancement of Colored People (NAACP), where he strategized and won a series of critical court cases that chipped away at the legal foundations of racial segregation. His most famous case, Brown v. Board of Education of Topeka in 1954, ended with the Supreme Court's unanimous decision declaring state laws establishing separate public schools for black and white students to be unconstitutional, effectively overturning the "separate but equal" doctrine of Plessy v. Ferguson.During his tenure on the Supreme Court, Marshall became known for his passionate advocacy for individual rights and his opposition to the death penalty. His legal opinions, both majority and dissenting, reflected his deep-seated belief in equality and justice for all. He often stressed the importance of viewing the Constitution as a dynamic, living document, capable of adapting to changing societal needs and values.Marshall's impact extends beyond his legal victories; he paved the way for greater diversity in the legal profession and on the bench. 
His life and career remain a testament to the power of the law as a tool for social change and continue to inspire generations of lawyers, activists, and citizens.His death was not just the loss of a great legal mind but also the end of an era that saw significant strides in civil rights and social justice. As we remember Justice Thurgood Marshall on this day, his legacy serves as a reminder of the ongoing struggle for equality and the enduring power of dedicated individuals to bring about change in society.Brown, Yale, and Columbia universities, along with Emory and Duke, have agreed to pay a total of $62 million to settle a lawsuit accusing them of favoring wealthy applicants, bringing the total settlements in the case to $118 million. This lawsuit, filed against several U.S. universities, alleges they conspired to restrict financial aid and violated a pledge to not consider students' financial status in admissions, effectively giving an advantage to wealthy students. The universities, including those that have settled, deny any wrongdoing. The settlements vary, with Yale and Emory paying $18.5 million each, Brown $19.5 million, and Columbia and Duke $24 million each. The lawsuit, still involving 10 other universities like Cornell and the University of Pennsylvania, is pending approval from U.S. District Judge Matthew Kennelly, who previously declined to dismiss the case in 2022.Brown, Yale, Columbia among latest to settle financial-aid lawsuit | ReutersGoogle has settled a patent infringement lawsuit with Singular Computing, averting a trial that was set to begin with closing arguments. The lawsuit, filed in 2019, sought $1.67 billion in damages, accusing Google of misusing Singular's computer-processing innovations in its artificial intelligence (AI) technology. 
Singular, founded by Joseph Bates, alleged that Google incorporated its technology into processing units used in various Google services like Google Search, Gmail, and Google Translate.The dispute centered around Google's Tensor Processing Units (TPUs), introduced in 2016 to enhance AI capabilities in tasks like speech recognition and ad recommendation. Singular claimed that the second and third versions of these units, released in 2017 and 2018, infringed on its patents. According to the lawsuit, Bates shared his inventions with Google between 2010 and 2014, suggesting that Google's TPUs copied his technology.Internal emails revealed during the trial indicated Google's interest in Bates' ideas, with the company's now-chief scientist, Jeff Dean, acknowledging their potential utility. However, Google maintained that its employees who designed the TPUs had never met Bates and developed the technology independently, arguing that its tech was fundamentally different from what was described in Singular's patents.Details of the settlement have not been disclosed, and representatives from both Google and Singular have confirmed the settlement without providing further information. Google spokesperson Jose Castaneda expressed satisfaction with the resolution, emphasizing that Google did not violate Singular's patent rights.Google settles AI-related chip patent lawsuit that sought $1.67 bln | ReutersGoogle Settles AI Chip-Design Suit That Had Sought BillionsSpellbook, a Canada-based legal software company specializing in contract management, has secured $20 million in Series A funding, led by Montreal's Inovia Capital. Other investors include The Legaltech Fund, Bling Capital, and Thomson Reuters Ventures. 
The company's product, built on OpenAI's GPT-4, assists corporate and commercial lawyers with contract drafting and review by suggesting language and negotiation points.The legal AI sector is experiencing a surge in investment as startups introduce tools designed to integrate generative AI into legal processes. However, the market remains highly competitive with no clear leader yet emerging. Spellbook's CEO, Scott Stevenson, highlighted the company's focus on serving small to midsize law firms and solo practitioners, though it has also attracted larger firms and in-house legal teams.Spellbook's clientele includes diverse organizations such as Addleshaw Goddard, KMSC Law, Carbon Chemistry, and ATEM Capital. The company, initially named Rally at its inception in 2019, rebranded to Spellbook after a $10.9 million seed round in June 2023 and shifted focus from automating routine legal tasks to AI-driven contract management.The legal technology sector is witnessing increased investor interest, particularly since the advent of generative AI technologies. Other firms in the sector, such as Norm AI and Robin AI, have also recently raised substantial funding, indicating a growing trend in the investment and development of legal AI tools.Legal AI startup Spellbook raises $20 mln as sector draws more investments | ReutersThe Securities and Exchange Commission (SEC) has introduced new rules for deals involving special purpose acquisition companies (SPACs), aiming to enhance investor protections and align these transactions more closely with traditional initial public offerings (IPOs). This regulatory change comes as SPACs, which surged in popularity during the COVID-19 pandemic as an alternative public listing method, have recently lost favor. The new rules revoke certain legal protections previously afforded to SPAC sponsors, making them more susceptible to lawsuits over exaggerated statements. 
These regulations also demand increased disclosures, particularly concerning forward-looking projections in the later stages of SPAC deals.By way of very brief background, a SPAC is an alternative to the traditional IPO process for a company seeking to go public. A SPAC is essentially a shell company that raises funds through an IPO with the sole intent of acquiring or merging with an existing private company, thereby taking that company public. Unlike traditional IPOs, where a company goes public based on its own assets and operations, a SPAC has no commercial operations and is created solely for the purpose of acquiring a private company. This process allows the target company to become publicly traded more quickly and with potentially less regulatory scrutiny than the traditional IPO route. Additionally, SPACs offer more certainty regarding valuation and financing compared to traditional IPOs.SEC Chair Gary Gensler emphasized the need for robust investor protections, regardless of the method used for going public. The SEC's heightened scrutiny and macroeconomic factors like rising interest rates have already cooled the once-booming SPAC market. Major financial institutions such as Goldman Sachs and Bank of America reduced their involvement in SPACs following the SEC's initial proposal of these changes.The SEC's new requirements include detailed disclosures from SPAC sponsors about potential conflicts of interest, compensation, and dilution. Companies targeted by SPACs must now register with the SEC and fulfill additional disclosure obligations before merging. Furthermore, these target companies are now jointly liable for the information shared with investors and must provide independently audited financial statements. The SEC's Republican Commissioner Mark Uyeda criticized the rules as overly burdensome, suggesting they might effectively end the SPAC market. 
The new regulations will take effect in over four months, with additional financial reporting and accounting requirements for SPAC transactions also being implemented.
SEC Imposes New Rules on Blank-Check Deals as SPACs Fizzle (2)
Get full access to Minimum Competence - Daily Legal News Podcast at www.minimumcomp.com/subscribe

Hesby Street
The Best of Hesby Street - Volume Three

Play Episode Listen Later Jan 4, 2024 22:37


The best of Hesby Street featuring favorites like Torio's Fake Wife, Jeff Dean's Failed Proposal and Zack's John Wayne! ALL EVERYTHING IS NOTHING!
Follow us on Instagram: @hesbystreetpod @toriovangrol @zackchapaloni
Keep up with everything Hesby Street at: https://www.hesbystreetpod.com
Get bonus content on Patreon
Hosted on Acast. See acast.com/privacy for more information.

Off The Charts Football Podcast
Announcing the 2023 SIS All-America Teams

Play Episode Listen Later Dec 21, 2023 20:43


On this episode of the Off The Charts Football Podcast, Matt Manocherian (@mattmano) welcomes Bryce Rossler (@btrossler) and James Weaver (@J_Weaver97) of the SIS Research team along with Nathan Cooper (@ncoopdraft) and Jeff Dean of the Football Ops crew to the show to discuss this year's SIS All-America Teams. The guys discuss the process that went into picking the players and some of the more controversial picks to make (or not make) the list. You can find the entire First-Team, Second-Team, and Honorable Mentions on the SIS website at "2023 SIS NCAA All-America Teams". Thank you for listening. Please check out The Edge, The Trenches Tool, the SIS NFL Draft Website and SportsInfoSolutions.com for all our latest content, and don't forget to check out the SIS Baseball Podcast, available wherever you get your podcasts.

KFFN Spears & Ali
S&A Talk Athletic and Academic Greatness with the Bourguet Family, Arizona Football and Basketball with Jeff Dean, and Arizona Basketball with Bruce Pascoe

Play Episode Listen Later Dec 14, 2023 48:00


Off The Charts Football Podcast
The True 9s of the NFL

Play Episode Listen Later Dec 7, 2023 46:16


On this episode of the Off The Charts Football Podcast, host Matt Manocherian (@mattmano) and regular guests Alex Vigderman (@VigManOnCampus), Bryce Rossler (@btrossler), and James Weaver (@J_Weaver97) of the SIS Research Team welcome Nathan Cooper (@ncoopdraft), Jordan Edwards (@babyrhino79), Ben Hrkach and Jeff Dean of the SIS Football Scouting operation to the show for a discussion about elite traits in the NFL from both the scouting and statistical viewpoints. The guys dive into a scouting debate about which NFL players possess a truly elite trait based on the 1-9 scale that our scouts use at SIS and what the data can tell us about those traits. Some of the players mentioned include Aaron Donald (6:58), Tyreek Hill (11:12), Trent Williams (15:50), Josh Allen (21:39), Patrick Mahomes (27:02), Lamar Jackson (35:10), and Chris Jones (39:45). Thank you for listening. Please check out The Edge, The Trenches Tool, the SIS NFL Draft Website and SportsInfoSolutions.com for all our latest content, and don't forget to check out the SIS Baseball Podcast, available wherever you get your podcasts.

Wildcat Country
Territorial Cup champs (again)! Recap and a look ahead with Jeff Dean

Play Episode Listen Later Nov 27, 2023 62:46


The Arizona Wildcats just enjoyed their biggest-ever Territorial Cup victory in Tempe to end the regular season on a six-game winning streak. Wildcat PA announcer Jeff Dean joins us to discuss the football program's rise and look ahead to their bowl game and beyond. Plus, thoughts on the men's basketball team's big Thanksgiving victory, and predictions for championship weekend.

Wildcat Country
11/14/23: Arizona vs. Utah preview and basketball recap with Jeff Dean

Wildcat Country

Play Episode Listen Later Nov 15, 2023 59:40


What a weekend for Arizona athletics! Wildcats football and men's basketball PA announcer Jeff Dean joins us to recap big wins over Duke and Colorado, and preview the football team's home finale vs. Utah. Plus, should Wildcat fans be worried that Jedd Fisch might take another job? Finally, we make our predictions for this weekend's biggest football games, including Wildcats-Utes.

Wildcat Country
Arizona vs. Washington preview with Jeff Dean

Wildcat Country

Play Episode Listen Later Sep 27, 2023 50:29


The Arizona Wildcats escaped Stanford with a win, but they have questions at quarterback as they head into their toughest challenge of the season vs. No. 7 Washington. Who will (and should) get the start at QB, and can the Wildcats pull the upset? Arizona football PA announcer Jeff Dean joins us to preview the game. Plus, we make picks for some of the weekend's biggest games, including UA vs. U-Dub.

Wildcat Country
8/6/23: Arizona-Big 12 Roundtable Discussion with Jeff Dean and Blair Willis

Wildcat Country

Play Episode Listen Later Aug 7, 2023 52:48


The Arizona Wildcats are headed to the Big 12 in 2024. What should fans be most excited about? What concerns should they have? How will this impact football recruiting? Who will the Wildcats' biggest rivals be? And will this move help expand The University of Arizona brand nationally? We discuss all of these questions and more with a pair of Wildcat experts, Arizona PA announcer Jeff Dean and former UA SID Blair Willis.

Off The Charts Football Podcast
Scouts vs Stats: Top 10 Tight Ends in the NFL

Off The Charts Football Podcast

Play Episode Listen Later Jun 22, 2023 56:13


On this episode of the Off The Charts Football Podcast, Alex Vigderman (@VigManOnCampus) fills in for host Matt Manocherian as he welcomes Bryce Rossler (@btrossler) and James Weaver (@J_Weaver97) of the SIS Research Team along with special guest Jeff Dean of the Football Ops staff to the show for another installment of "Scouts vs Stats". The jocks and the nerds find some common ground this week as they tackle the top tight ends in the NFL. Thank you for listening. Please check out The Edge, The Trenches Tool, the SIS NFL Draft Website and SportsInfoSolutions.com for all our latest content, and don't forget to check out the SIS Baseball Podcast, available wherever you get your podcasts.

Off The Charts Football Podcast
Scouts vs Stats: Top 10 Running Backs in the NFL

Off The Charts Football Podcast

Play Episode Listen Later Jun 15, 2023 59:54


On this episode of the Off The Charts Football Podcast, Alex Vigderman (@VigManOnCampus) fills in for host Matt Manocherian as he welcomes Bryce Rossler (@btrossler) and James Weaver (@J_Weaver97) of the SIS Research Team along with special guest Jeff Dean of the Football Ops staff to the show for another installment of "Scouts vs Stats". Running backs definitely matter this week as the guys will rank the Top 10 in the NFL, with Alex and James maintaining their positions on "Team Stats" while Jeff joins Bryce on the scouting side of things. Thank you for listening. Please check out The Edge, The Trenches Tool, the SIS NFL Draft Website and SportsInfoSolutions.com for all our latest content, and don't forget to check out the SIS Baseball Podcast, available wherever you get your podcasts.

Off The Charts Football Podcast
NFL Draft Recap: Winners, Losers, Grades, and More

Off The Charts Football Podcast

Play Episode Listen Later May 1, 2023 36:38


On this episode of the Off The Charts Football Podcast, fill-in host Alex Vigderman (@VigManOnCampus), the Director of Football Analytics at Sports Info Solutions, welcomes Nathan Cooper (@ncoopdraft) and Jeff Dean of the SIS Scouting Staff to the show to recap the 2023 NFL Draft. The guys discuss how we grade drafts (1:03) before looking at the rankings list, starting at the top with teams like the Panthers, Eagles, Texans, Dolphins, and Patriots (2:27). They then transition to how our scouting grades performed relative to the picks (16:39), which teams didn't draft as well based on our grades (21:02), and other significant notes to take away (25:49). Check out the "2023 NFL Draft Team Grades" article for a comprehensive draft breakdown. Thank you for listening. Please check out The Edge, The Trenches Tool, the SIS NFL Draft Website and SportsInfoSolutions.com for all our latest content, and don't forget to check out the SIS Baseball Podcast, available wherever you get your podcasts.

Off The Charts Football Podcast
NFL Draft: First Round Recap

Off The Charts Football Podcast

Play Episode Listen Later Apr 28, 2023 32:16


On this episode of the Off The Charts Football Podcast, fill-in host Alex Vigderman (@VigManOnCampus), the Director of Football Analytics at Sports Info Solutions, welcomes Nathan Cooper (@ncoopdraft) and Jeff Dean of the SIS Scouting Staff to the show to recap the first round of the 2023 NFL Draft. The guys discuss some of the highlights from Thursday night, including the Texans trading up for Will Anderson (2:31), the Lions making some curious choices (9:09), the Eagles grabbing Jalen Carter (13:48), the run on receivers in the 20s (19:55), the Eagles getting a steal at 30 with Nolan Smith (24:49), and some players to keep an eye on early in Round 2 (28:42). Stay tuned for a full draft recap and early draft grades coming in a special episode on Monday. Thank you for listening. Please check out The Edge, The Trenches Tool, the SIS NFL Draft Website and SportsInfoSolutions.com for all our latest content, and don't forget to check out the SIS Baseball Podcast, available wherever you get your podcasts.

The Learning Leader Show With Ryan Hawk
470: Daniel Coyle - Building Your Culture, Solving Hard Problems, & Winning The Learning Contest

The Learning Leader Show With Ryan Hawk

Play Episode Listen Later May 1, 2022 75:02 Very Popular


Text Hawk to 66866 for "Mindful Monday." It's a carefully curated email to help you start your work off on a high note. Full show notes at www.LearningLeader.com Twitter/IG: @RyanHawk12 https://twitter.com/RyanHawk12
Daniel Coyle is the New York Times bestselling author of The Culture Code, which was named Best Business Book of the Year by Bloomberg, BookPal, and Business Insider. Coyle has served as an advisor to many high-performing organizations, including the Navy SEALs, Microsoft, Google, and the Cleveland Guardians. His other books include The Talent Code, The Secret Race, The Little Book of Talent, and Hardball: A Season in the Projects, which was made into a movie starring Keanu Reeves. Coyle was raised in Anchorage, Alaska, and now lives in Cleveland Heights, Ohio, during the school year and in Homer, Alaska, during the summer with his wife Jenny and their four children.
Notes:
  • Purpose isn't about tapping into some mystical internal drive but rather about creating simple beacons that focus attention and engagement on the shared goal. Successful cultures do this by relentlessly seeking ways to tell and retell their story. To do this, they build what you call “high-purpose environments.”
  • High-purpose environments are filled with small, vivid signals designed to create a link between the present moment and a future ideal. They provide 2 simple locators that every navigation process requires: Here is where we are and Here is where we want to go.
  • "The world we live in is a learning contest."
  • Deep fun = Solving hard problems with people you admire.
  • Schedule regular team “tune-ups” to place an explicit spotlight on the team's inner workings and create conversations that surface and improve team dynamics.
  • Foster strong culture in remote working scenarios. It doesn't take much physical togetherness to build strong teams. Encourage remote teams to meet up in person twice a year.
  • Create belonging: every group knows diversity, equity, and inclusion matter, but what separates strong cultures is they aim to create belonging across racial lines. Ex: normalize uncomfortable conversations; read, watch, reflect together; gather data and share it.
  • Build Trust. Ask the magic-wand question to each member of your team: if you could wave a magic wand and change one thing about the way we work, what would it be?
  • Connect. Hold an anxiety party to serve as a pressure-relief valve, as well as a platform for people to connect and solve problems together.
  • Change perspective. Have a once-a-week catch-up session with someone outside of your group.
  • Make it safe to talk about mistakes: Strong cultures seek to highlight and remember their mistakes and learn from them.
  • Listen. Listening to others' problems is one of the most powerful culture-building skills on the planet. It's also difficult. Restrain yourself from jumping in, listen, then say: Tell me more.
  • Embrace the After-Action Review (or as the military calls it, the AAR): Talking together about the strengths and weaknesses of your performance will make your group better.
  • The Billion Dollar Day When Nothing Happened – “These Ads Suck." That was the note that Larry Page wrote and hung up about Google AdWords. What did Jeff Dean, a quiet, skinny engineer from Minnesota, do to make the ads not suck? He had no immediate need to fix the problem. He worked in Search (a different area of the company). And how did Jeff Dean respond when he was asked about it years later? (He said he didn't even really remember it. It was just normal to do stuff like that.)
  • There is a misconception that great cultures are places that are always happy. Doing great work is hard. The way we build great cultures is by doing hard things together focused on connection and safety.
  • Life/Career advice: Think of your life in experiments and the learning loop. It is Experience + Reflection. Experience + Reflection. WRITE DOWN WHAT you've learned from your experiences. Writing creates clarity of thought.
  • Amy Edmondson researched Chelsea and Mountain Medical – What made them a success? The answer lay in patterns of real-time signals through which the team members were connected. There were 5 things:
    • Framing - They conceptualized MICS as a learning experience that would benefit patients and the hospital. Unsuccessful teams viewed it as an add-on to existing practices.
    • Roles - Role clarity. Being told explicitly by the team leader why their individual and collective skills were important for the team's success.
    • Rehearsal - Practice a lot.
    • Explicit encouragement to speak up.
    • Active reflection - Between surgeries, successful teams went over their performance.