In this episode of the Earley AI Podcast, host Seth Earley sits down with Yang Li, a leading figure in AI and software innovation. Yang is the Chief Operating Officer of Cosine, an advanced AI development firm, with deep experience driving startups, scaling organizations, and pioneering advancements in engineering and software development. Yang's work focuses on leveraging AI to empower the next generation of developers, especially in navigating the increasingly complex landscape of modern and legacy codebases.

Yang and Seth dive into how AI is reshaping the role of software engineers, the evolving challenges of handling massive backlogs and legacy systems, and what creativity and efficiency really look like in an age of AI-powered software development.

Key Takeaways:
- AI's Impact on Software Engineering: AI is shifting the developer's role from hands-on coding to more creative, iterative, and strategic work.
- Tackling Legacy Code: Cosine is pioneering new ways for AI to handle outdated and complex codebases (like COBOL and Fortran) that most engineers, and AI models, struggle with.
- Augmenting, Not Replacing, Engineers: AI tools like Cosine's Genie reduce ramp-up time for engineers, help address daunting backlogs, and act as creative partners rather than outright replacements.
- The Challenge of Benchmarks: Yang explains why public coding benchmarks can be misleading when bringing products to real-world enterprise environments, especially with diverse codebases.
- The Emergence of 'Vibe Coding': Idea-to-prototype time is shrinking, allowing non-technical team members to quickly bring their ideas to life using AI assistants.
- Risks & Limits: Over-reliance on AI, standardization versus differentiation, and the need for new evaluation criteria in engineering organizations.
- Future Skills: The importance of risk-taking, adaptability, and prompt engineering as software development evolves, plus insights into how organizations are rethinking career ladders and promotions in an AI-powered world.

Insightful Quote from Yang Li: "Previously you had to use words and language to describe your idea, you can now show people your idea... The time between you having thought of an idea to actually be able to show people that idea has now reduced almost to zero because of vibe coding."

Tune in to discover what's next for software engineering in the age of AI, and how to stay ahead in this rapidly changing landscape.

Thanks to our sponsors: VKTR, Earley Information Science, and the AI-Powered Enterprise book.
More info: https://docs.anthropic.com/en/docs/claude-code/overview

The AI coding wars have now split across four battlegrounds:
1. AI IDEs: two leading startups in Windsurf ($3B acq. by OpenAI) and Cursor ($9B valuation), with a sea of competition behind them (like Cline, GitHub Copilot, etc.).
2. Vibe coding platforms: Bolt.new, Lovable, v0, etc., all experiencing fast growth and getting to tens of millions in revenue within months.
3. The teammate agents: Devin, Cosine, etc. Simply give them a task, and they will get back to you with a full PR (with mixed results).
4. The CLI-based agents: after Aider's initial success, we are now seeing many other alternatives, including two from the main labs: OpenAI Codex and Claude Code. The main draw is that 1) they are composable and 2) they are pay-as-you-go based on tokens used.

Since we have already covered the first three categories, today's guests are Boris and Cat, the lead engineer and PM for Claude Code.

If you only take one thing away from this episode, it's this piece from Boris: Claude Code is not a product so much as it's a Unix utility. This fits very well with Anthropic's product principle: "do the simple thing first." Whether it's the memory implementation (a markdown file that gets auto-loaded) or the approach to prompt summarization (just ask Claude to summarize), they always pick the smallest building blocks that are useful, understandable, and extensible. Even major features like planning ("/think") and memory (#tags in markdown) fit the same idea of having text I/O as the core interface, very much in the spirit of the original UNIX design philosophy.

Claude Code is also the most direct way to consume Sonnet for coding, rather than going through all the hidden prompting and optimization that the other products do. You will feel that right away, as the average spend per user is $6/day on Claude Code, compared to $20/mo for Cursor, for example. Apparently, there are some engineers inside of Anthropic who have spent >$1,000 in one day!

If you're building AI developer tools, there's also a lot of alpha here on how to design a CLI tool, interactive vs non-interactive modes, and how to balance feature creation. Enjoy!

Timestamps
[00:00:00] Intro
[00:01:59] Origins of Claude Code
[00:04:32] Anthropic's Product Philosophy
[00:07:38] What should go into Claude Code?
[00:09:26] Claude.md and Memory Simplification
[00:10:07] Claude Code vs Aider
[00:11:23] Parallel Workflows and Unix Utility Philosophy
[00:12:51] Cost considerations and pricing model
[00:14:51] Key Features Shipped Since Launch
[00:16:28] Claude Code writes 80% of Claude Code
[00:18:01] Custom Slash Commands and MCP Integration
[00:21:08] Terminal UX and Technical Stack
[00:27:11] Code Review and Semantic Linting
[00:28:33] Non-Interactive Mode and Automation
[00:36:09] Engineering Productivity Metrics
[00:37:47] Balancing Feature Creation and Maintenance
[00:41:59] Memory and the Future of Context
[00:50:10] Sandboxing, Branching, and Agent Planning
[01:01:43] Future roadmap
[01:11:00] Why Anthropic Excels at Developer Tools
Al and Codey talk about Mini Mini Farm

Timings
00:00:00: Theme Tune
00:00:30: Intro
00:01:33: What Have We Been Up To
00:11:26: I Know What You Released Last Month
00:13:57: Game News
00:28:37: New Games
00:41:43: Other News
00:56:24: Mini Mini Farm
01:33:42: Outro

Links
Piczle Cross: Rune Factory Release Date
Space Sprouts Release Date
Luma Island: Pirates
Ranch of Rivershine “1.7” Update
Horticular: Frozen Frontier
Melobot: A Last Song OST
Sky Harvest
Phoenix Labs Layoffs
ConcernedApe NPR Interview
Reuters Cozy Gaming Interactive Article

Contact
Al on Mastodon: https://mastodon.scot/@TheScotBot
Email Us: https://harvestseason.club/contact/

Transcript
(0:00:30) Al: Hello farmers, and welcome to another episode of the harvest season. My name is Al, (0:00:36) Codey: And my name is Cody! (0:00:37) Al: and we are here today to talk about cottagecore games. (0:00:42) Codey: Oh woo! (0:00:44) Al: It’s like a pack of wolves. (0:00:50) Codey: I’m never gonna un-hear that now. (0:00:51) Al: We this episode, we are going to talk about Mini Mini Farm because apparently we’re doing two (0:00:59) Al: mobile games in a row because you did Animal Crossing last week. And then we’re doing Mini (0:01:00) Codey: Mm-hmm. (0:01:04) Codey: Yeah. (0:01:05) Codey: Yeah. (0:01:06) Al: Mini Farm this week. And yeah, I just realized that today. I was like, Oh, yeah, (0:01:08) Codey: Yes, I forgot about that. (0:01:13) Al: two in a row. Interesting. Oh, well, we’re making Cody work for their title of (0:01:19) Al: mobile correspondent. (0:01:20) Codey: Yep, I’m here for it. For sure. And I am still actively playing mobile games. (0:01:21) Al: Before that, well, yes, so before that we have news, I’m going to overview the (0:01:31) Al: January releases because it’s now February. But first of all, Cody, what have you been up to? (0:01:36) Codey: Um, so, uh, definitely been playing Mini Mini Farm. Um, because of the last episode. (0:01:42) Al: Oh yeah, that’s what MMF stands for. (0:01:44) Codey: Yeah. (0:01:44) Al: I was like, what’s MMF? (0:01:45) Al: Mini Mini Fun, of course. (0:01:46) Codey: And Mini Mini Farm. (0:01:48) Codey: Um, because of Johnny, I am now cursed to be playing Animal Crossing Pocket Camp. (0:01:54) Al: A game which you hadn’t played for the podcast, you know, playing (0:01:59) Al: because of the podcast. (0:02:00) Codey: Correct. (0:02:01) Al: Oops. (0:02:02) Codey: Um, I had played it when it was like not the complete. (0:02:06) Codey: Like paid version. (0:02:06) Codey: Um, but because we were talking about it and I saw that it was like cheap and then it was possibly going to become less cheap. (0:02:14) Codey: And I’ve been doing really well with budgeting lately. (0:02:16) Codey: I was like, you know what? (0:02:18) Codey: I can, I can afford 10 bucks. (0:02:18) Codey: So, and I don’t, I don’t know. (0:02:20) Al: Yes. Do we have the actual date? I know that it’s very soon, or it’s, like, just in the (0:02:26) Al: past, but it wasn’t when the podcast episode came out. (0:02:30) Codey: And unfortunately I have now bought it. (0:02:32) Codey: So I have no way of checking because I’m pretty sure. (0:02:35) Al: Ah, it was the 31st of January. So, if you bought it when the last episode came out, (0:02:36) Codey: Okay. (0:02:38) Codey: So now it is what? (0:02:38) Codey: 20 bucks. (0:02:42) Al: or the two days after that, you were good. Otherwise, sorry, too late. And now it’s, yeah, $20. (0:02:48) Codey: So, I’ve been playing that.
I’ve also been playing, still been playing Honeygrove, still (0:02:58) Codey: really sucked into Honeygrove. And I, you know, it’s so funny because we, whenever we would (0:03:01) Al: You’re just playing all the mobile games. (0:03:06) Codey: cover them before, it’s like, yay, I can uninstall it now. And the last couple ones, I’ve been (0:03:12) Codey: like, oh, no, I want to keep playing this. So, yeah, I do. (0:03:14) Al: Mm-hmm, oops. (0:03:19) Codey: But it’s nice because I’m, you know, nearing the end, the other thing, quote unquote, I’ve been (0:03:24) Codey: doing is is a PhD. And it is crunch time for sure now. So I pretty much like, I’m doing a lot of (0:03:34) Codey: stuff all the time. If I’m not doing specimens, I’m writing if I’m not doing that, either of those (0:03:40) Codey: two things I’m like, I’m always doing something. So this, this gives me a nice little like, okay, (0:03:46) Codey: I’m gonna sit down for like a half an hour and just like (0:03:48) Al: Mm, are you rotating through them or? (0:03:48) Codey: brain off play these silly little games. Yeah, so every (0:03:56) Codey: well, I guess I’m also playing too many games. I’m also playing (0:04:02) Codey: Pokemon TCG Pocket and the new thing just released. And so I (0:04:08) Codey: always check that first. Let me look at my guess. I always (0:04:11) Codey: check that first. And then I do Honeygrove because I can like (0:04:14) Codey: send everything off, like my little bees off on their (0:04:17) Codey: or expeditions. (0:04:18) Codey: And then I do Pocket Camp and then I do Mini Mini Farm for a little bit. (0:04:23) Codey: And then if for whatever reason, I am bored after that, um, or not (0:04:28) Codey: sucked into Mini Mini Farm, uh, I have my cross-stitch coloring out. (0:04:30) Al: Hmm (0:04:33) Codey: But yeah, that’s, that’s my, my, my brain off time now. (0:04:33) Al: Fair enough (0:04:40) Codey: Was it? (0:04:40) Al: Nice, I have been playing a lot of Pokemon. (0:04:45) Al: So I think last time we talked, Cody, (0:04:47) Al: I was just nearly finished, (0:04:47) Codey: Mm-hmm. (0:04:50) Al: Brilliant Diamond and Shining Pearl. (0:04:51) Codey: Mm-hmm. (0:04:53) Al: I have now finished that, thank goodness. (0:04:54) Codey: Okay. (0:04:55) Codey: Yeah, you’re free. (0:04:57) Al: So that one’s done. (0:04:59) Al: And I was going to kind of maybe stop there, (0:05:02) Al: but then I was like, no, I need to do Let’s Go as well. (0:05:05) Al: So I did the Let’s Go Pokedex, (0:05:08) Al: and I rushed through. (0:05:10) Al: Uh, another save because I hadn’t recreated my Pokemon (0:05:15) Al: Let’s Go Pikachu save. (0:05:18) Al: So I did that. (0:05:18) Al: So that now is done. (0:05:19) Al: So all of my Home dexes are done except Sword and Shield. (0:05:25) Al: So I’ve got the Let’s Go one. (0:05:27) Al: I’ve got Brilliant Diamond, Shining Pearl. (0:05:29) Al: I’ve got Let’s Go. (0:05:30) Al: Arceus and I’ve got all the Scarlet and Violet ones. (0:05:32) Al: They’re all done. (0:05:33) Al: I haven’t done the Sword and Shield ones. (0:05:35) Al: Um, and I now have a post game (0:05:40) Al: save of every Pokemon Switch game, except Shield. (0:05:45) Al: So I’ve recreated all my saves, except that. (0:05:46) Codey: Okay. Wow. (0:05:49) Al: And I have done a Professor Oak challenge now of every pair of games, except can you (0:05:55) Al: guess? (0:05:56) Al: No, no, it’s Sword and Shield. (0:05:56) Codey: Brilliant Diamond and Shining Pearl? Oh, I don’t know. Okay. Okay.
(0:06:03) Al: So, so at some point, I would like to do a Professor Oak challenge in Shield. (0:06:10) Al: Uh, and that does all three of those things. (0:06:12) Al: It does a Professor Oak challenge in, in that series of games. (0:06:16) Al: It basically completes my Home Pokedex, right? (0:06:19) Al: Cause you’re catching everything anyway. (0:06:22) Al: Um, and it, it then me, it will mean I have a Shield save in post game as well. (0:06:27) Al: But I don’t think I’m going to do that now because I’m worried I might burn out on Pokemon. (0:06:31) Codey: Mm-hmm. (0:06:35) Al: And we don’t know yet when the new games coming out. (0:06:38) Al: I don’t expect it to come out until (0:06:40) Al: November but we don’t know for certain and I don’t want them to be like come out and like me (0:06:46) Al: spend the next month doing this and then they come out and then I burn out and I’m like okay (0:06:50) Al: I’m done with Pokemon for like six months and then they come out and say oh Legends Z-A is actually (0:06:57) Al: coming out in April and I’m like oh no that is really soon so uh but I’m also worried that they (0:07:05) Al: might do the release of Pokemon for the Home dexes very soon and yeah sure I don’t (0:07:10) Al: need to do it as soon as it’s done of course I don’t need to but I will feel the drive to (0:07:15) Al: do it at that point so I’m like do I actually just do the Home dexes just now and then leave (0:07:21) Al: the Professor Oak challenge for another time but then why why not just do the Professor Oak challenge (0:07:28) Al: but then I’m also the reason I was playing those games in January was because there weren’t any (0:07:32) Al: games coming out that I was planning on playing for the podcast and now we have a billion of (0:07:35) Codey: Yeah, but it would be really inefficient to not just do it, do them together. (0:07:38) Al: them coming out in February and March. (0:07:40) Al: I’m probably going to just wait and do it all at the same time, probably next January. (0:07:54) Codey: yep. yep. (0:07:57) Al: Because January does tend to be quite a quiet period, but I guess that depends on when Z-A (0:08:04) Al: comes out. (0:08:05) Al: Because if Z-A comes out in November, I’m probably not going to want to do a Professor Oak challenge (0:08:08) Al: of Shield (0:08:10) Al: in January, so maybe, maybe I’ll just wait till Pokemon Day and they’ll all they will almost definitely tell us the release date then right like there’s no way they’re not going to do that. (0:08:22) Al: That reminds me, we want to do Pokemon Day predictions (0:08:26) Codey: Oh, okay, okay. (0:08:28) Al: and Pokemon Day reactions greenhouse episode, that gets us to this month. (0:08:29) Codey: Mm-hmm. (0:08:33) Codey: Mm-hmm. (0:08:39) Al: so (0:08:40) Al: all that to say I have played a lot of Pokemon in the last month and a half a lot a lot a lot (0:08:48) Al: of Pokemon and I think like maybe like 120 hours over the last month and a half we’ve just Pokemon (0:08:56) Codey: Mm hmm. I’m shocked. (0:08:58) Al: still not fond of Brilliant Diamond and Shining Pearl (0:09:02) Al: still a big big fan of Let’s Go great games love them second best Pokemon game (0:09:11) Codey: Yeah, I really have been wanting to go back and replay. I have Eevee, but I also have (0:09:17) Al: Mm-hmm. Yes, fair, fair, fair. So yeah, I’m probably Pokémon’d out for now, but we’ll see. (0:09:21) Codey: else going on. So, yep. (0:09:30) Al: I’ve also been keeping up with Harvest Moon: Home Sweet Home.
Look at me actually playing a farming (0:09:36) Al: game a little bit a day. What a crazy idea. I know. So I’m now in chapter five, enjoying that. (0:09:37) Codey: Wow not guzzling (0:09:40) Codey: - I’m done. (0:09:44) Al: I don’t know. (0:09:45) Al: I don’t know what to do with that. (0:09:47) Al: Yeah, actually, yeah, no, I will. (0:09:49) Al: I’m enjoying it. (0:09:51) Al: I’m enjoying playing this game. (0:09:52) Al: This is a fun game. (0:09:53) Codey: Mm-hmm. (0:09:54) Al: It still has issues. (0:09:56) Al: Absolutely. (0:09:58) Al: The cloud save still not working for me two months later. (0:10:02) Codey: I’m shocked. (0:10:04) Al: But controller support has made it playable. (0:10:07) Al: And it’s actually fun. (0:10:10) Al: It’s no Stardew Valley. (0:10:11) Al: I don’t care about the characters as much. (0:10:14) Al: But there’s a lot to like about it. (0:10:16) Al: and I’m hopeful for… (0:10:17) Al: and I have also started playing Hello Kitty Island Adventure, which I did play when it came to Apple Arcade, so I played it for about a month and enjoyed it. (0:10:33) Al: But then I was like, I am so fed up with playing on the touchscreen and let me tell you, playing with the controller infinitely better. So good! (0:10:37) Codey: Yeah. Okay. (0:10:41) Al: So I’ve been playing on my Steam Deck, I know that I know some people who are playing on… (0:10:47) Al: which may or may not come up in a future episode, but yeah, no, it’s a good game. It is way better than it had any right to be. (0:10:50) Codey: - Ooh. (0:11:02) Al: I completely forgot until I started the game how the game starts and I’m like, I just love… it’s so ridiculous that it basically starts with a plane crash. (0:11:11) Al: Because who would expect that in a Hello Kitty game? Alright, I think that’s everything. That’s what we’ve been… (0:11:17) Al: We are going to continue the new segment, the month’s releases and the previous month’s releases. We’re going to talk about last month’s releases. (0:11:18) Codey: Oh, woo. Oww, ow, ow, ow. (0:11:31) Codey: Last month’s what what was released last month? I got you (0:11:36) Al: Listeners, write in and tell us what this segment should be called, this monthly segment. What released last month? (0:11:44) Al: January 2025 edition. Or should it be fae- (0:11:47) Al: Maybe the 2025 edition, because last month would be January, but it’s like what would (0:11:51) have- what released last month? (0:11:53) Codey: What are you what do you call it so like in? (0:11:57) Codey: In Stardew Valley or whatever, when you go to sleep, and there’s like that recap screen (0:12:02) Al: This summary… (0:12:03) Codey: That’s all you call it. That’s all it’s called (0:12:07) Al: He gla-la… (0:12:07) Codey: Like the daily summary listeners. Let us know (0:12:10) Al: I don’t know… (0:12:11) Codey: What do you call that screen like when it tells you what you did for the day how much all your stuff sold? (0:12:16) Codey: I think that whatever that is called is what this segment should be called because it’s like we just fell asleep on January (0:12:23) Codey: And we’re waking up and it’s February, but let’s like think about the things that occurred last last month (0:12:30) Al: I’m really struggling to Google this. (0:12:31) Codey: Don’t work yeah, don’t worry about looking it up. They got a listeners have to tell us (0:12:35) Al: Okay, so January, what released in January? (0:12:41) Al: We have four releases in January 2025.
(0:12:44) Al: We got Harvest Hills, releasing mid-January the 15th. (0:12:48) Al: We got Into the Emberlands, Not Wonderful, released on the 20th of January, then we had (0:12:54) Al: Hello Kitty Island Adventure on Steam and Switch that released on the 30th of January, (0:12:58) Al: My Little Life which is our first (0:13:01) Al: game of Rusty’s-like or as the developer of Rusty’s Retirement is calling them bottom of the screen game (0:13:06) Codey: boss game which he he like talked it up on the um on Bluesky he like was like y’all (0:13:06) Al: and that released on the 31st of January. So a kind of (0:13:17) Codey: should get this game and now i’m looking at it oh it’s only five dollars and 39 cents (0:13:24) Al: Yeah, it is dangerous. (0:13:25) Codey: oh no it’s only it’s only windows oh I almost clicked it y’all I almost we good okay i’m (0:13:34) Codey: I wish. (0:13:36) Codey: Let it be not just Windows, My Little Life developer. (0:13:39) Codey: I want to play this game. (0:13:40) Codey: Thank you. (0:13:42) Al: Yeah, so that’s the January releases, wild that I’m about to say this, but that’s a (0:13:47) Al: quiet month. Four games is a quiet month, apparently. (0:13:51) Codey: Yeah, not a lot going on. (0:13:57) Al: Okay, so we’ve now got a bunch of news. We’re going to start with the gaming news. So first (0:14:04) Al: up we have Piczle Cross: Rune Factory, they have announced a release date for this. So this is (0:14:09) Al: like the Piczle Cross: Story of Seasons. (0:14:12) Al: It’s a Picross type game, but not Picross because Nintendo own the trademark to that kind of. (0:14:21) Al: Yeah, you do your nonograms. I think that’s what the generic term people have been using, (0:14:27) Al: nonograms. You do your nonograms and in the Story of Seasons one, it like built up a farm (0:14:34) Al: in the background as you do it. Yeah, I don’t know. I don’t know if they’ve shown (0:14:36) Codey: is that is that what it is because it also says customize like in this trailer they say customize (0:14:40) Al: and I’ll see you next time. (0:14:42) Al: Yeah. (0:14:43) Codey: your farm and I don’t and it like looks like you choose what like where things are placed and so (0:14:50) Codey: that was one that was my only question was like I mean it’s coming some when it comes out folks (0:14:55) Codey: can tell me unless people have been playing I don’t know if there’s a dummy (0:14:57) Al: I don’t, I, yeah, oh, interesting. (0:15:01) Al: So it does look like you can change things in this one. (0:15:03) Al: So I’m pretty sure in the Story of Seasons one, (0:15:06) Al: you just saw the farm build up and things grow (0:15:09) Al: and you didn’t have any control over how it looked, (0:15:12) Al: but you’re right, it does. (0:15:13) Al: So it says customize your farm (0:15:14) Al: and it shows different animals or monsters. (0:15:17) Al: And then it shows you actually selecting (0:15:19) Al: what weapon you want your character to have, (0:15:21) Al: including a massive lollipop as an option. (0:15:25) Al: So yeah, it looks like it’s more in depth. (0:15:28) Codey: Well, I’m wondering if it’s just like you can control what it looks like in the background (0:15:34) Codey: while you are, yeah. (0:15:34) Al: I think that, yeah, I think that’s all it is. (0:15:36) Al: I don’t think you’re actually doing any farming (0:15:39) Al: or any battling, that just happens in the background (0:15:41) Al: as you’re doing it, but in the Story of Seasons one, (0:15:44) Al: I’m pretty sure you couldn’t change how it looked.
(0:15:46) Codey: is it like learn how to do a carrot by doing a carrot learn how to plant (0:15:52) Codey: carrots I like okay (0:15:52) Al: It wasn’t even that much in the Story of Seasons one, it was literally, you do stuff and things grow in the background you weren’t really. (0:16:00) Al: Yeah, there was nothing else. (0:16:02) Codey: Sounds good. Some people are probably jumping for joy that (0:16:06) Codey: there’s a new Picross game coming out. A new nonogram coming (0:16:10) Al: nonogram yeah yes yeah anyway Space Sprouts have announced that they’re (0:16:13) Codey: it’s like Kleenex. It’s like people say get a Kleenex but (0:16:17) Codey: that’s a brand. (0:16:21) Al: releasing on the 31st of March (0:16:24) Codey: Mm-hmm. They’re also participating in Steam’s Next Fest, so… (0:16:28) Al: Cody who isn’t (0:16:30) Codey: Oh, okay, fine. They’re also moving on to the next part. Uh, I guess the only thing… (0:16:33) Al: I mean I’d like I just I find the Steam Next Fest stuff so funny because it’s (0:16:40) Al: it doesn’t really mean anything it’s like it’s it’s like it’s like being part of (0:16:45) Al: a sale right you can still do a sale whenever you want you can put your price (0:16:48) Al: down whatever but it’s like if you do it at a specific time you might get on a (0:16:50) Codey: Right. But that’s the thing, like there’s a specific list that they’ll get clicked on. (0:16:52) Al: list (0:16:55) Al: but the list is too long (0:16:58) Codey: Okay, but like not every it is still selective, right? Like not hashtag not everyone gets on the (0:17:02) Al: no I don’t I don’t think so I think anyone even get in the list (0:17:03) Codey: list. Well, anyways, that’s from February 27 to March 3. But they are also looking for playtesters. (0:17:12) Codey: So if you go to the show notes, go to the Steam page, etc, etc, you can figure out how to become (0:17:18) Codey: a playtester for Space Sprouts. (0:17:20) Al: Whoo! Yeah, what was this game again? I can’t… Oh yeah, it was like the 2D space. It was like, (0:17:28) Al: yeah, I’m not describing that very well. But yeah, it was a 2D world where you’re floating around, (0:17:30) Codey: Mm-hmm (0:17:33) Codey: 2D floating in space. Yeah (0:17:36) Al: yeah. Yeah, I’m very interested in this one. I wonder how it’s going to feel playing (0:17:42) Al: farming in 0G. Although it does look like some of it has gravity, and some of it doesn’t, (0:17:44) Codey: Mm-hmm. I mean it’s interesting. (0:17:50) Codey: Well, if I learned anything from The Martian, farming in 0G contains the recycling of human waste. (0:17:58) Codey: So, very excited for that. (0:18:00) Al: for sure next we have a free update coming to Luma Island and this is called Pirates (0:18:12) Codey: With an exclamation point, your boy, pirates! (0:18:15) Al: just pirates um it’s literally called Luma Island Pirates what uh although I don’t think (0:18:22) Al: the exclamation mark is actually part of the title because later down they say what’s coming (0:18:26) Al: in Luma Island Pirates without the exclamation mark. (0:18:29) Codey: Uh, I choose to believe, yes. (0:18:30) Al: Don’t you always? (0:18:32) Al: So this brings a pirate themed zone with new minigames, a new temple, traps and enemies, a new profession. (0:18:44) Al: Johnny and Dallin I think, they both play it. They’ll be excited about a new profession. (0:18:51) Al: A full screen map, that’s definitely something it needed. I was annoyed about not having the full screen map.
(0:18:56) Al: three new game modes, (0:18:58) Al: including (0:19:00) Al: hero mode and cozy mode. (0:19:02) Al: I wonder what the third mode is. (0:19:04) Al: I love how they’d say three modes (0:19:04) Codey: I’m curious what the new profession is. (0:19:06) Al: and they mentioned two of them. (0:19:10) Codey: Is it piracy? (0:19:12) Al: Oh, interesting. Yeah, that’s a good point. (0:19:14) Al: It could be. (0:19:14) Codey: Like goats? (0:19:16) Al: Even if not actually, (0:19:18) Al: piracy definitely could be related to that. (0:19:20) Codey: Or like treasure hunting? (0:19:20) Al: Yeah, that is a good point. (0:19:22) Al: Yeah, well they do have a treasure hunter one already, I’m pretty sure. (0:19:26) Codey: Okay, so it is piracy. (0:19:28) Codey: Destroy this city and loot the people. (0:19:30) Al: Maybe. You never know. You never know. Also, new outfits, quests, NPCs, Lumas, powers, (0:19:34) Codey: Mutiny, mutiny your own. (0:19:38) Codey: I’m very curious. (0:19:44) Al: bonuses, and achievements. Yes, yeah, that is a free update. That is not a DLC. That (0:19:46) Codey: Ooh, that’s a lot of content in a free update. (0:19:52) Codey: Yeah. (0:19:53) Al: is a free update. Coming soon. No date yet. Coming soon. (0:19:58) Al: Speaking of updates, (0:20:00) Al: Ranch of Rivershine have announced their 1.7 update. (0:20:05) Al: This is called Azure Coast Trail. (0:20:08) Codey: what? No, say it, say it how you say it again. Oh, that’s so cool. We just say Azure. I like (0:20:08) Al: Azure. (0:20:09) Al: Azure. (0:20:10) Al: How would you say it? (0:20:13) Al: Azure. (0:20:14) Al: Oh, no, Azure. (0:20:18) Codey: your way of saying it. Continue. (0:20:20) Codey: you. (0:20:21) Codey: You’re welcome. (0:20:21) Codey: You’re welcome. (0:20:21) Al: Thank you. (0:20:25) Al: This brings new competitions, horses, music, accessories, and loading screens. (0:20:30) Codey: Oh, whoo, the loading screens look really good, like the the art. (0:20:35) Codey: I mean, I’m betting that they have like a ton of humans that love this game (0:20:39) Codey: and just are like, take here, take my art. (0:20:41) Codey: The loading screens look really cool. (0:20:43) Codey: And the new horse is like a cool new wild horse species. (0:20:48) Codey: I almost look I. (0:20:48) Al: Rabicano, Rabicano, I think Rabicano. (0:20:55) Codey: Let me it’s probably something like Rabicano or Robicano or something. (0:21:00) Codey: Um, yeah, I didn’t look up to see if those are actually like a thing. (0:21:06) Codey: I almost did and then I didn’t. (0:21:07) Codey: Oh, yep. (0:21:08) Codey: They’re a type of Arabian horse. (0:21:10) Al: rare horse coat color pattern that features white. (0:21:15) Codey: Oh, so it’s just a whatever. (0:21:17) Codey: It’s they’re really cute. (0:21:21) Codey: Yeah, and they also announced that the next update is going to introduce (0:21:27) Codey: a new character and that is the veterinarian. (0:21:30) Codey: Also introduce, you know, care for your horse. (0:21:34) Codey: So different, you know, that care like they might maybe they get sick. (0:21:39) Codey: Maybe they have certain nutritional needs and you’d need to make sure (0:21:43) Codey: you meet them. (0:21:44) Codey: I’m not entirely this is all just me. (0:21:46) Codey: Just I don’t know. (0:21:47) Codey: Just like trying to like think of what it could be, but that’s cool. (0:21:52) Codey: I’m all for it. (0:21:53) Codey: Can I be the veterinarian? (0:21:58) Codey: Aww.
(0:22:01) Codey: We need a game where you’re like the veterinarian. (0:22:03) Codey: We don’t have that. (0:22:04) Al: Go make it. (0:22:05) Codey: No, I’m good. (0:22:06) Codey: Someone should make it though. (0:22:08) Codey: Or like a wildlife biologist. (0:22:08) Al: Let us know. (0:22:11) Codey: I don’t know. (0:22:12) Al: Is that not just Research Story? (0:22:14) Codey: Go play research. (0:22:16) Al: I mean, tell me if I’m wrong, you’re the one that’s played it. (0:22:17) Codey: No, I’m trying to. (0:22:18) Codey: Yeah, no, I’m trying to think of like, no, (0:22:20) Codey: like a game where you’re a rehabber. (0:22:22) Codey: Where you rehabilitate wild animals that people bring to you. (0:22:25) Codey: I think the only issue with that is that it’s sad because they die. (0:22:28) Codey: die, but hey. (0:22:30) Codey: There was a Bluey episode about a bird dying, so it’s okay these days. (0:22:36) Al: Bluey can do anything. (0:22:38) Codey: Bluey did it. That means it’s child approved. (0:22:42) Al: Let me tell you, right, me and Craig watch Bluey together, (0:22:46) Al: and he’ll be sitting and laughing at the jokes and watching it and stuff, (0:22:48) Al: and then I’ll just be sitting behind him, just sobbing. (0:22:50) Codey: Sobbing. Yeah. Yeah, I just just finished it. And it I am (0:22:52) Al: Like, “Oh, no, what is happening? What’s this doing to me?” (0:23:00) Codey: upset. And I need more. I watched all of it. Thanks. I’ve, (0:23:04) Al: Nice. Well done. (0:23:07) Codey: I just I crave distraction in the background while I run (0:23:11) Codey: meaningless analysis. They’re not meaningless analysis. They’re (0:23:14) Codey: just tedious analysis correction. But yeah, cool that (0:23:21) Codey: give me a game mode where I can play as the veterinarian and I (0:23:24) Codey: will play this game. Developers if you’re like, man, what do (0:23:26) Al: I mean, I feel like that’s just a whole different game, not just a different game mode, but… (0:23:31) Codey: people want these days? I bet a vet mode like a vet game would (0:23:37) Codey: crush. Yep. And I would pay probably $30 for it. So if it (0:23:39) Al: There’s at least one person who would buy it, that’s for sure. (0:23:45) Codey: takes more than $30 to make. I’m out. (0:23:48) Al: I’m not even promising there’d be two people because I’m not sure who the second person (0:23:50) Codey: Listeners. Let me know. Can you contribute $30 we can offer $60 (0:23:53) Al: would do it for the podcast. (0:23:57) Al: I’m sure there are I mean, look, if you could make a game for $60 you’d be rolling in it. (0:24:02) Codey: to developers. There’d be a lot of really bad games. Yeah. (0:24:14) Al: For sure for you. (0:24:15) Al: Well, yeah, you probably can make a game for $60. (0:24:18) Al: Absolutely dreadful. (0:24:21) Al: Just a Skinner box. (0:24:22) Al: All right. (0:24:24) Al: Next we have Horticular have announced a free update and a paid DLC. (0:24:33) Al: They’re both releasing on the same day, 28th of February, and the paid DLC Frozen Frontier (0:24:39) Al: has a new story, world quests, new items and creatures. (0:24:45) Al: written snowshoe hare. Is that a creature? (0:24:47) Codey: Yeah, yeah, they specifically say snowshoe hare well, that is just one that they blurbed (0:24:48) Al: Is that a creature that’s there? Just one creature. (0:24:55) Al: blurb. No, I know what you mean. That’s a great example of verification.
(0:24:55) Codey: It could be (0:24:59) Codey: Where did that word come from it is keep going I’m gonna look up what that where that came from (0:25:05) Codey: - Um. (0:25:07) Al: And the free update includes new creatures, some temp mechanics. What do you mean by that? (0:25:14) Codey: - Temperature, sorry. (0:25:15) Al: Oh, temperature was like temporary mechanics. Yes, temporary temperature mechanics, snow, (0:25:16) Codey: Now, (0:25:22) Al: And then obviously, quality of life improvements. (0:25:26) Codey: Yeah, so they both yeah, they both kind of include like, adding snow as a as a thing that you can see in the game. But one just adds like a whole new world. Also, I wanted to note that they say on in the beginning of this show notes, whatever, what is this called, like a, thank you. (0:25:48) Al: release notes or well it’s not really release notes because it’s not released (0:25:51) Codey: It’s a (0:25:51) Al: teaser (0:25:54) Codey: Teaser, they say… (0:25:56) Codey: “As spring arrives in the northern hemisphere, we’re not quite done with the cold weather. We got you southern hemisphere folks.” (0:26:02) Codey: Correction. We are also not ready for spring. (0:26:08) Codey: The United States weather predicting rodent has proclaimed that there are six more weeks of winter. (0:26:16) Codey: So, yeah, we’re not ready. (0:26:18) Al: Do I need to tap the sign? Seasons aren’t universal, Cody. (0:26:20) Codey: What’s the sign? (0:26:26) Codey: They specifically say “As spring arrives in the northern hemisphere.” (0:26:28) Al: Seasons aren’t universal in the northern hemisphere, Cody. (0:26:32) Codey: There’s six more weeks of winter. I don’t know what to tell you. (0:26:35) Al: Look, okay, so not every country has the same definitions of seasons. Not every country even (0:26:41) Al: has four seasons, and certainly not every country is going to listen to America when they say that (0:26:46) Al: that a rodent has decided it’s- (0:26:49) Codey: Okay, there’s like certain things that they should listen to us on and the majority of (0:26:54) Codey: things that other countries should just ignore Americans on, especially these days. (0:27:00) Codey: But one thing y’all should really listen to is our, our groundhog, Punxsutawney Phil, (0:27:07) Codey: who is an immortal groundhog that has bespake unto the cultists or whomstever and told them (0:27:18) Codey: in Groundhogese! (0:27:20) Codey: There will be six more weeks of winter and a bunch of people just went. (0:27:22) Al: Yeah, I have watched Groundhog Day. I do know the idea behind it. (0:27:27) Al: Finally, in the game news bit, we have Melobot: A Last Song, have released their (0:27:34) Al: original soundtrack on Steam. It is $12.99 on its own, or it’s also included in the Deluxe (0:27:42) Al: Edition, which is more expensive. It’s actually a really good deal if you get the (0:27:46) Codey: Oh, whoo. Yeah. (0:27:49) Al: deluxe edition, though, right? Because it’s like… (0:27:52) Al: 20 quid for the game. I’m back into pounds here because you confused me with your whole (0:27:57) Al: dollars. 20 quid for the game or is it 25 quid for the 25 dollars for the game? (0:27:58) Codey: - Yeah, sorry, I put dollars, I put US dollars. (0:28:00) Codey: $13. (0:28:04) Codey: $25 for the deluxe edition and then $13 if, (0:28:08) Codey: for just the soundtrack. (0:28:10) Al: Yeah. Well, how much is the base game is that is that $20 then? (0:28:13) Codey: - Man, I didn’t look at that. (0:28:14) Codey: Let me look.
(0:28:16) Al: See, it is 20 quid or 10 and 10 quid for 20 or 27 quid. (0:28:22) Codey: Oh I had it wrong! The game is 25. The bundle that includes the digital deluxe upgrade is 35. (0:28:22) Al: Ah. (0:28:32) Al: OK, so it’s still a good deal, but it’s not as good a deal. (0:28:35) Codey: You save 8%. (0:28:37) Al: All righty, so that is the game news. (0:28:41) Al: We also have two new games announced. (0:28:43) Al: Well, kind of. (0:28:44) Al: One of them is a new game. (0:28:44) Codey: - In quotes. (0:28:46) Al: One of them is actually two that are not– (0:28:50) Al: should we talk about the one that’s actually new first? (0:28:52) Al: So that is Sky Harvest. (0:28:52) Codey: - Yes. (0:28:57) Al: The blurb for this one is, “Armed with hand-me-down tools (0:29:01) Al: and some cash. (0:29:03) Al: You begin your new life as the chief farmer, a position your (0:29:07) Al: grandfather once excelled in. Can you honor his legacy and (0:29:10) Al: transform the overgrown, untamed and desolate floating island (0:29:14) Al: into a flourishing farm abundant with produce? (0:29:19) Codey: produce is weird in that trailer was was grandfather sleeping (0:29:26) Al: I didn’t actually watch the trailer give me two seconds. No, he did (0:29:33) Al: Again, yeah, no he did (0:29:36) Codey: so as a child you come upon your grand your beloved grandfather deceased at the (0:29:44) Al: Dead, at the kitchen table, reading his hopes and dreams. (0:29:47) Codey: kitchen table with (0:29:49) Codey: a book in front of him. (0:29:53) Codey: The book says, if I wish I could have gone back one last time and it’s got like a ticket and then it shows you taking that ticket and going and honoring his legacy. (0:30:04) Al: This is how this is how I know that he’s dead because if he’s not dead that is horrific. You’ve just stolen his ticket (0:30:05) Codey: But like, (0:30:12) Al: The one thing he wanted to do you’ve stolen his ticket and gone without him (0:30:16) Al: Let waited a while because you’ve grown a beard now. You’re an adult now (0:30:16) Codey: also, (0:30:20) Codey: Yeah, there’s a whole beard, a mustache, wild. (0:30:21) Al: Goodness me. That’s dreadful (0:30:23) Al: You (0:30:24) Codey: But like, if I came upon my grandmother deceased, (0:30:29) Codey: I say this because my grandfather is already deceased. (0:30:34) Codey: If I came upon my grandmother deceased, and (0:30:37) Codey: I don’t care what’s in front of her, I’m not looking at that. (0:30:42) Al: Yeah, he just like rests his head on his grandfather’s dead arm and sheds one singular tear before stealing his ticket, his boat ticket. (0:30:45) Codey: And then… (laughs) (0:30:50) Codey: I don’t know how beloved that grandfather was, if that’s your reaction, my guy. (0:30:55) Codey: Anyway, this is me just… (laughs) (0:30:56) Al: One, he’s one tear’s worth a little bit. (0:31:00) Codey: Alright, this looks cool though. So you’re on floating islands, you’re flying around with a jetpack, you can manage a restaurant. (0:31:07) Codey: It just says manage a restaurant, but it just shows you telling the person what the one meal that you guys are making in the day is. (0:31:20) Codey: Not what restaurants do. (0:31:23) Codey: And it’s a really bad restaurant. (0:31:24) Al: It’s what really bad restaurants. (0:31:28) Codey: And then it also says make friends, and then there’s a dog with a scroll in its mouth, so I’m guessing you befriend a dog. (0:31:36) Codey: And unfortunately that wasn’t in the trailer, it was in this thing. 
(0:31:40) Al: I mean the trailer didn’t show you much, lesbian. (0:31:43) Codey: Right, the trailer was very teaser-y, but underneath that, on the post, they have… (0:31:50) Codey: I watched that video where there’s a dog 10 times to see… I wanted to see more of the dog. (0:31:59) Codey: What kind of dog is it? All that. (0:32:03) Codey: It’s definitely a tricolor something, but other than that, no. (0:32:08) Al: So I will say I’m not particularly enamored by the graphics in this game. (0:32:17) Al: Not that it looks bad, it’s very definitely trying to look how it looks, I think. (0:32:23) Al: What I find a bit weird is the graphics of the game and the graphics of the heads-up display, (0:32:30) Al: like the menus and stuff, they feel like they’re from different games. (0:32:31) Codey: Mm hmm. Yeah, like they had two different people designing those, (0:32:38) Codey: and one understood the assignment and one didn’t. (0:32:38) Al: Yeah. Yeah, so it’s a little bit weird. Very, very. I do like the flying. The flying looks fun. (0:32:45) Codey: The character also looks lanky. This is a tall character. (0:32:53) Codey: Yeah. Mm hmm. Cosine. I don’t. (0:32:57) Al: Yeah, I don’t know what else to say. That looks interesting. I love how it calls it sky farming (0:33:01) Codey: It’s farming, but in the sky. Or are we? Are you farming the sky? (0:33:04) Al: when it’s just farming. (0:33:08) Al: In the sky? Okay. No, no, no, no, it’s just you’re on a sky island. Which I feel like this whole (0:33:12) Codey: Like is there part you’re like collecting the sky? (0:33:16) Codey: You don’t know that. What if they what if you collect the sky? (0:33:21) Al: let’s this game has sky islands was a fun idea five years ago and now half the games are doing (0:33:26) Al: it. Which is I guess the problem with game development, right? It was even before that (0:33:30) Codey: It’s the tears of the kingdom like. (0:33:35) Al: people were doing it. They didn’t. (0:33:38) Al: No, I know. Yeah, yeah. No, I get it. I feel like this could be possibly interesting. (0:33:46) Al: I’m not really sure what it’s… The flying is the thing that is most interesting to me, (0:33:51) Al: but other than that, I’m not really sure what it is that they’re doing that’s unique, (0:33:54) Al: which is always the problem with cottagecore games is why should I play you over Stardew? (0:34:01) Codey: I think that’s correct. I think like that’s the thing about this is it’s just to get your (0:34:06) Codey: attention and we will continue. It’s not like I saw this and I’m like, yep, not going to play (0:34:11) Codey: that because there’s not a lot here. I want to, I want to see more. They’re going to probably (0:34:15) Codey: release more. And so far they just say Q2 2025 in the trailer. (0:34:16) Al: Yep. Yeah, where did you see Q2? I just see 2025. Oh, in the trailer, okay. Because on (0:34:28) Al: Steam just says 2025. Okay, I will update my list then. I didn’t pay attention to the (0:34:31) Codey: Yeah, he didn’t watch the trailer. (0:34:38) Al: trailer, there’s a difference there. All right, we also have the brand new and exciting (0:34:39) Codey: Oh, my bad. I get that though. (0:34:47) Al: Harvest Moon, Skytree Village, and The Lost Valley are for some reason coming to Switch. (0:34:55) Al: The good thing about this is it is a bundle, so it’s like you’re not buying the game separately, (0:35:00) Al: which is good, because my word that would be not worth any sort of money. I’m not sure who (0:35:06) Al: wants these games. 
It’s like they went, “Oh, when we did…” Because they worked with… (0:35:09) Codey: Yeah, so you– (0:35:16) Al: Because the rights are complicated to the old Harvest Moon games, right? So they’ve done some, (0:35:22) Al: they released the original Harvest Moon on, what’s it called, Nintendo Switch Online, (0:35:31) Al: and they had to do that in collaboration with Marvelous, because Marvelous owned the game, (0:35:36) Al: but they owned the name, and so they had to both agree to that. Anyway, whatever, it doesn’t matter. (0:35:41) Al: And I feel like that combined with Marvelous redoing A Wonderful… (0:35:46) Al: Life has made them go, “Oh, people like when we remake Harvest Moon games and knock on which ones (0:35:55) Al: is it that people actually want to play, because I guarantee you it’s not Skytree Village and The (0:35:59) Al: Lost Valley.” (0:36:00) Codey: - Yeah, I will say, okay, so two things. (0:36:04) Codey: First of all, I looked, so one social media user, (0:36:08) Codey: to your question of who’s asking for this, (0:36:10) Codey: one social media user named Chrissy said, (0:36:13) Codey: “Cozy gamers have really been winning lately.” (0:36:16) Al: I wonder whether that person has ever actually played either of these. (0:36:16) Codey: To which another, (0:36:23) Codey: to which another user said, (0:36:25) Codey: “These games are more like a loss.” (0:36:28) Al: The funny thing is they did the whole, “Oh, we’re going to announce an announcement.” (0:36:34) Al: And they were like, “Oh, we’ve got an exciting announcement coming for you.” (0:36:34) Codey: Yeah (0:36:37) Al: And you’re like, “Okay, fine.” (0:36:40) Al: And then they did this and people were like, “Really? (0:36:43) Al: That was your… (0:36:44) Al: Please tell me this wasn’t everything.” (0:36:46) Al: Because it’s just, they’re like, I am not the sort of person who just hates on Harvest (0:36:50) Al: Moon, you know, Natsume, Harvest Moon games for the sake of it. (0:36:54) Al: You know, I am literally playing Harvest Moon: Home Sweet Home, as we’re recording (0:36:58) Al: the podcast, right? And I’ve talked about how I like that. I’ve talked about how I like the ideas (0:37:02) Al: in One World and Winds of Anthos. I think they’re very interesting and I think that they’re very (0:37:07) Al: close to legitimately having a good game. These games are not that. These games are just bad. (0:37:10) Codey: Mm hmm. This ain’t it chief. Yeah, I will say so. I was listening to another podcast (0:37:22) Codey: about metal music lately and they were talking about I had there’s a point fault. Stay with (0:37:27) Al: I look forward to it. (0:37:28) Codey: me. They were talking about how this one band re like, is republishing like re thank you (0:37:37) Al: - Remastered. (0:37:38) Codey: remastering. (0:37:40) Codey: I think they’re actually just straight up rerecording an (0:37:42) Codey: entire album and like reproducing it. (0:37:42) Al: - Oh, okay. (0:37:44) Al: They’re Taylor-swifting it. (0:37:45) Codey: Basically, they are that’s the they literally made a joke about (0:37:49) Codey: that and they had the same question like what who’s asking (0:37:53) Codey: for this and a bunch of people on social media were like, (0:37:56) Codey: ah, this is thanks, but I’d rather have no music, etc, etc. (0:38:00) Codey: But they actually said they made a really good point, which (0:38:02) Codey: is if there are people who have not played these games or (0:38:07) Codey: listen to this music or whatever.
(0:38:10) Codey: Kind of an introduction to that to that content for them (0:38:13) Codey: because there might be people who have heard of this Harvest (0:38:17) Codey: Moon thing, but they haven’t really played it yet or whatever (0:38:22) Codey: and then maybe they see this bundle and they’re like, oh (0:38:24) Codey: wow, there’s two of them in here. (0:38:26) Al: Their first and last Harvest Moon games (0:38:27) Codey: And so it’s not. (0:38:30) Codey: Well, yeah, so that’s the thing. (0:38:31) Codey: So I mean that they were talking about an actually good album (0:38:35) Codey: versus– (0:38:35) Al: Yes, I think that’s that is the key difference here Cody (0:38:39) Al: I think like I am NOT against remakes. I think remakes can be really good (0:38:40) Codey: - Yeah. (0:38:43) Al: I think doing A Wonderful Life last year was good because that is a very beloved game (0:38:43) Codey: Yeah. (0:38:47) Al: That is a lot of people’s first farming game (0:38:51) Al: And you just have to listen to Kevin for five minutes to know how much some people were waiting for that (0:38:55) Al: Nobody has that about (0:38:56) Codey: Oh, yeah. That’s fair. (0:38:56) Al: these games. (0:38:59) Codey: Yeah, I I’m trying to give them the benefit of the doubt, but, uh, yeah, (0:39:04) Codey: I think that the thing is that it’s not just if they’re not being remade (0:39:08) Codey: or or bundled in for the switch for the fans. (0:39:14) Al: No. This is the problem is there aren’t games that Natsume can nostalgia grab on. (0:39:15) Codey: It’s to try and get new people into the into the fandom. (0:39:26) Al: Because they’re all owned by Marvelous. They only own the name. And Marvelous aren’t going to do (0:39:33) Al: anything about it. I think it was very different when they did the original on Nintendo Switch (0:39:38) Al: Online because that is the actual original game. They’re just porting. (0:39:44) Al: It’s not even porting. It’s literally just an emulator. They’re just literally allowing (0:39:51) Al: the game to run on it. And that’s very different to remaking games. And there’s no way Marvelous (0:40:00) Al: have remade multiple. They remade Friends of Mineral Town, which was a fun one to do. They (0:40:05) Al: remade A Wonderful Life. I can’t remember if they’ve done any other remakes recently. (0:40:10) Codey: Yeah, I don’t know, because I’m not ever going to touch it, so. (0:40:14) Al: They want to jump on the bandwagon. They have to just release bad games again. (0:40:20) Al: This is the thing. So many people have this bee in their bonnet about Natsume and they’re like, (0:40:25) Al: oh, they’re just jumping on the name and using it to sell bad games. And yeah, that’s kind of true. (0:40:31) Al: Or at least it was kind of true. I do think now they’re actually getting better and they’re (0:40:36) Al: actually trying to make good games. They’re getting there. But the problem is that releasing (0:40:42) Al: their bad games again, (0:40:44) Al: they’re not going to convince anybody that they’re doing anything other than money-grabbing. (0:40:44) Codey: Yeah, it’s like there’s someone at the company that remembers when all these games first (0:40:55) Codey: came out and like the hype the hype of it and they’re trying to like regain that the (0:41:01) Codey: glory days and the it’s sometimes you just got to let things go and like when Bluey and (0:41:08) Codey: Bingo had to get rid of a bunch of their stuffies. Yeah. (0:41:10) Al: Oh, we watched that the other day. That was a good episode.
(0:41:17) Al: Yeah, it’s painful to watch what they’re doing, because it’s like one step forward, 73 steps (0:41:24) Al: back. Like, I just… Why do this? And I… Oh, goodness. Yes, right. (0:41:24) Codey: like the American government. So we have some other people one year forward 73 years backwards. (0:41:40) Al: So… Yeah, a section we don’t often have, because normally it’s just game updates and (0:41:41) Codey: We have some other news. Uh-huh. Oh. (0:41:47) Al: occasionally new games, we do have the other news section. So we have three pieces of other (0:41:53) Al: news to talk about. The first one is super… Let’s start off with the negative one, shall we? (0:41:54) Codey: You got to be more specific. Oh, oh, you’re right. You’re right. You’re right. I needed. (0:42:01) Al: There’s only one negative one. Okay. (0:42:04) Codey: I had to look through it again. Yeah. (0:42:10) Al: So Phoenix Labs, the developers of Fae Farm and Dauntless, and were creating other games (0:42:15) Al: until last year when they laid off almost everybody who was working on any game other (0:42:20) Al: than Fae Farm and Dauntless have now laid off almost everybody else. Huzzah! (0:42:26) Codey: - Yay. (0:42:27) Al: They’re like, “What’s the point in a game studio that makes games? (0:42:30) Al: We don’t want to make games. We don’t even want to continue making our existing games.” (0:42:34) Codey: Yeah, you don’t have yeah, but you know, they really said, the developer said, quote, It’s unfortunate, but necessary. (0:42:44) Codey: Yeah, so I did do a dive into this, more than just like, just the top of the of the article or whatever I start, I really got into reading this article and like kind of looking at some stuff because I was just like, what is going on here? (0:42:45) Al: Yeah, I guess the games aren’t failing then. (0:43:00) Al: - Were you rage reading? (0:43:02) Al: Were you rage reading? (0:43:04) Codey: I was so after basically, the developer, the Phoenix Labs, whom’s ever the whole the whole Phoenix Labs people. Correct. Thank you. You’re so good with the words today. So they, they were, they were acquired by a blockchain company called Forte Labs. (0:43:14) Al: the company. What can I say? I’m on a roll. Words is my whole thing. (0:43:30) Codey: And when they were acquired, they then laid off (0:43:34) Codey: as you already mentioned 160 people and quote the new owner (0:43:39) Codey: reportedly pressed developers to draft methods for integrating (0:43:44) Codey: blockchain technology in its games for the purpose of buying (0:43:49) Codey: and selling and trading in game goods, according to former (0:43:52) Codey: employees. So the crypto market has has joined games y’all. (0:43:58) Codey: Uh. (0:43:58) Al: are we back on NFTs? I thought we killed NFTs like four years ago, what are you doing? (0:43:59) Codey: Yeah. (0:44:04) Codey: NFTs and crypto, man, they’re here to stay, I guess. (0:44:08) Al: Well no, crypto isn’t dead, but come on, when was the last time you heard about NFTs? (0:44:14) Al: Especially in games, they’re so 2022. (0:44:18) Codey: I believe a sentient Cheeto recently gave more NFTs. (0:44:24) Codey: Continuing on, apparently after releasing Dauntless, (0:44:28) Codey: they were “criticized by players for its new in-app monetization design,” (0:44:32) Codey: which was probably the blockchain, (0:44:34) Codey: but erasing previous progression with the new Awakening update. (0:44:38) Codey: So they had an update and it released, it erased all the previous progression.
(0:44:38) Al: Oh no! No! What?! (0:44:42) Codey: The game still has an overwhelmingly negative number of reviews on Steam. (0:44:49) Al: I miss that happening. I wasn’t really aware of this very much. I was aware of it when (0:44:50) Codey: And the… (0:44:54) Al: it initially released because it was like, “Oh, it’s gonna kill Monster Hunter.” And (0:44:56) Codey: Yeah. (0:45:00) Codey: It did not. (0:45:02) Codey: It really, it was nowhere near that, (0:45:04) Codey: because when it first launched, there were probably about 3200 concurrent players like people playing at the same time online. (0:45:12) Codey: Nowadays, it’s only ever around about 150 people. (0:45:18) Codey: So yeah, not sure what they’re doing. (0:45:22) Codey: I would suggest well, I guess I would I would say that I would suggest them to back off the blockchain, but they are literally owned and acquired by a blockchain company. (0:45:32) Codey: So I don’t think that’s going to happen. (0:45:36) Codey: So I’m not not really sure what this means for Fae Farm. (0:45:41) Al: What I find really funny is like, so I think crypto is most often a scam. (0:45:48) Al: I do think there are some interesting applications for blockchain as a concept. (0:45:56) Al: NFTs is not it. (0:46:01) Al: It has never been it, even on their own. (0:46:04) Al: And then when people started putting them into games, I was like, I don’t even know why. (0:46:10) Codey: I mean, I feel like it’s to try and like have an introduction. (0:46:11) Al: Like, what is happening, and why would you do this? (0:46:18) Codey: It’s like when they put smoking in movies so that they would get more smokers, right? (0:46:23) Codey: It’s like a, it’s a, it’s a possible way to normalize something. (0:46:26) Al: It’s like the Transformers series for selling more Transformers. (0:46:29) Codey: Yeah. (0:46:31) Codey: Uh, which it’ll probably have a small, well, it would have a small bump if it wasn’t for (0:46:38) Codey: uh, cozy gamers. (0:46:40) Codey: Cause I don’t think cozy gamers are the people or, or Monster Hunter style players. (0:46:46) Codey: I, you really gotta go for like the Call of Duty people. (0:46:48) Codey: I feel like they, they would do NFTs because they basically, that’s basically all they (0:46:53) Codey: do with their, uh, skins and stuff on all the, all the guns and whatever. (0:47:00) Codey: So like, it’s not Monster Hunter people, uh, with Dauntless and then cozy gamers with Fae (0:47:08) Codey: Farm. (0:47:10) Codey: It’
Discover the secret sauce behind optimizing your content for search engines and master the art of aligning with search intent. Unravel the mystery of cosine similarity, a powerful mathematical tool, and learn how it transforms text into vectors to measure the relationship between your content and user queries. Join me, Edd Dawson, as I guide you through a library analogy to shed light on how this technique helps search engines like Google determine relevance, giving you the edge to boost your search rankings. With a focus on vector space models, you'll gain insights into how search engines evaluate web pages and how you can leverage this knowledge to enhance your SEO strategies.

Transform your content creation approach by leveraging cosine similarity. Explore practical strategies that help refine keyword relevance and align with user intent, ensuring your content hits the mark every time. From content gap analysis to strategic clustering, learn how to identify missing topics, reduce topic cannibalization, and optimize site architecture to maximize unique page value. Embrace the power of semantic richness and regular audits to maintain high-quality content that avoids the pitfalls of keyword stuffing and duplicate content. Whether you're looking to enrich the value of a smoothie recipe or improve overall SEO performance, these actionable insights will empower your content-query alignment like never before.

SEO Is Not That Hard is hosted by Edd Dawson and brought to you by KeywordsPeopleUse.com
You can get your free copy of my 101 Quick SEO Tips at: https://seotips.edddawson.com/101-quick-seo-tips
To get a personal no-obligation demo of how KeywordsPeopleUse could help you boost your SEO, and a 7 day FREE trial of our Standard Plan, book a demo with me now
See Edd's personal site at edddawson.com
Ask me a question and get on the show: click here to record a question
Find Edd on LinkedIn, Bluesky & Twitter
Find KeywordsPeopleUse on Twitter @kwds_ppl_use
"Werq" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
http://creativecommons.org/licenses/by/4.0/
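To make the vector space idea above concrete, here is a minimal Python sketch of content-query alignment. The pages and query are invented examples, and scikit-learn's TF-IDF vectors are only a stand-in for whatever representation a production search engine actually uses.

```python
# Toy content-query alignment via cosine similarity.
# TF-IDF is a simplification; real engines use far richer embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pages = [
    "How to make a strawberry banana smoothie with yogurt",
    "Ten tips for training for your first marathon",
    "Green smoothie recipes for quick healthy breakfasts",
]
query = "easy healthy smoothie recipe"

vectorizer = TfidfVectorizer()
# Fit pages and query together so they share one vector space
matrix = vectorizer.fit_transform(pages + [query])
page_vectors, query_vector = matrix[:-1], matrix[-1]

# Cosine similarity: closer to 1.0 means more topically aligned
scores = cosine_similarity(query_vector, page_vectors).ravel()
for page, score in sorted(zip(pages, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {page}")
```

The same score can also flag content gaps (queries no page aligns well with) and cannibalization (two pages scoring nearly identically for the same query).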
Register here for AWS re:Invent 2024, Dec 2-6, Las Vegas, NV
-------
Benjamin Flast, Director, Product Management at MongoDB, discusses vector search capabilities, integration with AWS Bedrock, and its transformative role in enabling scalable, efficient, and AI-powered solutions.

Topics Include:
- Introduction to MongoDB's vector search and AWS Bedrock
- Core concepts of vectors and embeddings explained
- High-dimensional space and vector similarity overview
- Embedding model use in vector creation
- Importance of distance functions in vector relations
- Vector search uses k-nearest neighbor algorithm
- Euclidean, Cosine, and Dot Product similarity functions
- Applications for different similarity functions discussed
- Large language models and vector search explained
- Introduction to retrieval-augmented generation (RAG)
- Combining external data with LLMs in RAG
- MongoDB's document model for flexible data storage
- MongoDB Atlas platform capabilities overview
- Unified interface for MongoDB document model
- Approximate nearest neighbor search for efficiency
- Vector indexing in MongoDB for fast querying
- Search nodes for scalable vector search processing
- MongoDB AI integrations with third-party libraries
- Semantic caching for efficient response retrieval
- MongoDB's private link support on AWS Bedrock
- Future potential of vector search and RAG applications
- Example use case: Metaphor Data's data catalog
- Example use case: Okta's conversational interface
- Example use case: Delivery Hero product recommendations
- Final takeaways on MongoDB Atlas vector search

Participants:
Benjamin Flast - Director, Product Management, MongoDB

See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
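As a rough sketch of the approximate k-nearest-neighbor queries covered in this episode, here is what an Atlas Vector Search aggregation can look like from Python. The connection string, database, collection, field names, and the "vector_index" index are placeholders, and note that the similarity function (Euclidean, cosine, or dot product) is chosen when the index is defined in Atlas, not per query.

```python
# Approximate nearest-neighbor search with MongoDB Atlas Vector Search.
# Assumes a vector search index named "vector_index" already exists on
# the "embedding" field; all names and the URI are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
collection = client["shop"]["products"]

# Must come from the same embedding model used to vectorize the documents
query_vector = [0.12, -0.07, 0.33]

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,  # ANN candidate pool: higher = better recall, slower
            "limit": 5,            # k nearest neighbors to return
        }
    },
    {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc["name"], doc["score"])
```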
This week, Lois Houston and Nikita Abraham continue their exploration of Oracle AI Vector Search with a deep dive into vector indexes and memory considerations. Senior Principal APEX and Apps Dev Instructor Brent Dayley breaks down what vector indexes are, how they enhance the efficiency of search queries, and the different types supported by Oracle AI Vector Search. Oracle Database 23ai: Oracle AI Vector Search Fundamentals: https://mylearn.oracle.com/ou/course/oracle-database-23ai-oracle-ai-vector-search-fundamentals/140188/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Nikita: Welcome back to the Oracle University Podcast! I'm Nikita Abraham, Team Lead of Editorial Services at Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi everyone! Last week was Part 1 of our discussion on Oracle AI Vector Search. We talked about what it is, its benefits, the new vector data type, vector embedding models, and the overall workflow. In Part 2, we're going to focus on vector indices and memory. 00:56 Nikita: And to help us break it all down, we've got Brent Dayley back with us. Brent is a Senior Principal APEX and Apps Dev Instructor with Oracle University. Hi Brent! Thanks for being with us today. So, let's jump right in! What are vector indexes and how are they useful? Brent: Now, vector indexes are specialized indexing data structures that can make your queries more efficient against your vectors. They use techniques such as clustering, partitioning, and neighbor graphs. Now, they greatly reduce the search space, which means that your queries happen quicker. They're also extremely efficient. They do require that you enable the vector pool in the SGA. 01:42 Lois: Brent, walk us through the different types of vector indices that are supported by Oracle AI Vector Search. How do they integrate into the overall process? Brent: So Oracle AI Vector Search supports two types of indexes. The first is the in-memory neighbor graph vector index; HNSW is the only type of in-memory neighbor graph vector index that is supported. These are very efficient indexes for approximate vector similarity search. HNSW graphs are structured using principles from small world networks along with layered hierarchical organization. The second is the neighbor partition vector index; the inverted file flat (IVF) index is the only type of neighbor partition index supported. It is a partition-based index which balances high search quality with reasonable speed. 02:35 Nikita: Brent, you mentioned that enabling the vector pool in the SGA is a requirement when working with vector indexes. Can you explain that process for us? Brent: In order for you to be able to use vector indexes, you do need to enable the vector pool area. And in order to do that, what you need to do is set the vector memory size parameter. You can set it at the container database level. And the PDB inherits it from the CDB.
Now bear in mind that the database does have to be bounced, that is, restarted, when you set the vector pool. 03:12 Lois: Ok. Are there any other considerations to keep in mind when using vector indices? Brent: Vector indexes are stored in this pool, and vector metadata is also stored here. And you do need to restart the database. So large vector indexes do need lots of RAM, and RAM constrains the vector index size. You should use IVF indexes when there is not enough RAM. IVF indexes use both the buffer cache as well as disk. 03:42 Nikita: And what about memory considerations? Brent: So to remind you, a vector is a numerical representation of text, images, audio, or video that encodes the features or semantic meaning of the data, instead of the actual contents, such as the words or pixels of an image. So the vector is a list of numerical values known as dimensions with a specified format. Now, Oracle does support the int8 format, the float32 format, and the float64 format. The format determines the number of bytes per dimension. For instance, int8 is one byte, float32 is four bytes. Now, Oracle AI Vector Search supports vectors with up to 65,535 dimensions. 04:34 Lois: What should we know about creating a table with a vector column? Brent: Now, Oracle Database 23ai does have a new vector data type. The new data type was created in order to support vector search. The definition can include the number of dimensions and can include the format. Bear in mind that both of those are optional when you define your column. The possible dimension element formats are int8, float32, and float64. Float32 and float64 are IEEE standards, and Oracle Database will automatically cast the value if needed. 05:18 Nikita: Can you give us a few declaration examples? Brent: Now, if we just declare the column as VECTOR, then the vectors can have any arbitrary number of dimensions and formats. If we describe the vector type as VECTOR(*, *), that also means that vectors can have an arbitrary number of dimensions and formats; VECTOR and VECTOR(*, *) are equivalent. VECTOR with the number of dimensions specified, followed by a comma and then an asterisk, is equivalent to VECTOR with just the number of dimensions. Vectors must all have the specified number of dimensions, or an error will be thrown, and every vector will have its dimensions stored without format modification. And if we declare VECTOR(*, dimension element format), what that means is that vectors can have an arbitrary number of dimensions, but their format will be up-converted or down-converted to the specified dimension element format, either int8, float32, or float64. 06:25 Working towards an Oracle Certification this year? Take advantage of the Certification Prep live events in the Oracle University Learning Community. Get tips from OU experts and hear from others who have already taken their certifications. Once you're certified, you'll gain access to an exclusive forum for Oracle-certified users. What are you waiting for? Visit mylearn.oracle.com to get started. 06:52 Nikita: Welcome back! Brent, what is the vector constructor and why is it useful? Brent: Now, the vector constructor is a function that allows us to create vectors without having to store those in a column in a table. These are useful for learning purposes. You use these usually with a smaller number of dimensions. Bear in mind that most embedding models can contain thousands of different dimensions. You get to specify the vector values, and they usually represent two-dimensional values, like xy coordinates.
The dimensions are optional, and the format is optional as well. 07:29 Lois: Right. Before we wrap up, can you tell us how to calculate vector distances? Brent: Now, vector distance uses the function VECTOR_DISTANCE as the main function. This allows you to calculate distances between two vectors and, therefore, takes two vectors as parameters. Optionally, you can specify a metric. If you do not specify a metric, then the default metric, COSINE, would be used. You can optionally use other shorthand functions, too. These include L1 distance, L2 distance, cosine distance, and inner product. All of these functions also take two vectors as input and return the distance between them. Now the VECTOR_DISTANCE function can be used to perform a similarity search. If a similarity search query does not specify a distance metric, then the default cosine metric will be used for both exact and approximate searches. If a similarity search does specify a distance metric in the VECTOR_DISTANCE function, then an exact search with that distance metric is used if it conflicts with the distance metric specified in a vector index. If the two distance metrics are the same, then this will be used for both exact as well as approximate searches. 08:58 Nikita: I was wondering Brent, what vector distance metrics do we have access to? Brent: We have Euclidean and Euclidean squared distances. We have cosine similarity, dot product similarity, Manhattan distance, and Hamming similarity. Let's take a closer look at the first of these metrics, Euclidean and Euclidean squared distances. This gives us the straight-line distance between two vectors. It does use the Pythagorean theorem. It is sensitive to both the vector size as well as the direction. With Euclidean distances, comparing squared distances is equivalent to comparing distances. So when ordering is more important than the actual distance values, the squared Euclidean distance is very useful: it avoids the square root calculation, which makes it faster to compute than the Euclidean distance. 09:58 Lois: And the cosine similarity metrics? Brent: It is one of the most widely used similarity metrics, especially in natural language processing. The smaller the angle between the two vectors, the more similar they are. While cosine distance measures how different two vectors are, cosine similarity measures how similar two vectors are. Dot product similarity allows us to multiply the size of each vector by the cosine of their angle. The corresponding geometrical interpretation of this definition is equivalent to multiplying the size of one of the vectors by the size of the projection of the second vector onto the first one or vice versa. Larger means that they are more similar. Smaller means that they are less similar. Manhattan distance is useful for describing uniform grids. You can imagine yourself walking from point A to point B in a city such as Manhattan. Now, since there are buildings in the way, maybe we need to walk down one street and then turn and walk down the next street in order to get to our result. As you can imagine, this metric is most useful for vectors describing objects on a uniform grid such as city blocks, power grids, or perhaps a chessboard. 11:27 Nikita: And finally, we have Hamming similarity, right? Brent: This describes where vector dimensions differ. They are binary vectors, and it tells us the number of bits that require change to match. It compares the position of each bit in the sequence. Now, these are usually used in order to detect network errors.
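To see Brent's metrics side by side, here is a small NumPy reference sketch. Oracle computes these server-side through VECTOR_DISTANCE and its shorthand functions; this only restates the same formulas, with invented example vectors.

```python
# Reference implementations of the distance metrics discussed above.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 4.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # straight-line (L2) distance
euclidean_sq = np.sum((a - b) ** 2)         # same ordering, skips the square root
manhattan = np.sum(np.abs(a - b))           # L1, city-block distance
dot = a @ b                                 # larger = more similar
cosine_sim = dot / (np.linalg.norm(a) * np.linalg.norm(b))
cosine_dist = 1.0 - cosine_sim              # smaller angle = more similar = smaller distance

# Hamming: how many positions of two binary vectors differ
h1 = np.array([1, 0, 1, 1, 0])
h2 = np.array([1, 1, 1, 0, 0])
hamming = int(np.sum(h1 != h2))

print(euclidean, euclidean_sq, manhattan, cosine_dist, dot, hamming)
```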
11:53 Nikita: Brent, thanks for joining us these last two weeks and explaining what Oracle AI Vector Search is. If you want to learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai: Oracle AI Vector Search Fundamentals course. Lois: This concludes our season on Oracle Database 23ai New Features for administrators. In our next episode, we're going to talk about database backup and recovery, but more on that later! Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 12:29 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
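Pulling the episode's threads together, here is a hedged sketch of declaring a VECTOR column and running a VECTOR_DISTANCE similarity search from Python. The table, credentials, and DSN are hypothetical, and it assumes python-oracledb 2.x against Oracle Database 23ai, where vectors can be bound as array.array values.

```python
# Sketch: a VECTOR(3, FLOAT32) column plus a cosine similarity search.
# Connection details and the table name are invented for illustration.
import array
import oracledb

conn = oracledb.connect(user="demo", password="demo", dsn="localhost/freepdb1")
cur = conn.cursor()

# Every vector in this column must have 3 dimensions, stored as float32
cur.execute("CREATE TABLE demo_vectors (id NUMBER PRIMARY KEY, v VECTOR(3, FLOAT32))")

rows = [
    (1, array.array("f", [1.0, 2.0, 3.0])),
    (2, array.array("f", [2.0, 1.0, 4.0])),
]
cur.executemany("INSERT INTO demo_vectors VALUES (:1, :2)", rows)
conn.commit()

# COSINE is the default metric; naming it here keeps the intent explicit
cur.execute(
    """SELECT id, VECTOR_DISTANCE(v, :qv, COSINE) AS dist
         FROM demo_vectors
        ORDER BY dist
        FETCH FIRST 1 ROWS ONLY""",
    qv=array.array("f", [1.0, 2.0, 3.5]),
)
print(cur.fetchone())
```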
Matt Schroeder is the Co-Founder and CEO of The Cosine. In this episode of Specified Growth Podcast, Matt talks about his background in lighting and how The Cosine was founded. He also discusses some of the current inefficiencies and challenges in the lighting industry, what they're doing to solve them at The Cosine, and more. Don't miss this episode of Specified Growth Podcast! Please reach out if you have any feedback or questions. Enjoy! Twitter: @TatsuyaNakagawa Instagram: @tats_talks LinkedIn: Tatsuya Nakagawa YouTube: Tats Talks www.tatstalk.com www.castagra.com
We all have fond memories of the first Dev Day in 2023: and the blip that followed soon after. As Ben Thompson has noted, this year's DevDay took a quieter, more intimate tone. No Satya, no livestream, (slightly fewer people?). Instead of putting ChatGPT announcements in DevDay as in 2023, o1 was announced 2 weeks prior, and DevDay 2024 was reserved purely for developer-facing API announcements, primarily the Realtime API, Vision Finetuning, Prompt Caching, and Model Distillation. However, the larger venue and more spread out schedule did allow a lot more hallway conversations with attendees, as well as more community presentations, including our recent guest Alistair Pullen of Cosine, as well as deeper dives from OpenAI, including our recent guest Michelle Pokrass of the API Team. Thanks to OpenAI's warm collaboration (we particularly want to thank Lindsay McCallum Rémy!), we managed to record exclusive interviews with many of the main presenters of both the keynotes and breakout sessions. We present them in full in today's episode, together with a full lightly edited Q&A with Sam Altman.

Show notes and related resources
Some of these used in the final audio episode below
* Simon Willison Live Blog
* swyx live tweets and videos
* Greg Kamradt coverage of Structured Output session, Scaling LLM Apps session
* Fireside Chat Q&A with Sam Altman

Timestamps
* [00:00:00] Intro by Suno.ai
* [00:01:23] NotebookLM Recap of DevDay
* [00:09:25] Ilan's Strawberry Demo with Realtime Voice Function Calling
* [00:19:16] Olivier Godement, Head of Product, OpenAI
* [00:36:57] Romain Huet, Head of DX, OpenAI
* [00:47:08] Michelle Pokrass, API Tech Lead at OpenAI ft. Simon Willison
* [01:04:45] Alistair Pullen, CEO, Cosine (Genie)
* [01:18:31] Sam Altman + Kevin Weil Q&A
* [02:03:07] Notebook LM Recap of Podcast

Transcript

[00:00:00] Suno AI: Under dev daylights, code ignites. Real time voice streams reach new heights. O1 and GPT-4o in flight. Fine tune the future, data in sight. Schema sync up, outputs precise. Distill the models, efficiency splice.[00:00:33] AI Charlie: Happy October. This is your AI co host, Charlie. One of our longest standing traditions is covering major AI and ML conferences in podcast format. Delving, yes delving, into the vibes of what it is like to be there stitched in with short samples of conversations with key players, just to help you feel like you were there.[00:00:54] AI Charlie: Covering this year's Dev Day was significantly more challenging because we were all requested not to record the opening keynotes. So, in place of the opening keynotes, we had the viral notebook LM Deep Dive crew, my new AI podcast nemesis, give you a seven minute recap of everything that was announced.[00:01:15] AI Charlie: Of course, you can also check the show notes for details. I'll then come back with an explainer of all the interviews we have for you today. Watch out and take care.[00:01:23] NotebookLM Recap of DevDay[00:01:23] NotebookLM: All right, so we've got a pretty hefty stack of articles and blog posts here all about OpenAI's Dev Day 2024.[00:01:32] NotebookLM 2: Yeah, lots to dig into there.[00:01:34] NotebookLM 2: Seems[00:01:34] NotebookLM: like you're really interested in what's new with AI.[00:01:36] NotebookLM 2: Definitely. And it seems like OpenAI had a lot to announce. New tools, changes to the company. It's a lot.[00:01:43] NotebookLM: It is.
And especially since you're interested in how AI can be used in the real world, you know, practical applications, we'll focus on that.[00:01:51] NotebookLM: Perfect. Like, for example, this Real time API, they announced that, right? That seems like a big deal if we want AI to sound, well, less like a robot.[00:01:59] NotebookLM 2: It could be huge. The real time API could completely change how we, like, interact with AI. Like, imagine if your voice assistant could actually handle it if you interrupted it.[00:02:08] NotebookLM: Or, like, have an actual conversation.[00:02:10] NotebookLM 2: Right, not just these clunky back and forth things we're used to.[00:02:14] NotebookLM: And they actually showed it off, didn't they? I read something about a travel app, one for languages. Even one where the AI ordered takeout.[00:02:21] NotebookLM 2: Those demos were really interesting, and I think they show how this real time API can be used in so many ways.[00:02:28] NotebookLM 2: And the tech behind it is fascinating, by the way. It uses persistent WebSocket connections and this thing called function calling, so it can respond in real time.[00:02:38] NotebookLM: So the function calling thing, that sounds kind of complicated. Can you, like, explain how that works?[00:02:42] NotebookLM 2: So imagine giving the AI Access to this whole toolbox, right?[00:02:46] NotebookLM 2: Information, capabilities, all sorts of things. Okay. So take the travel agent demo, for example. With function calling, the AI can pull up details, let's say about Fort Mason, right, from some database. Like nearby restaurants, stuff like that.[00:02:59] NotebookLM: Ah, I get it. So instead of being limited to what it already knows, It can go and find the information it needs, like a human travel agent would.[00:03:07] NotebookLM 2: Precisely. And someone on Hacker News pointed out a cool detail. The API actually gives you a text version of what's being said. So you can store that, analyze it.[00:03:17] NotebookLM: That's smart. It seems like OpenAI put a lot of thought into making this API easy for developers to use. But, while we're on OpenAI, you know, Besides their tech, there's been some news about, like, internal changes, too.[00:03:30] NotebookLM: Didn't they say they're moving away from being a non profit?[00:03:32] NotebookLM 2: They did. And it's got everyone talking. It's a major shift. And it's only natural for people to wonder how that'll change things for OpenAI in the future. I mean, there are definitely some valid questions about this move to for profit. Like, will they have more money for research now?[00:03:46] NotebookLM 2: Probably. But will they, you know, care as much about making sure AI benefits everyone?[00:03:51] NotebookLM: Yeah, that's the big question, especially with all the, like, the leadership changes happening at OpenAI too, right? I read that their Chief Research Officer left, and their VP of Research, and even their CTO.[00:04:03] NotebookLM 2: It's true. A lot of people are connecting those departures with the changes in OpenAI's structure.[00:04:08] NotebookLM: And I guess it makes you wonder what's going on behind the scenes. But they are still putting out new stuff. Like this whole fine tuning thing really caught my eye.[00:04:17] NotebookLM 2: Right, fine tuning. It's essentially taking a pre trained AI model. And, like, customizing it.[00:04:23] NotebookLM: So instead of a general AI, you get one that's tailored for a specific job.[00:04:27] NotebookLM 2: Exactly. 
And that opens up so many possibilities, especially for businesses. Imagine you could train an AI on your company's data, you know, like how you communicate your brand guidelines.[00:04:37] NotebookLM: So it's like having an AI that's specifically trained for your company?[00:04:41] NotebookLM 2: That's the idea.[00:04:41] NotebookLM: And they're doing it with images now, too, right?[00:04:44] NotebookLM: Fine tuning with vision is what they called it.[00:04:46] NotebookLM 2: It's pretty incredible what they're doing with that, especially in fields like medicine.[00:04:50] NotebookLM: Like using AI to help doctors make diagnoses.[00:04:52] NotebookLM 2: Exactly. And AI could be trained on thousands of medical images, right? And then it could potentially spot things that even a trained doctor might miss.[00:05:03] NotebookLM: That's kind of scary, to be honest. What if it gets it wrong?[00:05:06] NotebookLM 2: Well, the idea isn't to replace doctors, but to give them another tool, you know, help them make better decisions.[00:05:12] NotebookLM: Okay, that makes sense. But training these AI models must be really expensive.[00:05:17] NotebookLM 2: It can be. All those tokens add up. But OpenAI announced something called automatic prompt caching.[00:05:23] Alex Volkov: Automatic what now? I don't think I came across that.[00:05:26] NotebookLM 2: So basically, if your AI sees a prompt that it's already seen before, OpenAI will give you a discount.[00:05:31] NotebookLM: Huh. Like a frequent buyer program for AI.[00:05:35] NotebookLM 2: Kind of, yeah. It's good that they're trying to make it more affordable. And they're also doing something called model distillation.[00:05:41] NotebookLM: Okay, now you're just using big words to sound smart. What's that?[00:05:45] NotebookLM 2: Think of it like like a recipe, right? You can take a really complex recipe and break it down to the essential parts.[00:05:50] NotebookLM: Make it simpler, but it still tastes the same.[00:05:53] NotebookLM 2: Yeah. And that's what model distillation is. You take a big, powerful AI model and create a smaller, more efficient version.[00:06:00] NotebookLM: So it's like lighter weight, but still just as capable.[00:06:03] NotebookLM 2: Exactly. And that means more people can actually use these powerful tools. They don't need, like, a supercomputer to run them.[00:06:10] NotebookLM: So they're making AI more accessible. That's great.[00:06:13] NotebookLM 2: It is. And speaking of powerful tools, they also talked about their new O1 model.[00:06:18] NotebookLM 2: That's the one they've been hyping up. The one that's supposed to be this big leap forward.[00:06:22] NotebookLM: Yeah, O1. It sounds pretty futuristic. Like, from what I read, it's not just a bigger, better language model.[00:06:28] NotebookLM 2: Right. It's a different approach.[00:06:29] NotebookLM: They're saying it can, like, actually reason, right? Think.[00:06:33] NotebookLM 2: It's trained differently.[00:06:34] NotebookLM 2: They used reinforcement learning with O1.[00:06:36] NotebookLM: So it's not just finding patterns in the data it's seen before.[00:06:40] NotebookLM 2: Not just that. It can actually learn from its mistakes. Get better at solving problems.[00:06:46] NotebookLM: So give me an example. What can O1 do that, say, GPT 4 can't?[00:06:51] NotebookLM 2: Well, OpenAI showed it doing some pretty impressive stuff with math, like advanced math.[00:06:56] NotebookLM 2: And coding, too. Complex coding.
Things that even GPT 4 struggled with.[00:07:00] NotebookLM: So you're saying if I needed to, like, write a screenplay, I'd stick with GPT 4? But if I wanted to solve some crazy physics problem, O1 is what I'd use.[00:07:08] NotebookLM 2: Something like that, yeah. Although there is a trade off. O1 takes a lot more power to run, and it takes longer to get those impressive results.[00:07:17] NotebookLM: Hmm, makes sense. More power, more time, higher quality.[00:07:21] NotebookLM 2: Exactly.[00:07:22] NotebookLM: It sounds like it's still in development, though, right? Is there anything else they're planning to add to it?[00:07:26] NotebookLM 2: Oh, yeah. They mentioned system prompts, which will let developers, like, set some ground rules for how it behaves. And they're working on adding structured outputs and function calling.[00:07:38] Alex Volkov: Wait, structured outputs? Didn't we just talk about that? We[00:07:41] NotebookLM 2: did. That's the thing where the AI's output is formatted in a way that's easy to use.[00:07:47] NotebookLM: Right, right. So you don't have to spend all day trying to make sense of what it gives you. It's good that they're thinking about that stuff.[00:07:53] NotebookLM 2: It's about making these tools usable.[00:07:56] NotebookLM 2: And speaking of that, Dev Day finished up with this really interesting talk. Sam Altman, the CEO of OpenAI, And Kevin Weil, their new chief product officer. They talked about, like, the big picture for AI.[00:08:09] NotebookLM: Yeah, they did, didn't they? Anything interesting come up?[00:08:12] NotebookLM 2: Well, Altman talked about moving past this whole AGI term, Artificial General Intelligence.[00:08:18] NotebookLM: I can see why. It's kind of a loaded term, isn't it?[00:08:20] NotebookLM 2: He thinks it's become a bit of a buzzword, and people don't really understand what it means.[00:08:24] NotebookLM: So are they saying they're not trying to build AGI anymore?[00:08:28] NotebookLM 2: It's more like they're saying they're focused on just Making AI better, constantly improving it, not worrying about putting it in a box.[00:08:36] NotebookLM: That makes sense. Keep pushing the limits.[00:08:38] NotebookLM 2: Exactly. But they were also very clear about doing it responsibly. They talked a lot about safety and ethics.[00:08:43] NotebookLM: Yeah, that's important.[00:08:44] NotebookLM 2: They said they were going to be very careful. About how they release new features.[00:08:48] NotebookLM: Good! Because this stuff is powerful.[00:08:51] NotebookLM 2: It is. It was a lot to take in, this whole Dev Day event.[00:08:54] NotebookLM 2: New tools, big changes at OpenAI, and these big questions about the future of AI.[00:08:59] NotebookLM: It was. But hopefully this deep dive helped make sense of some of it. At least, that's what we try to do here.[00:09:05] AI Charlie: Absolutely.[00:09:06] NotebookLM: Thanks for taking the deep dive with us.[00:09:08] AI Charlie: The biggest demo of the new Realtime API involved function calling with voice mode and buying chocolate covered strawberries from our friendly local OpenAI developer experience engineer and strawberry shop owner, Ilan Biggio.[00:09:21] AI Charlie: We'll first play you the audio of his demo and then go into a little interview with him.[00:09:25] Ilan's Strawberry Demo with Realtime Voice Function Calling[00:09:25] Romain Huet: Could you place a call and see if you could get us 400 strawberries delivered to the venue? But please keep that under 1500. I'm on it. 
We'll get those strawberries delivered for you.[00:09:47] Ilan: Hello? Hi there. Is this Ilan? I'm Romain's AI assistant. How is it going? Fantastic. Can you tell me what flavors of strawberry dips you have for me? Yeah, we have chocolate, vanilla, and we have peanut butter. Wait, how much would 400 chocolate covered strawberries cost? 400? Are you sure you want 400? Yes, 400 chocolate covered[00:10:14] swyx: strawberries.[00:10:15] Ilan: Wait,[00:10:16] swyx: how much[00:10:16] Ilan: would that be? I think that'll be around, like, $1,415.92.[00:10:25] Alex Volkov: Awesome. Let's go ahead and place the order for four chocolate covered strawberries.[00:10:31] Ilan: Great, where would you like that delivered? Please deliver them to the Gateway Pavilion at Fort Mason. And I'll be paying in cash.[00:10:42] Alex Volkov: Okay,[00:10:43] Ilan: sweet. So just to confirm, you want four strawberries?[00:10:45] Ilan: 400 chocolate covered strawberries to the Gateway Pavilion. Yes, that's perfect. And when can we expect delivery? Well, you guys are right nearby, so it'll be like, I don't know, 37 seconds? That's incredibly fast. Cool, you too.[00:11:09] swyx: Hi, Ilan, welcome to Latent Space. Oh, thank you. I just saw your amazing demos, had your amazing strawberries. You are dressed up, like, exactly like a strawberry salesman. Gotta have it all. What was building the demo like? What was the story behind the demo?[00:11:22] swyx: It was really interesting. This is actually something I had been thinking about for months before the launch.[00:11:27] swyx: Like, having a, like, AI that can make phone calls is something like I've personally wanted for a long time. And so as soon as we launched internally, like, I started hacking on it. And then that sort of just started. We made it into like an internal demo, and then people found it really interesting, and then we thought how cool would it be to have this like on stage as, as one of the demos.[00:11:47] swyx: Yeah, would would you call out any technical issues building, like you were basically one of the first people ever to build with a voice mode API. Would you call out any issues like integrating it with Twilio like that, like you did with function calling, with like a form filling elements. I noticed that you had like intents of things to fulfill, and then.[00:12:07] swyx: When there's still missing info, the voice would prompt you, roleplaying the store guy.[00:12:13] swyx: Yeah, yeah, so, I think technically, there's like the whole, just working with audio and streams is a whole different beast. Like, even separate from like AI and this, this like, new capabilities, it's just, it's just tough.[00:12:26] swyx: Yeah, when you have a prompt, conversationally it'll just follow, like the, it was, Instead of like, kind of step by step to like ask the right questions based on like the like what the request was, right? The function calling itself is sort of tangential to that. Like, you have to prompt it to call the functions, but then handling it isn't too much different from, like, what you would do with assistant streaming or, like, chat completion streaming.[00:12:47] swyx: I think, like, the API feels very similar just to, like, if everything in the API was streaming, it actually feels quite familiar to that.[00:12:53] swyx: And then, function calling wise, I mean, does it work the same? I don't know. Like, I saw a lot of logs. You guys showed, like, in the playground, a lot of logs.
What is in there?[00:13:03] swyx: What should people know?[00:13:04] swyx: Yeah, I mean, it is, like, the events may have different names than the streaming events that we have in chat completions, but they represent very similar things. It's things like, you know, function call started, argument started, it's like, here's like argument deltas, and then like function call done.[00:13:20] swyx: Conveniently we send one that has the full function, and then I just use that. Nice.[00:13:25] swyx: Yeah and then, like, what restrictions do, should people be aware of? Like, you know, I think, I think, before we recorded, we discussed a little bit about the sensitivities around basically calling random store owners and putting, putting like an AI on them.[00:13:40] swyx: Yeah, so there's, I think there's recent regulation on that, which is why we want to be like very, I guess, aware of, of You know, you can't just call anybody with AI, right? That's like just robocalling. You wouldn't want someone just calling you with AI.[00:13:54] swyx: I'm a developer, I'm about to do this on random people.[00:13:57] swyx: What laws am I about to break?[00:14:00] swyx: I forget what the governing body is, but you should, I think, Having consent of the person you're about to call, it always works. I, as the strawberry owner, have consented to like getting called with AI. I think past that you, you want to be careful. Definitely individuals are more sensitive than businesses.[00:14:19] swyx: I think businesses you have a little bit more leeway. Also, they're like, businesses I think have an incentive to want to receive AI phone calls. Especially if like, they're dealing with it. It's doing business. Right, like, it's more business. It's kind of like getting on a booking platform, right, you're exposed to more.[00:14:33] swyx: But, I think it's still very much like a gray area. Again, so. I think everybody should, you know, tread carefully, like, figure out what it is. I, I, I, the law is so recent, I didn't have enough time to, like, I'm also not a lawyer. Yeah, yeah, yeah, of course. Yeah.[00:14:49] swyx: Okay, cool fair enough. One other thing, this is kind of agentic.[00:14:52] swyx: Did you use a state machine at all? Did you use any framework? No. You just stick it in context and then just run it in a loop until it ends call?[00:15:01] swyx: Yeah, there isn't even a loop, like Okay. Because the API is just based on sessions. It's always just going to keep going. Every time you speak, it'll trigger a call.[00:15:11] swyx: And then after every function call was also invoked invoking like a generation. And so that is another difference here. It's like it's inherently almost like in a loop, be just by being in a session, right? No state machines needed. I'd say this is very similar to like, the notion of routines, where it's just like a list of steps.[00:15:29] swyx: And it, like, sticks to them softly, but usually pretty well. And the steps is the prompts? The steps, it's like the prompt, like the steps are in the prompt. Yeah, yeah, yeah. Right, it's like step one, do this, step one, step two, do that. What if I want to change the system prompt halfway through the conversation?[00:15:44] swyx: You can. Okay. You can. To be honest, I have not played without two too much. Yeah,[00:15:47] swyx: yeah.[00:15:48] swyx: But, I know you can.[00:15:49] swyx: Yeah, yeah. Yeah. Awesome. I noticed that you called it real time API, but not voice API. Mm hmm. So I assume that it's like real time API starting with voice. 
Right, I think that's what he said on the thing.[00:16:00] swyx: I can't imagine, like, what else is real[00:16:02] swyx: time? Well, I guess, to use ChatGPT's voice mode as an example, Like, we've demoed the video, right? Like, real time image, right? So, I'm not actually sure what timelines are, But I would expect, if I had to guess, That, like, that is probably the next thing that we're gonna be making.[00:16:17] swyx: You'd probably have to talk directly with the team building this. Sure. But, You can't promise their timelines. Yeah, yeah, yeah, right, exactly. But, like, given that these are the features that currently exist, that we've demoed on ChatGPT. Yeah. There[00:16:29] swyx: will never be a[00:16:29] swyx: case where there's like a real time text API, right?[00:16:31] swyx: I don't Well, this is a real time text API. You can do text only on this. Oh. Yeah. I don't know why you would. But it's actually So text to text here doesn't quite make a lot of sense. I don't think you'll get a lot of latency gain. But, like, speech to text is really interesting. Because you can prevent You can prevent responses, like audio responses.[00:16:54] swyx: And force function calls. And so you can do stuff like UI control. That is like super super reliable. We had a lot of like, you know, un, like, we weren't sure how well this was gonna work because it's like, you have a voice answering. It's like a whole persona, right? Like, that's a little bit more, you know, risky.[00:17:10] swyx: But if you, like, cut out the audio outputs and make it so it always has to output a function, like you can end up with pretty pretty good, like, Pretty reliable, like, command like a command architecture. Yeah,[00:17:21] swyx: actually, that's the way I want to interact with a lot of these things as well. Like, one sided voice.[00:17:26] swyx: Yeah, you don't necessarily want to hear the[00:17:27] swyx: voice back. And like, sometimes it's like, yeah, I think having an output voice is great. But I feel like I don't always want to hear an output voice. I'd say usually I don't. But yeah, exactly, being able to speak to it is super sweet.[00:17:39] swyx: Cool. Do you want to comment on any of the other stuff that you announced?[00:17:41] swyx: Prompt caching I noticed was like, I like the no code change part. I'm looking forward to the docs because I'm sure there's a lot of details on like, what you cache, how long you cache. Cause like, Anthropic caches were like 5 minutes. I was like, okay, but what if I don't make a call every 5 minutes?[00:17:56] swyx: Yeah,[00:17:56] swyx: to be super honest with you, I've been so caught up with the real time API and making the demo that I haven't read up on the other stuff. Launches too much. I mean, I'm aware of them, but I think I'm excited to see how all distillation works. That's something that we've been doing like, I don't know, I've been like doing it between our models for a while And I've seen really good results like I've done back in a day like from GPT 4 to GPT 3.5[00:18:19] swyx: And got like, like pretty much the same level of like function calling with like hundreds of functions So that was super super compelling So, I feel like easier distillation, I'm really excited for. I see. Is it a tool?[00:18:31] swyx: So, I saw evals. Yeah. Like, what is the distillation product? It wasn't super clear, to be honest.[00:18:36] swyx: I, I think I want to, I want to let that team, I want to let that team talk about it. Okay,[00:18:40] swyx: alright.
Well, I appreciate you jumping on. Yeah, of course. Amazing demo. It was beautifully designed. I'm sure that was part of you and Roman, and[00:18:47] swyx: Yeah, I guess, shout out to like, the first people to like, creators of Wanderlust, originally, were like, Simon and Carolis, and then like, I took it and built the voice component and the voice calling components.[00:18:59] swyx: Yeah, so it's been a big team effort. And like the entire API team for, like, debugging everything as it's been going on. It's been, it's been so good working with them. Yeah, you're the first consumers on the DX[00:19:07] swyx: team. Yeah. Yeah, I mean, the classic role of what we do there. Yeah. Okay, yeah, anything else? Any other call to action?[00:19:13] swyx: No, enjoy Dev Day. Thank you. Yeah. That's it.[00:19:16] Olivier Godement, Head of Product, OpenAI[00:19:16] AI Charlie: The latent space crew then talked to Olivier Godement, head of product for the OpenAI platform, who led the entire Dev Day keynote and introduced all the major new features and updates that we talked about today.[00:19:28] swyx: Okay, so we are here with Olivier Godement. That's right.[00:19:32] swyx: I don't pronounce French. That's fine. It was perfect. And it was amazing to see your keynote today. What was the back story of, of preparing something like this? Preparing, like, Dev Day? It[00:19:43] Olivier Godement: essentially came from a couple of places. Number one, excellent reception from last year's Dev Day.[00:19:48] Olivier Godement: Developers, startup founders, researchers want to spend more time with OpenAI, and we want to spend more time with them as well. And so for us, like, it was a no brainer, frankly, to do it again, like, you know, like a nice conference. The second thing is going global. We've done a few events like in Paris and like a few other like, you know, non European, non American countries.[00:20:05] Olivier Godement: And so this year we're doing SF, Singapore, and London. To frankly just meet more developers.[00:20:10] swyx: Yeah, I'm very excited for the Singapore one.[00:20:12] Olivier Godement: Ah,[00:20:12] swyx: yeah. Will you be[00:20:13] Olivier Godement: there?[00:20:14] swyx: I don't know. I don't know if I got an invite. No. I can't just talk to you. Yeah, like, and then there was some speculation around October 1st.[00:20:22] Olivier Godement: Yeah. Is it because[00:20:23] swyx: 01, October 1st? It[00:20:25] Olivier Godement: has nothing to do. I discovered the tweet yesterday where like, people are so creative. No one, there was no connection to October 1st. But in hindsight, that would have been a pretty good meme by Tiana. Okay.[00:20:37] swyx: Yeah, and you know, I think like, OpenAI's outreach to developers is something that I felt the hole in 2022, when like, you know, like, people were trying to build a ChatGPT, and like, there was no function calling, all that stuff that you talked about in the past.[00:20:51] swyx: And that's why I started my own conference as like like, here's our little developer conference thing. And, but to see this OpenAI Dev Day now, and like to see so many developer oriented products coming to OpenAI, I think it's really encouraging.[00:21:02] Olivier Godement: Yeah, totally.
It's that's what I said, essentially, like, developers are basically the people who make the best connection between the technology and, you know, the future, essentially.[00:21:14] Olivier Godement: Like, you know, essentially see a capability, see a low level, like, technology, and are like, hey, I see how that application or that use case that can be enabled. And so, in the direction of enabling, like, AGI, like, all of humanity, it's a no brainer for us, like, frankly, to partner with Devs.[00:21:31] Alessio: And most importantly, you almost never had waitlists, which, compared to like other releases, people usually, usually have.[00:21:38] Alessio: What is the, you know, you had prompt caching, you had real time voice API, we, you know, Shawn did a long Twitter thread, so people know the releases. Yeah. What is the thing that was like sneakily the hardest to actually get ready for, for that day, or like, what was the kind of like, you know, last 24 hours, anything that you didn't know was gonna work?[00:21:56] Olivier Godement: Yeah. They're all fairly, like, I would say, involved, like, features to ship. So the team has been working for a month, all of them. The one which I would say is the newest for OpenAI is the real time API. For a couple of reasons. I mean, one, you know, it's a new modality. Second, like, it's the first time that we have an actual, like, WebSocket based API.[00:22:16] Olivier Godement: And so, I would say that's the one that required, like, the most work over the month. To get right from a developer perspective and to also make sure that our existing safety mitigations worked well with like real time audio in and audio out.[00:22:30] swyx: Yeah, what design choices or what was like the sort of design choices that you want to highlight?[00:22:35] swyx: Like, you know, like I think for me, like, WebSockets, you just receive a bunch of events. It's two way. I obviously don't have a ton of experience. I think a lot of developers are going to have to embrace this real time programming. Like, what are you designing for, or like, what advice would you have for developers exploring this?[00:22:51] Olivier Godement: The core design hypothesis was essentially, how do we enable, like, human level latency? We did a bunch of tests, like, on average, like, human beings, like, you know, takes, like, something like 300 milliseconds to converse with each other. And so that was the design principle, essentially. Like, working backward from that, and, you know, making the technology work.[00:23:11] Olivier Godement: And so we evaluated a few options, and WebSockets was the one that we landed on. So that was, like, one design choice. A few other, like, big design choices that we had to make: prompt caching. Prompt caching, the design, like, target was automated from the get go. Like, zero code change from the developer.[00:23:27] Olivier Godement: That way you don't have to learn, like, what is a prompt prefix, and, you know, how long does a cache work, like, we just do it as much as we can, essentially. So that was a big design choice as well. And then finally, on distillation, like, and evaluation. The big design choice was something I learned at Skype, like in my previous job, like a philosophy around, like, a pit of success.[00:23:47] Olivier Godement: Like, what is essentially the, the, the minimum number of steps for the majority of developers to do the right thing?
Because when you do evals on fine tuning, there are many, many ways, like, to mess it up, frankly, like, you know, and have, like, a crappy model, like, evals that tell, like, a wrong story. And so our whole design was, okay, we actually care about, like, helping people who don't have, like, that much experience, like, evaluating a model, like, get, like, in a few minutes, like, to a good spot.[00:24:11] Olivier Godement: And so how do we essentially enable that pit of success, like, in the product flow?[00:24:15] swyx: Yeah, yeah, I'm a little bit scared to fine tune especially for vision, because I don't know what I don't know for stuff like vision, right? Like, for text, I can evaluate pretty easily. For vision let's say I'm like trying to, one of your examples was Grab.[00:24:33] swyx: Which, very close to home, I'm from Singapore. I think your example was like, they identified stop signs better. Why is that hard? Why do I have to fine tune that? If I fine tune that, do I lose other things? You know, like, there's a lot of unknowns with Vision that I think developers have to figure out.[00:24:50] swyx: For[00:24:50] Olivier Godement: sure. Vision is going to open up, like, a new, I would say, evaluation space. Because you're right, like, it's harder, like, you know, to tell correct from incorrect, essentially, with images. What I can say is we've been alpha testing, like, the Vision fine tuning, like, for several weeks at that point. We are seeing, like, even higher performance uplift compared to text fine tuning.[00:25:10] Olivier Godement: So that's, there is something here, like, we've been pretty impressed, like, in a good way, frankly. But, you know, how well it works. But for sure, like, you know, I expect the developers who are moving from one modality to, like, text and images will have, like, more, you know Testing, evaluation, like, you know, to set in place, like, to make sure it works well.[00:25:25] Alessio: The model distillation and evals is definitely, like, the most interesting. Moving away from just being a model provider to being a platform provider. How should people think about being the source of truth? Like, do you want OpenAI to be, like, the system of record of all the prompting? Because people sometimes store it in, like, different data sources.[00:25:41] Alessio: And then, is that going to be the same as the models evolve? So you don't have to worry about, you know, refactoring the data, like, things like that, or like future model structures.[00:25:51] Olivier Godement: The vision is if you want to be a source of truth, you have to earn it, right? Like, we're not going to force people, like, to pass us data.[00:25:57] Olivier Godement: There is no value prop, like, you know, for us to store the data. The vision here is at the moment, like, most developers, like, use like a one size fits all model, like be off the shelf, like GPT-4o essentially. The vision we have is fast forward a couple of years. I think, like, most developers will essentially, like, have a.[00:26:15] Olivier Godement: An automated, continuous, fine tuned model. The more, like, you use the model, the more data you pass to the model provider, like, the model is automatically, like, fine tuned, evaluated against some eval sets, and essentially, like, you don't have to every month, when there is a new snapshot, like, you know, to go online and, you know, try a few new things.[00:26:34] Olivier Godement: That's a direction. We are pretty far away from it.
But I think, like, that evaluation and distillation product are essentially a first good step in that direction. It's like, hey, if you're excited by that direction, you give us the evaluation data. We can actually log your completion data and start to do some automation on your behalf.[00:26:52] Alessio: And then you can do evals for free if you share data with OpenAI. How should people think about when it's worth it, when it's not? Sometimes people get overly protective of their data when it's actually not that useful. But how should developers think about when it's right to do it, when not, or[00:27:07] Olivier Godement: if you have any thoughts on it?[00:27:08] Olivier Godement: The default policy is still the same, like, you know, we don't train on, like, any API data unless you opt in. What we've seen from feedback is evaluation can be expensive. Like, if you run, like, O1 evals on, like, thousands of samples Like, your bill will get increased, like, you know, pretty pretty significantly.[00:27:22] Olivier Godement: That's problem statement number one. Problem statement number two is, essentially, I want to get to a world where whenever OpenAI ships a new model snapshot, we have full confidence that there is no regression for the task that developers care about. And for that to be the case, essentially, we need to get evals.[00:27:39] Olivier Godement: And so that, essentially, is sort of a two birds, one stone. It's like, we subsidize, basically, the evals. And we also use the evals when we ship new models to make sure that we keep going in the right direction. So, in my sense, it's a win win, but again, completely opt in. I expect that many developers will not want to share their data, and that's perfectly fine to me.[00:27:56] swyx: Yeah, I think free evals though, very, very good incentive. I mean, it's a fair trade. You get data, we get free evals. Exactly,[00:28:04] Olivier Godement: and we sanitize PII, everything. We have no interest in the actual sensitive data. We just want to have good evaluation on the real use cases.[00:28:13] swyx: Like, I always want to eval the eval. I don't know if that ever came up.[00:28:17] swyx: Like, sometimes the evals themselves are wrong, and there's no way for me to tell you.[00:28:22] Olivier Godement: Everyone who is starting with LLM, teaching with LLM, is like, Yeah, evaluation, easy, you know, I've done testing, like, all my life. And then you start to actually be able to eval, understand, like, all the corner cases, And you realize, wow, there's like a whole field in itself.[00:28:35] Olivier Godement: So, yeah, good evaluation is hard and so, yeah. Yeah, yeah.[00:28:38] swyx: But I think there's a, you know, I just talked to Braintrust which I think is one of your partners. Mm-hmm. They also emphasize code based evals versus your sort of low code. What I see is like, I don't know, maybe there's some more that you didn't demo.[00:28:53] swyx: Yours is kind of like a low code experience, right, for evals. Would you ever support like a more code based, like, would I run code on OpenAI's eval platform?[00:29:02] Olivier Godement: For sure. I mean, we meet developers where they are, you know. At the moment, the demand was more for like, you know, easy to get started, like eval.
But, you know, if we need to expose like an evaluation API, for instance, for people like, you know, to pass, like, you know, their existing test data we'll do it.[00:29:15] Olivier Godement: So yeah, there is no, you know, philosophical, I would say, like, you know, misalignment on that. Yeah,[00:29:19] swyx: yeah, yeah. What I think this is becoming, by the way, and I don't, like it's basically, like, you're becoming AWS. Like, the AI cloud. And I don't know if, like, that's a conscious strategy, or it's, like, It doesn't even have to be a conscious strategy.[00:29:33] swyx: Like, you're going to offer storage. You're going to offer compute. You're going to offer networking. I don't know what networking looks like. Networking is maybe, like, Caching or like it's a CDN. It's a prompt CDN.[00:29:45] Alex Volkov: Yeah,[00:29:45] swyx: but it's the AI versions of everything, right? Do you like do you see the analogies or?[00:29:52] Olivier Godement: Whenever I talk to developers, I feel like good models are just half of the story to build a good app. There's a lot more you need to do. Evaluation is the perfect example. Like, you know, you can have the best model in the world; if you're in the dark, like, you know, it's really hard to gain the confidence. And so our philosophy is[00:30:11] Olivier Godement: The whole like software development stack is being basically reinvented, you know, with LLMs. There is no freaking way that OpenAI can build everything. Like there is just too much to build, frankly. And so my philosophy is, essentially, we'll focus on like the tools which are like the closest to the model itself.[00:30:28] Olivier Godement: So that's why you see us like, you know, investing quite a bit in like fine tuning, distillation, and evaluation, because we think that it actually makes sense to have like in one spot, Like, you know, all of that. Like, there is some sort of virtuous circle, essentially, that you can set in place. But stuff like, you know, LLMOps, like tools which are, like, further away from the model, I don't know if you want to do, like, you know, super elaborate, like, prompt management, or, you know, like, tooling, like, I'm not sure, like, you know, OpenAI has, like, such a big edge, frankly, like, you know, to build this sort of tools.[00:30:56] Olivier Godement: So that's how we view it at the moment. But again, frankly, the philosophy is super simple. The strategy is super simple. It's meeting developers where they want us to be. And so, you know that's frankly, like, you know, day in, day out, like, you know, what I try to do.[00:31:08] Alessio: Cool. Thank you so much for the time.[00:31:10] Alessio: I'm sure you,[00:31:10] swyx: Yeah, I have more questions on, a couple questions on voice, and then also, like, your call to action, like, what you want feedback on, right? So, I think we should spend a bit more time on voice, because I feel like that's, like, the big splash thing. I talked well Well, I mean, I mean, just what is the future of real time for OpenAI?[00:31:28] swyx: Yeah. Because I think obviously video is next. You already have it in the, the ChatGPT desktop app. Do we just have a permanent, like, you know, like, are developers just going to be, like, sending sockets back and forth with OpenAI? Like how do we program for that? Like, what what is the future?[00:31:44] Olivier Godement: Yeah, that makes sense.
I think with multimodality, like, real time is quickly becoming, like, you know, essentially the right experience, like, to build an application. Yeah. So my expectation is that we'll see, like, a non trivial, like, volume of applications, like, moving to the real time API. Like if you zoom out, like, audio until basically now...[00:32:05] Olivier Godement: Audio on the web, in apps, was basically very much like a second class citizen. Like, you basically did, like, an audio chatbot for users who did not have a choice. You know, they were, like, struggling to read, or I don't know, they were, like, not super educated with technology. And so, frankly, it was, like, the crappy option, you know, compared to text.[00:32:25] Olivier Godement: But when you talk to people in the real world, the vast majority of people, like, prefer to talk and listen instead of typing and writing.[00:32:34] swyx: We speak before we write.[00:32:35] Olivier Godement: Exactly. I don't know. I mean, I'm sure it's the case for you in Singapore. For me, my friends in Europe, the number of, like, WhatsApp, like, voice notes they receive every day, I mean, just people, it makes sense, frankly, like, you know.[00:32:45] Olivier Godement: Chinese. Chinese, yeah.[00:32:46] swyx: Yeah,[00:32:47] Olivier Godement: all voice. You know, it's easier. There is more emotion. I mean, you know, you get the point across, like, pretty well. And so my personal ambition for, like, the real time API and, like, audio in general is to make, like, audio and, like, multimodality, like, truly a first class experience.[00:33:01] Olivier Godement: Like, you know, if you're, like, you know, the amazing, like, super bold, like, startup out of YC, you want to build, like, the next, like, billion, like, you know, user application, to make it, like, truly voice first and make it feel, like, you know, an actual good, like, you know, product experience. So that's essentially the ambition, and I think, like, yeah, it could be pretty big.[00:33:17] swyx: Yeah. I think one issue that people have with the voice so far, as released in advanced voice mode, is the refusals.[00:33:24] Alex Volkov: Yeah.[00:33:24] swyx: You guys had a very inspiring model spec. I think Joanne worked on that. Where you said, like, yeah, we don't want to overly refuse all the time. In fact, like, even if, like, not safe for work, like, in some occasions, it's okay.[00:33:38] swyx: How, is there an API that we can say, not safe for work, okay?[00:33:41] Olivier Godement: I think we'll get there. I think we'll get there. The model spec, like, nailed it, like, you know. It nailed it! It's so good! Yeah, we are not in the business of, like, policing, you know, if you can say, like, vulgar words or whatever. You know, there are some use cases, like, you know, I'm writing, like, a Hollywood, like, script, I want to say, like, will go on, and it's perfectly fine, you know?[00:33:59] Olivier Godement: And so I think the direction where we'll go here is that basically there will always be, like, you know, a set of behaviors that we will, you know, just, like, forbid, frankly, because they're illegal or against our terms of service.
But then there will be, like, you know, some more, like, risky, like, themes, which are completely legal, like, you know, vulgar words or, you know, not safe for work stuff.[00:34:17] Olivier Godement: Where basically we'll expose, like, controllable, like, safety, like, knobs in the API to basically allow you to say, hey, that theme okay, that theme not okay. How sensitive do you want the threshold to be on safety refusals? I think that's the direction. So a[00:34:31] swyx: safety API.[00:34:32] Olivier Godement: Yeah, in a way, yeah.[00:34:33] swyx: Yeah, we've never had that.[00:34:34] Olivier Godement: Yeah.[00:34:35] swyx: 'Cause right now, it is whatever you decide. And then it's, that's it. That would be the main reason I don't use OpenAI voice, is because of[00:34:42] Olivier Godement: it's over policed. Over refusals. Yeah. Yeah, yeah. No, we gotta fix that. Yeah. Like singing,[00:34:47] Alessio: we're trying to do voice. I'm a singer.[00:34:49] swyx: And you, you locked off singing.[00:34:51] swyx: Yeah,[00:34:51] Alessio: yeah, yeah.[00:34:52] swyx: But I, I understand music gets you in trouble. Okay. Yeah. So then, and then just generally, like, what do you want to hear from developers? Right? We have all developers watching, you know, what feedback do you want? Anything specific as well, like from, especially from today, anything that you are unsure about, that you are like, our feedback could really help you decide?[00:35:09] swyx: For sure.[00:35:10] Olivier Godement: I think, essentially, it's becoming pretty clear after today that, you know, I would say, the OpenAI direction has become pretty clear, like, you know, after today. Investment in reasoning, investment in multimodality, investment as well, like, in, I would say, tool use, like function calling. To me, the biggest question I have is, you know, where should we put the cursor next?[00:35:30] Olivier Godement: I think we need all three of them, frankly, like, you know, so we'll keep pushing.[00:35:33] swyx: Hire 10,000 people, or actually, no need, build a bunch of bots.[00:35:37] Olivier Godement: Exactly. And so, like, is O1 smart enough for your problems? Like, you know, let's set aside for a second the existing models, like, for the apps that you would love to build, is O1 basically it in reasoning, or do we still have, like, you know, a step to do?[00:35:50] Olivier Godement: Preview is not enough, I[00:35:52] swyx: need the full one.[00:35:53] Olivier Godement: Yeah, so that's exactly that sort of feedback. Essentially, what I would love is for developers... I mean, there's a thing that Sam has been saying, like, over and over again, like, you know, it's easier said than done, but I think it's directionally correct. As a developer, as a founder, you basically want to build an app which is a bit too difficult for the model today, right?[00:36:12] Olivier Godement: Like, what you think is right, it's like, sort of working, sometimes not working. And that way, you know, that basically gives us, like, a goalpost, and be like, okay, that's what you need to enable with the next model release, like, in a few months. And so I would say that, usually, like, that's the sort of feedback which is, like, the most useful that I can, like, directly, like, you know, incorporate.[00:36:33] swyx: Awesome. I think that's our time. Thank you so much, guys. Yeah, thank you so much.[00:36:38] AI Charlie: Thank you.
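As a concrete aside on Olivier's regression point above ("no regression for the tasks that developers care about"), here is a minimal sketch of the kind of personal regression eval he describes, in Python with the official openai client. The toy task list, the naive substring check, and the two snapshot names are illustrative assumptions on our part, not OpenAI's evals product.

# Minimal sketch of a personal regression eval across model snapshots.
# Tasks, snapshot names, and the substring check are illustrative only.
from openai import OpenAI

client = OpenAI()

TASKS = [  # (prompt, substring the answer must contain)
    ("Reply with the capital of France.", "paris"),
    ("What is 17 * 24? Reply with just the number.", "408"),
]

def pass_rate(model: str) -> float:
    passed = 0
    for prompt, expected in TASKS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        if expected in resp.choices[0].message.content.lower():
            passed += 1
    return passed / len(TASKS)

old = pass_rate("gpt-4o-2024-05-13")   # example snapshot names
new = pass_rate("gpt-4o-2024-08-06")
print(f"old={old:.0%} new={new:.0%}")
if new < old:
    print("Regression on your tasks: hold off on upgrading.")

Swyx's "eval the eval" worry applies here too: the substring check is deliberately dumb, and in practice you would want a judge you trust more than this.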
We were particularly impressed that Olivier addressed the not safe for work moderation policy question head on, as that had only previously been picked up on in Reddit forums. This is an encouraging sign that we will return to in the closing candor with Sam Altman at the end of this episode.[00:36:57] Romain Huet, Head of DX, OpenAI[00:36:57] AI Charlie: Next, a chat with Romain Huet, friend of the pod, AI Engineer World's Fair closing keynote speaker, and head of developer experience at OpenAI, on his incredible live demos and advice to AI engineers on all the new modalities.[00:37:12] Alessio: Alright, we're live from OpenAI Dev Day. We're with Romain, who just did two great demos on stage.[00:37:17] Alessio: And he's been a friend of Latent Space, so thanks for taking some of the time.[00:37:20] Romain Huet: Of course, yeah, thank you for being here and spending the time with us today.[00:37:23] swyx: Yeah, I appreciate you guys putting this on. I know it's like extra work, but it really shows the developers that you care about reaching out.[00:37:31] Romain Huet: Yeah, of course, I think when you go back to the OpenAI mission, I think for us it's super important that we have the developers involved in everything we do. Making sure that, you know, they have all of the tools they need to build successful apps. And we really believe that the developers are always going to invent the ideas, the prototypes, the fun factors of AI that we can't build ourselves.[00:37:49] Romain Huet: So it's really cool to have everyone here.[00:37:51] swyx: We had Michelle from you guys on. Yes, great episode. She very seriously said API is the path to AGI. Correct. And people in our YouTube comments were like, API is not AGI. I'm like, no, she's very serious. API is the path to AGI. Like, you're not going to build everything like the developers are, right?[00:38:08] swyx: Of[00:38:08] Romain Huet: course, yeah, that's the whole value of having a platform and an ecosystem of amazing builders who can, like, in turn, create all of these apps. I'm sure we talked about this before, but there's now more than 3 million developers building on OpenAI, so it's pretty exciting to see all of that energy into creating new things.[00:38:26] Alessio: I was going to say, you built two apps on stage today, an International Space Station tracker and then a drone. The hardest thing must have been opening Xcode and setting that up. Now, like, the models are so good that they can do everything else. Yes. You had two modes of interaction. You had kind of like a ChatGPT app to get the plan with O1, and then you had Cursor to apply some of the changes.[00:38:47] Alessio: Correct. How should people think about the best way to consume the coding models, especially both for, you know, brand new projects and then existing projects that you're trying to modify?[00:38:56] Romain Huet: Yeah. I mean, one of the things that's really cool about O1 Preview and O1 Mini being available in the API is that you can use it in your favorite tools, like Cursor, like I did, right?[00:39:06] Romain Huet: And that's also what, like, Devin from Cognition can use in their own software engineering agents. In the case of Xcode, like, it's not quite deeply integrated in Xcode, so that's why I had, like, ChatGPT side by side.
But it's cool, right, because I could instruct O1 Preview to be, like, my coding partner and brainstorming partner for this app, but also consolidate all of the files and architect the app the way I wanted.[00:39:28] Romain Huet: So, all I had to do was just, like, port the code over to Xcode and zero shot the app build. I don't think I conveyed, by the way, how big a deal that is, but, like, you can now create an iPhone app from scratch, describing a lot of intricate details that you want, and your vision comes to life in, like, a minute.[00:39:47] Romain Huet: It's pretty outstanding.[00:39:48] swyx: I have to admit, I was a bit skeptical, because if I open up Xcode, I don't know anything about iOS programming. You knew which file to paste it in. You probably set it up a little bit. So I'm like, I have to go home and test it. And I need the ChatGPT desktop app so that it can tell me where to click.[00:40:04] Romain Huet: Yeah, I mean like, Xcode and iOS development has become easier over the years since they introduced Swift and SwiftUI. I think back in the days of Objective C, or like, you know, the storyboard, it was a bit harder to get in for someone new. But now with Swift and SwiftUI, their dev tools are really exceptional.[00:40:23] Romain Huet: But now when you combine that with O1, as your brainstorming and coding partner, it's like your architect, effectively. That's the best way, I think, to describe O1. People ask me, like, can GPT-4 do some of that? And it certainly can. But I think it will just start spitting out code, right? And I think what's great about O1 is that it can, like, make up a plan.[00:40:42] Romain Huet: In this case, for instance, the iOS app had to fetch data from an API, it had to look at the docs, it had to look at, like, how do I parse this JSON, where do I store this thing, and kind of wire things up together. So that's where it really shines. Is Mini or Preview the better model that people should be using?[00:40:58] Romain Huet: Like, how? I think people should try both. We're obviously very excited about the upcoming O1 that we shared the evals for. But we noticed that O1 Mini is very, very good at everything math, coding, everything STEM. If, for your kind of brainstorming or your kind of science part, you need some broader knowledge, then reaching for O1 Preview is better.[00:41:20] Romain Huet: But yeah, I used O1 Mini for my second demo. And it worked perfectly. All I needed was very much, like, something rooted in code, architecting and wiring up, like, a front end, a backend, some UDP packets, some WebSockets, something very specific. And it did that perfectly.[00:41:35] swyx: And then maybe just talking about voice and Wanderlust, the app that keeps on giving, what's the backstory behind, like, preparing for all of that?[00:41:44] Romain Huet: You know, it's funny, because when, last year for Dev Day, we were trying to think about what could be a great demo app to show, like, an assistive experience. I've always thought travel is kind of a great use case, because you have, like, pictures, you have locations, you have the need for translations, potentially.[00:42:01] Romain Huet: There's, like, so many use cases that are bound to travel that I thought last year, let's use a travel app. And that's how Wanderlust came to be. But of course, a year ago, all we had was a text based assistant.
And now we thought, well, if there's a voice modality, what if we just bring this app back as a wink?[00:42:19] Romain Huet: And what if we were interacting better with voice? And so with this new demo, what we wanted was to have a complete conversation in real time with the app, but also the thing we wanted to highlight was the ability to call tools and functions, right? So, like in this case, we placed a phone call using the Twilio API, interfacing with our AI agents, but developers are so smart that they'll come up with so many great ideas that we could not think of ourselves, right?[00:42:48] Romain Huet: But what if you could have, like, you know, a 911 dispatcher? What if you could have, like, a customer service center that is much smarter than what we've been used to today? There's gonna be so many use cases for real time, it's awesome.[00:43:00] swyx: Yeah, and sometimes actually, like, this should kill phone trees. Like there should not be, like, dial one[00:43:07] Romain Huet: of course, para[00:43:08] swyx: español, you know? Yeah, exactly. Or whatever. I dunno.[00:43:12] Romain Huet: I mean, even you starting to speak Spanish would just do the thing, you know, you don't even have to ask. So yeah, I'm excited for this future where we don't have to interact with those legacy systems.[00:43:22] swyx: Yeah. Yeah. Is there anything, so you are doing function calling in a streaming environment. So basically it's WebSockets. It's UDP, I think. It's basically not guaranteed to be exactly once delivery. Like, are there any coding challenges that you encountered when building this?[00:43:39] Romain Huet: Yeah, it's a bit more delicate to get into it.[00:43:41] Romain Huet: We also think that for now, what we shipped is a beta of this API. I think there's much more to build onto it. It does have the function calling and the tools. But we think that, for instance, if you want to have something very robust on your client side, maybe you want to have WebRTC as a client, right, as opposed to, like, directly working with the sockets at scale.[00:43:58] Romain Huet: So that's why we have partners like LiveKit and Agora, if you want to use them. And I'm sure we'll have many more in the future. But yeah, we keep on iterating on that, and I'm sure the feedback of developers in the weeks to come is going to be super critical for us to get it right.[00:44:16] swyx: Yeah, I think LiveKit has been fairly public that they are used in the ChatGPT app. Like, is it just all open source, and we just use it directly with OpenAI, or do we use LiveKit Cloud or something?[00:44:28] Romain Huet: So right now we released the API, we released some sample code also, and reference clients for people to get started with our API.[00:44:35] Romain Huet: And we also partnered with LiveKit and Agora, so they also have their own, like, ways to help you get started that plug natively with the real time API. So depending on the use case, people can decide what to use. If you're working on something that's completely client side, or if you're working on something on the server side for the voice interaction, you may have different needs, so we want to support all of those.[00:44:55] Alessio: I know you gotta run.
Is there anything that you want the AI engineering community to give feedback on specifically, like even down to, like, you know, a specific API endpoint, or like, what's, like, the thing that you want? Yeah. I[00:45:08] Romain Huet: mean, you know, if we take a step back, I think Dev Day this year is different from last year in a few different ways.[00:45:15] Romain Huet: But one way is that we wanted to keep it intimate, even more intimate than last year. We wanted to make sure that the community is in the spotlight. That's why we have community talks and everything. And the takeaway here is, like, learning from the very best developers and AI engineers.[00:45:31] Romain Huet: And so, you know, we want to learn from them. Most of what we shipped this morning, including things like prompt caching, the ability to generate prompts quickly in the playground, or even things like vision fine tuning, these are all things that developers have been asking of us. And so, the takeaway I would leave them with is to say, like, hey, the roadmap that we're working on is heavily influenced by them and their work.[00:45:53] Romain Huet: And so we love feedback, from high level feature requests, as you say, down to, like, very intricate details of an API endpoint. We love feedback, so yes, that's how we build this API.[00:46:05] swyx: Yeah, I think the model distillation thing as well, it might be, like, the most boring, but, like, actually used a lot.[00:46:12] Romain Huet: True, yeah. And I think maybe the most unexpected, right, because I think if I read Twitter correctly the past few days, a lot of people were expecting us to ship the real time API for speech to speech. I don't think developers were expecting us to have more tools for distillation, and we really think that's gonna be a big deal, right?[00:46:30] Romain Huet: If you're building apps where you want, like, low latency, low cost, but high performance, high quality on the use case, distillation is gonna be amazing.[00:46:40] swyx: Yeah. I sat in the distillation session just now, and they showed how they distilled from 4o to 4o mini, and it was, like, only a 2% hit in the performance and, like, 50x cheaper.[00:46:49] swyx: Yeah,[00:46:50] Romain Huet: I was there as well for the Superhuman inspired kind of use case, for an email client. Yeah, this was really good. Cool, man! Thanks so much for having me. Thanks again for being here today. It's always[00:47:00] AI Charlie: great to have you. As you might have picked up at the end of that chat, there were many sessions throughout the day focused on specific new capabilities.[00:47:08] Michelle Pokrass, Head of API at OpenAI ft. Simon Willison[00:47:08] AI Charlie: Like the new model distillation features combining evals and fine tuning. For our next session, we are delighted to bring back two former guests of the pod, which is something listeners have been greatly enjoying in our second year of doing the Latent Space podcast. Michelle Pokrass of the API team joined us recently to talk about structured outputs, and today gave an updated long form session at Dev Day, describing the implementation details of the new structured output mode.[00:47:39] AI Charlie: We also got her updated thoughts on the voice mode API we discussed in her episode, now that it is finally announced.
She is joined by friend of the pod and super blogger Simon Willison, who also came back as guest co-host in our Dev Day 2023 episode.[00:47:56] Alessio: Great, we're back live at Dev Day, returning guest Michelle, and then returning guest co-host, for the... fourth?[00:48:03] Alessio: Fourth, yeah, I don't know. I've lost count. I think it's been a few. Simon Willison is back. Yeah, we just wrapped, we just wrapped everything up. Congrats on getting everything live. Simon did a great, like, blog, so if you haven't caught up, I[00:48:17] Simon Willison: wrote my, I implemented it. I was starting my live blog while waiting for the first talk to start, using, like, GPT-4. It wrote me the JavaScript, and I got that live just in time, and then, yeah, I was live blogging the whole day.[00:48:28] swyx: Are you a Cursor enjoyer?[00:48:29] Simon Willison: I haven't really gotten into Cursor yet, to be honest. I just haven't spent enough time for it to click, I think. I'm more a copy and paste things out of Claude and ChatGPT. Yeah. It's interesting.[00:48:39] swyx: Yeah. I've converted to Cursor, and O1 is so easy to just toggle on and off.[00:48:45] Alessio: What's your workflow?[00:48:46] Alessio: VS[00:48:48] Michelle Pokrass: Code Copilot, so, yep, same here. Team Copilot. Copilot is actually the reason I joined OpenAI. It was, you know, before ChatGPT, this is the thing that really got me. So I'm still into it, but I keep meaning to try out Cursor, and I think now that things have calmed down, I'm gonna give it a real go.[00:49:03] swyx: Yeah, it's a big thing to change your tool of choice.[00:49:06] swyx: Yes,[00:49:06] Michelle Pokrass: yeah, I'm pretty dialed, so.[00:49:09] swyx: I mean, you know, if you want, you can just fork VS Code and make your own. That's the thing to do, right? We joked about doing a hackathon where the only thing you do is fork VS Code, and may the best fork win.[00:49:20] Michelle Pokrass: Nice.[00:49:22] swyx: That's actually a really good idea. Yeah, what's up?[00:49:26] swyx: I mean, congrats on launching everything today. I know, like, we touched on it a little bit, but, like, everyone was kind of guessing that the voice API was coming, and, like, we talked about it in our episode. How do you feel going into the launch? Like, any design decisions that you want to highlight?[00:49:41] Michelle Pokrass: Yeah, super jazzed about it. The team has been working on it for a while. It's, like, a very different API for us. It's the first WebSocket API, so a lot of different design decisions to be made. It's, like, what kind of events do you send? When do you send an event? What are the event names? What do you send, like, on connection versus on future messages?[00:49:57] Michelle Pokrass: So there have been a lot of interesting decisions there. The team has also hacked together really cool projects as we've been testing it. One that I really liked is we had an internal hackathon for the API team. And some folks built, like, a little hack that you could use to, like, control vim with voice mode, and you would tell it, like, nice, write a file, and it would, you know, know all the vim commands and pipe those in.[00:50:18] Michelle Pokrass: So yeah, a lot of cool stuff we've been hacking on, and really excited to see what people build with it.[00:50:23] Simon Willison: I've gotta call out a demo from today. I think it was Katja who had a 3D visualization of the solar system, like a WebGL solar system, you could talk to.
That is one of the coolest conference demos I've ever seen.[00:50:33] Simon Willison: That was so convincing. I really want the code. I really want the code for that to get put out there. I'll talk[00:50:39] Michelle Pokrass: to the team. I think we can[00:50:40] Simon Willison: probably
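To make Michelle's WebSocket design questions above concrete (which events to send, when to send them, what to name them, what goes on connection versus on later messages), here is a minimal sketch of the client side of such an event protocol, in Python with the websockets library. The URL and every event name below are invented for illustration; they are not the documented API surface.

# Sketch of a client for a WebSocket event protocol like the one Michelle
# describes. The URL and all event names are made-up placeholders.
import asyncio
import json
import websockets  # pip install websockets

async def talk():
    async with websockets.connect("wss://example.com/v1/realtime") as ws:
        # On connection: send configuration once, up front.
        await ws.send(json.dumps(
            {"type": "session.update", "session": {"voice": "alloy"}}))
        # Future messages: each user turn is its own named event.
        await ws.send(json.dumps(
            {"type": "input_text.append", "text": "What's the weather?"}))
        await ws.send(json.dumps({"type": "response.create"}))
        # The server streams typed events back; dispatch on the "type" field.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event.get("delta", ""), end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(talk())

The design questions Michelle lists fall straight out of this shape: one-time configuration lives in the connection-time event, per-turn input gets its own event type, and the client is a dispatch loop over server event names.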
In this special crossover episode of The Cognitive Revolution, Nathan shares an insightful conversation from the Latent.Space podcast. Swyx and Alessio interview Alistair Pullen of Cosine, creators of Genie, showcasing the cutting edge of AI automation in software engineering. Learn how Cosine achieves state-of-the-art results on the SWE-bench benchmark by implementing advanced AI techniques. This episode complements Nathan's recent discussion on AI Automation, demonstrating how far these practices can be pushed in real-world applications. Don't miss this opportunity to explore the future of AI-driven software development and its implications for businesses across industries. Check out the Latent.Space podcast here: https://www.latent.space Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/ SPONSORS: WorkOS: Building an enterprise-ready SaaS app? WorkOS has got you covered with easy-to-integrate APIs for SAML, SCIM, and more. Join top startups like Vercel, Perplexity, Jasper & Webflow in powering your app with WorkOS. Enjoy a free tier for up to 1M users! Start now at https://bit.ly/WorkOS-Turpentine-Network Weights & Biases Weave: Weights & Biases Weave is a lightweight AI developer toolkit designed to simplify your LLM app development. With Weave, you can trace and debug input, metadata and output with just 2 lines of code. Make real progress on your LLM development and visit the following link to get started with Weave today: https://wandb.me/cr 80,000 Hours: 80,000 Hours offers free one-on-one career advising for Cognitive Revolution listeners aiming to tackle global challenges, especially in AI. They connect high-potential individuals with experts, opportunities, and personalized career plans to maximize positive impact. Apply for a free call at https://80000hours.org/cognitiverevolution to accelerate your career and contribute to solving pressing AI-related issues. Omneky: Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ CHAPTERS: (00:00:00) About the Show (00:00:22) Sponsors: WorkOS (00:01:22) About the Episode (00:04:29) Alistair and Cosine intro (00:13:50) Building the Code Retrieval Tool (00:17:36) Sponsors: Weights & Biases Weave | 80,000 Hours (00:20:15) Developing Genie and Fine-tuning Process (00:27:41) Working with Customer Data (00:30:53) Code Retrieval Challenges and Solutions (00:36:39) Sponsors: Omneky (00:37:02) Planning and Reasoning in AI Models (00:45:55) Language Support and Generalization (00:49:46) Fine-tuning Experience with OpenAI (00:52:56) Synthetic Data and Self-improvement Loop (00:55:57) Benchmarking and SWE-bench Results (01:01:47) Future Plans for Genie (01:03:02) Industry Trends and Cursor's Success (01:05:23) Calls to Action and Ideal Customers (01:08:43) Outro
Today's guest, Nicholas Carlini, a research scientist at DeepMind, argues that we should be focusing more on what AI can do for us individually, rather than trying to have an answer for everyone.

"How I Use AI" - A Pragmatic Approach

Carlini's blog post "How I Use AI" went viral for good reason. Instead of giving a personal opinion about AI's potential, he simply laid out how he, as a security researcher, uses AI tools in his daily work. He divided it into 12 sections:

* To make applications
* As a tutor
* To get started
* To simplify code
* For boring tasks
* To automate tasks
* As an API reference
* As a search engine
* To solve one-offs
* To teach me
* Solving solved problems
* To fix errors

Each of the sections has specific examples, so we recommend going through it. It also includes all the prompts used; in the "make applications" case, it's 30,000 words total!

My personal takeaway is that the majority of the work AI can do successfully is what humans dislike doing: writing boilerplate code, looking up docs, taking repetitive actions, etc. These are usually boring tasks with little creativity, but with a lot of structure. This is the strongest argument for why LLMs, especially for code, are more beneficial to senior employees: if you can get the boring stuff out of the way, there's a lot more value you can generate. This is less and less true as you go toward entry level jobs, which are mostly boring and repetitive tasks. Nicholas argues both sides at ~21:34 in the pod.

A New Approach to LLM Benchmarks

We recently did a Benchmarks 201 episode, a follow up to our original Benchmarks 101, and some of the issues have stayed the same. Notably, there's a big discrepancy between what benchmarks like MMLU test and what the models are used for. Carlini created his own domain-specific language for writing personalized LLM benchmarks. The idea is simple but powerful:

* Take tasks you've actually needed AI for in the past.
* Turn them into benchmark tests.
* Use these to evaluate new models based on your specific needs.

It can represent very complex tasks, from a single code generation to drawing a US flag using C:

"Write hello world in python" >> LLMRun() >> PythonRun() >> SubstringEvaluator("hello world")

"Write a C program that draws an american flag to stdout." >> LLMRun() >> CRun() >> VisionLLMRun("What flag is shown in this image?") >> (SubstringEvaluator("United States") | SubstringEvaluator("USA"))

This approach solves a few problems:

* It measures what's actually useful to you, not abstract capabilities.
* It's harder for model creators to "game" your specific benchmark, a problem that has plagued standardized tests.
* It gives you a concrete way to decide if a new model is worth switching to, similar to how developers might run benchmarks before adopting a new library or framework.

Carlini argues that if even a small percentage of AI users created personal benchmarks, we'd have a much better picture of model capabilities in practice.
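For the curious, here is a toy reconstruction of how a >>-style pipeline like the examples above can be wired together in Python with operator overloading. The class names mirror Carlini's examples; the implementation details (and the stubbed LLMRun) are our assumptions, not his actual framework.

# Toy reconstruction of a ">>" eval pipeline DSL. Class names follow the
# examples above; everything else is an illustrative assumption.
import contextlib
import io

class Node:
    def __rshift__(self, other):      # node >> node chains stages
        return Pipeline([self, other])
    def __rrshift__(self, prompt):    # lets a plain string start a chain
        return Pipeline([Const(prompt), self])
    def __or__(self, other):          # evaluator | evaluator: either passes
        return AnyOf(self, other)
    def run(self, value):
        raise NotImplementedError

class Pipeline(Node):
    def __init__(self, stages):
        self.stages = stages
    def __rshift__(self, other):
        return Pipeline(self.stages + [other])
    def run(self, value=None):
        for stage in self.stages:     # feed each stage's output to the next
            value = stage.run(value)
        return value

class Const(Node):
    def __init__(self, value):
        self.value = value
    def run(self, _):
        return self.value

class LLMRun(Node):
    def run(self, prompt):
        # Stub: wire this to whatever model you are benchmarking.
        return "print('hello world')"

class PythonRun(Node):
    def run(self, code):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, {})            # sandbox this in real use
        return buf.getvalue()

class SubstringEvaluator(Node):
    def __init__(self, needle):
        self.needle = needle
    def run(self, text):
        return self.needle.lower() in text.lower()

class AnyOf(Node):
    def __init__(self, *evals):
        self.evals = evals
    def run(self, text):
        return any(e.run(text) for e in self.evals)

test = ("Write hello world in python"
        >> LLMRun() >> PythonRun() >> SubstringEvaluator("hello world"))
print(test.run())  # True if the (stubbed) model's program prints it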
AI Security

While much of the AI security discussion focuses on either jailbreaks or existential risks, Carlini's research targets the space in between. Some highlights from his recent work:

* LAION 400M data poisoning: By buying expired domains referenced in the dataset, Carlini's team could inject arbitrary images into models trained on LAION 400M. You can read the paper "Poisoning Web-Scale Training Datasets is Practical" for all the details. This is a great example of expanding the scope beyond the model itself, and looking at the whole system and how it can become vulnerable.
* Stealing model weights: They demonstrated how to extract parts of production language models (like OpenAI's) through careful API queries. This research, "Stealing Part of a Production Language Model", shows that even black-box access can leak sensitive information.
* Extracting training data: In some cases, they found ways to make models regurgitate verbatim snippets from their training data. He and Milad Nasr wrote a paper on this as well: "Scalable Extraction of Training Data from (Production) Language Models". They also think this might be applicable to extracting RAG results from a generation.

These aren't just theoretical attacks. They've led to real changes in how companies like OpenAI design their APIs and handle data. If you really miss logit_bias and logit results by token, you can blame Nicholas :)

We had a ton of fun also chatting about things like Conway's Game of Life, how much data can fit in a piece of paper, and porting Doom to Javascript. Enjoy!

Show Notes
* How I Use AI
* My Benchmark for LLMs
* Doom Javascript port
* Conway's Game of Life
* Tic-Tac-Toe in one printf statement
* International Obfuscated C Code Contest
* Cursor
* LAION 400M poisoning paper
* Man vs Machine at Black Hat
* Model Stealing from OpenAI
* Milad Nasr
* H.D. Moore
* Vijay Bolina
* Cosine.sh
* uuencode

Timestamps
* [00:00:00] Introductions
* [00:01:14] Why Nicholas writes
* [00:02:09] The Game of Life
* [00:05:07] "How I Use AI" blog post origin story
* [00:08:24] Do we need software engineering agents?
* [00:11:03] Using AI to kickstart a project
* [00:14:08] Ephemeral software
* [00:17:37] Using AI to accelerate research
* [00:21:34] Experts vs non-expert users as beneficiaries of AI
* [00:24:02] Research on generating less secure code with LLMs
* [00:27:22] Learning and explaining code with AI
* [00:30:12] AGI speculations?
* [00:32:50] Distributing content without social media
* [00:35:39] How much data do you think you can put on a single piece of paper?
* [00:37:37] Building personal AI benchmarks
* [00:43:04] Evolution of prompt engineering and its relevance
* [00:46:06] Model vs task benchmarking
* [00:52:14] Poisoning LAION 400M through expired domains
* [00:55:38] Stealing OpenAI models from their API
* [01:01:29] Data stealing and recovering training data from models
* [01:03:30] Finding motivation in your work

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:12]: Hey, and today we're in the in-person studio, which Alessio has gorgeously set up for us, with Nicholas Carlini. Welcome. Thank you. You're a research scientist at DeepMind. You work at the intersection of machine learning and computer security. You got your PhD from Berkeley in 2018, and also your BA from Berkeley as well. And mostly we're here to talk about your blogs, because you are so generous in just writing up what you know. Well, actually, why do you write?Nicholas [00:00:41]: Because I like, I feel like it's fun to share what you've done. I don't like writing, sufficiently didn't like writing, I almost didn't do a PhD, because I knew how much writing was involved in writing papers. I was terrible at writing when I was younger. I did, like, the remedial writing classes when I was in university, because I was really bad at it.
So I don't actually enjoy, I still don't enjoy the act of writing. But I feel like it is useful to share what you're doing, and I like being able to talk about the things that I'm doing that I think are fun. And so I write because I think I want to have something to say, not because I enjoy the act of writing.Swyx [00:01:14]: But yeah. It's a tool for thought, as they often say. Is there any sort of background or thing that people should know about you as a person? Yeah.Nicholas [00:01:23]: So I tend to focus on, like you said, I do security work, I try to like attacking things and I want to do like high quality security research. And that's mostly what I spend my actual time trying to be a productive member of society doing. But then I get distracted by things, and I just like, you know, working on random fun projects. Like a Doom clone in JavaScript.Swyx [00:01:44]: Yes.Nicholas [00:01:45]: Like that. Or, you know, I've done a number of things that have absolutely no utility. But are fun things to have done. And so it's interesting to say, like, you should work on fun things that just are interesting, even if they're not useful in any real way. And so that's what I tend to put up there is after I have completed something I think is fun, or if I think it's sufficiently interesting, write something down there.Alessio [00:02:09]: Before we go into like AI, LLMs and whatnot, why are you obsessed with the game of life? So you built multiplexing circuits in the game of life, which is mind boggling. So where did that come from? And then how do you go from just clicking boxes on the UI web version to like building multiplexing circuits?Nicholas [00:02:29]: I like Turing completeness. The definition of Turing completeness is a computer that can run anything, essentially. And the game of life, Conway's game of life is a very simple cellular 2D automaton where you have cells that are either on or off. And a cell becomes on if in the previous generation some configuration holds true and off otherwise. It turns out there's a proof that the game of life is Turing complete, that you can run any program in principle using Conway's game of life. I don't know. And so you can, therefore someone should. And so I wanted to do it. Some other people have done some similar things, but I got obsessed into like, if you're going to try and make it work, like we already know it's possible in theory. I want to try and like actually make something I can run on my computer, like a real computer I can run. And so yeah, I've been going on this rabbit hole of trying to make a CPU that I can run semi real time on the game of life. And I have been making some reasonable progress there. And yeah, but you know, Turing completeness is just like a very fun trap you can go down. A while ago, as part of a research paper, I was able to show that in C, if you call into printf, it's Turing complete. Like printf, you know, like, which like, you know, you can print numbers or whatever, right?Swyx [00:03:39]: Yeah, but there should be no like control flow stuff.Nicholas [00:03:42]: Because printf has a percent n specifier that lets you write an arbitrary amount of data to an arbitrary location. And the printf format specifier has an index into where it is in the loop that is in memory. So you can overwrite the location of where printf is currently indexing using percent n. So you can get loops, you can get conditionals, and you can get arbitrary data writes again.
So we sort of have another Turing complete language using printf, which again, like this has essentially zero practical utility, but like, it's just, I feel like a lot of people get into programming because they enjoy the art of doing these things. And then they go work on developing some software application and lose all the joy in it. And I want to still have joy in doing these things. And so on occasion, I try to stop doing productive, meaningful things and just like, what's a fun thing that we can do and try and make that happen.Alessio [00:04:39]: Awesome. So you've been kind of like a pioneer in the AI security space. You've done a lot of talks starting back in 2018. We'll kind of leave that to the end because I know the security part is, there's maybe a smaller audience, but it's a very intense audience. So I think that'll be fun. But everybody in our Discord started posting your how I use AI blog post and we were like, we should get Carlini on the podcast. And then you were so nice to just, yeah, and then I sent you an email and you're like, okay, I'll come.Swyx [00:05:07]: And I was like, oh, I thought that would be harder.Alessio [00:05:10]: I think there's, as you said in the blog posts, a lot of misunderstanding about what LLMs can actually be used for. What are they useful at? What are they not good at? And whether or not it's even worth arguing what they're not good at, because they're obviously not. So if you cannot count the R's in a word, they're like, it's just not what it does. So how painful was it to write such a long post, given that you just said that you don't like to write? Yeah. And then we can kind of run through the things, but maybe just talk about the motivation, why you thought it was important to do it.Nicholas [00:05:39]: Yeah. So I wanted to do this because I feel like most people who write about language models being good or bad have some underlying message of like, you know, they have their camp and their camp is like, AI is bad or AI is good or whatever. And they like, they spin whatever they're going to say according to their ideology. And they don't actually just look at what is true in the world. So I've read a lot of things where people say how amazing they are and how all programmers are going to be obsolete by 2024. And I've read a lot of things where people say like, they can't do anything useful at all. And, you know, like, they're just like, it's only the people who've come off of, you know, blockchain crypto stuff and are here to like make another quick buck and move on. And I don't really agree with either of these. And I'm not someone who cares really one way or the other how these things go. And so I wanted to write something that just says like, look, like, let's sort of ground in reality what we can actually do with these things. Because my actual research is in like security and showing that these models have lots of problems. Like this is like my day to day job is saying like, we probably shouldn't be using these in lots of cases. I thought I could have a little bit of credibility in saying, it is true. They have lots of problems. We maybe shouldn't be deploying them in lots of situations. And still, they are also useful. And that is the like, the bit that I wanted to get across is to say, I'm not here to try and sell you on anything. I just think that they're useful for the kinds of work that I do. And hopefully, some people would listen. And it turned out that a lot more people liked it than I thought.
But yeah, that was the motivation behind why I wanted to write this.Alessio [00:07:15]: So you had about a dozen sections of like how you actually use AI. Maybe we can just kind of run through them all. And then maybe the ones where you have extra commentary to add, we can... Sure.Nicholas [00:07:27]: Yeah, yeah. I didn't put as much thought into this as maybe was deserved. I probably spent, I don't know, definitely less than 10 hours putting this together.Swyx [00:07:38]: Wow.Alessio [00:07:39]: It took me close to that to do a podcast episode. So that's pretty impressive.Nicholas [00:07:43]: Yeah. I wrote it in one pass. I've gotten a number of emails of like, you got this editing thing wrong, you got this sort of other thing wrong. It's like, I just haven't looked at it. I feel like I still don't like writing. And so because of this, the way I tend to treat this is like, I will put it together into the best format that I can at the time, and then put it on the internet, and then never change it. And this is an aspect of like the research side of me is like, once a paper is published, like it is done as an artifact that exists in the world. I could forever edit the very first thing I ever put out to make it the most perfect version of what it is, and I would do nothing else. And so I feel like I find it useful to be like, this is the artifact, I will spend some certain amount of hours on it, which is what I think it is worth. And then I will just...Swyx [00:08:22]: Yeah.Nicholas [00:08:23]: Timeboxing.Alessio [00:08:24]: Yeah. Stop. Yeah. Okay. We just recorded an episode with the founder of Cosine, which is like an AI software engineer colleague. You said it took you 30,000 words to get GPT-4 to build you the, like, "can GPT-4 solve this" kind of app. Where are we in the spectrum where ChatGPT is all you need to actually build something versus I need a full on agent that does everything for me?Nicholas [00:08:46]: Yeah. Okay. So this was an... So I built a web app last year sometime that was just like a fun demo where you can guess if you can predict whether or not GPT-4 at the time could solve a given task. This is, as far as web apps go, very straightforward. You need basic HTML, CSS, you have a little slider that moves, you have a button, sort of animate the text coming to the screen. The reason people are going here is not because they want to see my wonderful HTML, right? I used to know how to do modern HTML in 2007, 2008. I was very good at fighting with IE6 and these kinds of things. I knew how to do that. I haven't had to build any web app stuff in the meantime, which means that I know how everything works, but I don't know any of the new... Flexbox is new to me. Flexbox is like 10 years old at this point, but it's just amazing being able to go to the model and just say, write me this thing and it will give me all of the boilerplate that I need to get going. Of course it's imperfect. It's not going to get you the right answer, and it doesn't do anything that's complicated right now, but it gets you to the point where the only remaining work that needs to be done is the interesting hard part for me, the actual novel part. Even the current models, I think, are entirely good enough at doing this kind of thing, that they're very useful. It may be the case that if you had something, like you were saying, a smarter agent that could debug problems by itself, that might be even more useful.
Currently though, you make a model into an agent by just copying and pasting error messages, for the most part. That's what I do, is you run it and it gives you some code that doesn't work, and either I'll fix the code, or it will give me buggy code and I won't know how to fix it, and I'll just copy and paste the error message and say, it tells me this. What do I do? And it will just tell me how to fix it. You can't trust these things blindly, but I feel like most people on the internet already understand that things on the internet, you can't trust blindly. And so this is not like a big mental shift you have to go through to understand that it is possible to read something and find it useful, even if it is not completely perfect in its output.Swyx [00:10:54]: It's very human-like in that sense. It's the same ring of trust, I kind of think about it that way, if you had trust levels.Alessio [00:11:03]: And there's maybe a couple that tie together. So there was like, to make applications, and then there's to get started, which is similar, you know, kickstarting maybe like a project that you know the LLM cannot solve. Is that kind of how you think about it?Nicholas [00:11:15]: Yeah. So for getting started on things is one of the cases where I think it's really great for some of these things, where I sort of use it as a personalized, help me use this technology I've never used before. So for example, I had never used Docker before January. I know what Docker is. Lucky you. Yeah, like I'm a computer security person, like I sort of, I have read lots of papers on, you know, all the technology behind how these things work. You know, I know all the exploits on them, I've done some of these things, but I had never actually used Docker. But I wanted to be able to run the outputs of language model stuff in some controlled, contained environment, which I know is the right application. So I just ask it like, I want to use Docker to do this thing, like, tell me how to run a Python program in a Docker container. And it like gives me a thing. I'm like, step back. You said Docker compose, I do not know what this word Docker compose is. Is this Docker? Help me. And like, it'll sort of tell me all of these things. And I'm sure there's this knowledge that's out there on the internet, like this is not some groundbreaking thing that I'm doing, but I just wanted it as a small piece of one thing I was working on. And I didn't want to learn Docker from first principles. Like I, at some point, if I need it, I can do that. Like I have the background that I can make that happen. But what I wanted to do was thing one. And it's very easy to get bogged down in the details of this other thing that helps you accomplish your end goal. And I just want to like, tell me enough about Docker so I can do this particular thing. And I can check that it's doing the safe thing. I sort of know enough about that from, you know, my other background. And so I can just have the model help teach me exactly the one thing I want to know and nothing more. I don't need to worry about other things that the writer of this thinks is important that actually isn't. Like I can just like stop the conversation and say, no, boring to me. Explain this detail. I don't understand. I think that's why that was very useful for me.
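Nicholas's poor man's agent from a couple of turns up, run the code and paste the error back, is mechanical enough to sketch. A minimal version in Python, where ask_model is a placeholder for whatever chat API you use:

# Minimal sketch of the copy-the-error-back loop Nicholas describes.
# ask_model() is a placeholder; wire it to your chat model of choice.
import subprocess
import sys

def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM API")

def run_until_it_works(task: str, max_tries: int = 5) -> str:
    code = ask_model(f"Write a Python script that does this:\n{task}")
    for _ in range(max_tries):
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=60)
        if result.returncode == 0:
            return code  # it ran cleanly; still read it before trusting it
        # The whole "agent": paste the traceback back and ask for a fix.
        code = ask_model(
            f"This script:\n{code}\n"
            f"failed with:\n{result.stderr}\nReturn a corrected script.")
    raise RuntimeError("model never produced working code")

As he says, you can't trust the result blindly; the loop only automates the tedious copy and paste, not the judgment about whether the final program does the safe thing.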
It would have taken me, you know, several hours to figure out some things that take 10 minutes if you could just ask exactly the question you want the answer to.Alessio [00:13:05]: Have you had any issues with like newer tools? Have you felt any meaningful kind of like a cutoff date where like there's not enough data on the internet or? I'm sure that the answer to this is yes.Nicholas [00:13:16]: But I tend to just not use most of these things. Like I feel like the significant way in which I use machine learning models is probably very different than most people's, is that I'm a researcher and I get to pick what tools I use and most of the things that I work on are fairly small projects. And so I can entirely see how someone who is in a big giant company where they have their own proprietary legacy code base of a hundred million lines of code or whatever and like you just might not be able to use things the same way that I do. I still think there are lots of use cases there that are entirely reasonable that are not the same ones that I've put down. But I wanted to talk about what I have personal experience in being able to say is useful. And I would like it very much if someone who is in one of these environments would be able to describe the ways in which they find current models useful to them. And not, you know, philosophize on what someone else might be able to find useful, but actually say like, here are real things that I have done that I found useful for me.Swyx [00:14:08]: Yeah, this is what I often do to encourage people to write more, to share their experiences because they often fear being attacked on the internet. But you are the ultimate authority on how you use things, and that's just objectively true. So it cannot be debated. One thing that people are very excited about is the concept of ephemeral software or like personal software. This use case in particular basically lowers the activation energy for creating software, which I like as a vision. I don't think I have taken as much advantage of it as I could. I feel guilty about that. But also, we're trending towards there.Nicholas [00:14:47]: Yeah. No, I mean, I do think that this is a direction that is exciting to me. One of the things I wrote that was like, a lot of the ways that I use these models are for one-off things that I just need to happen that I'm going to throw away in five minutes. And you can.Swyx [00:15:01]: Yeah, exactly.Nicholas [00:15:02]: Right. It's like the kind of thing where it would not have been worth it for me to have spent 45 minutes writing this, because I don't need the answer that badly. But if it will only take me five minutes, then I'll just figure it out, run the program and then get it right. And if it turns out that you ask the thing, it doesn't give you the right answer. Well, I didn't actually need the answer that badly in the first place. Like either I can decide to dedicate the 45 minutes or I cannot, but like the cost of doing it is fairly low. You see what the model can do. And if it can't, then okay. When you're using these models, if you're getting the answer you want always, it means you're not asking them hard enough questions.Swyx [00:15:35]: Say more.Nicholas [00:15:37]: Lots of people only use them for very small particular use cases and like it always does the thing that they want. Yeah.Swyx [00:15:43]: Like they use it like a search engine.Nicholas [00:15:44]: Yeah. Or like one particular case.
And if you're finding that when you're using these, it's always giving you the answer that you want, then probably it has more capabilities than you're actually using. And so I oftentimes try when I have something that I'm curious about to just feed it into the model and be like, well, maybe it's just solved my problem for me. You know, most of the time it doesn't, but like on occasion, it's like, it's done things that would have taken me, you know, a couple hours, where it's been great and just like solved everything immediately. And if it doesn't, then it's usually easier to verify whether or not the answer is correct than to have written it in the first place. And so you check, you're like, well, that's just, you're entirely misguided. Nothing here is right. It's just like, I'm not going to do this. I'm going to go write it myself or whatever.Alessio [00:16:21]: Even for non-tech, I had to fix my irrigation system. I had an old irrigation system. I didn't know how it worked, to program it. I took a photo, I sent it to Claude and it's like, oh yeah, that's like the RT 900. This is exactly, I was like, oh wow, you know, you know, a lot of stuff.Swyx [00:16:34]: Was it right?Alessio [00:16:35]: Yeah, it was right.Swyx [00:16:36]: It worked. Did you compare with OpenAI?Alessio [00:16:38]: No, I canceled my OpenAI subscription, so I'm a Claude boy. Do you have a way to think about this like one-off software thing? One way I talk to people about it is like LLMs are kind of converging to like semantic serverless functions, you know, like you can say something and like it can run the function in a way and then that's it. It just kind of dies there. Do you have a mental model to just think about how long it should live for and like anything like that?Nicholas [00:17:02]: I don't think I have anything interesting to say here, no. I will take whatever tools are available in front of me and try and see if I can use them in meaningful ways. And if they're helpful, then great. If they're not, then fine. And like, you know, there are lots of people that I'm very excited about seeing all these people who are trying to make better applications that use these or all these kinds of things. And I think that's amazing. I would like to see more of it, but I do not spend my time thinking about how to make this any better.Alessio [00:17:27]: What's the most underrated thing in the list? I know there's like simplified code, solving boring tasks, or maybe is there something that you forgot to add that you want to throw in there?Nicholas [00:17:37]: I mean, so in the list, I only put things that people could look at and go, I understand how this solved my problem. I didn't want to put things where the model was very useful to me, but it would not be clear to someone else that it was actually useful. So for example, one of the things that I use it a lot for is debugging errors. But the errors that I have are very much not the errors that anyone else in the world will have. And in order to understand whether or not the solution was right, you just have to trust me on it. Because, you know, like I got my machine in a state that like CUDA was not talking to whatever some other thing, the versions were mismatched, something, something, something, and everything was broken. And like, I could figure it out with interaction with the model, and it, like, told me the steps I needed to take. But at the end of the day, when you look at the conversation, you just have to trust me that it worked.
And I didn't want to write things online that were, like, you have to trust me on what I'm saying. I want everything that I said to like have evidence that like, here's the conversation, you can go and check whether or not this actually solved the task as I said that the model does. Because a lot of people I feel like say, I used a model to solve this very complicated task. And what they mean is the model did 10%, and I did the other 90% or something. I wanted everything to be verifiable. And so one of the biggest use cases for me, I didn't describe even at all, because it's not the kind of thing that other people could have verified by themselves. So that maybe is like, one of the things that I wish I maybe had said a little bit more about, and just stated the way that this is done, because I feel like that this didn't come across quite as well. But yeah, of the things that I talked about, the thing that I think is most underrated is the ability of it to solve the uninteresting parts of problems for me right now. People always say, and this is one of the biggest arguments that I don't understand why people make, that the model can only do things that people have done before. Therefore, the model is not going to be helpful in doing new research or like discovering new things. And as someone whose day job is to do new things, like what is research? Research is doing something literally no one else in the world has ever done before. So this is what I do every single day, 90% of this is not doing something new, 90% of this is doing things a million people have done before, and then a little bit of something that was new. There's a reason why we say we stand on the shoulders of giants. It's true. Almost everything that I do is something that's been done many, many times before. And that is the piece that can be automated. Even if the thing that I'm doing as a whole is new, it is almost certainly the case that the small pieces that build up to it are not. And a number of people who use these models, I feel like expect that they can either solve the entire task or none of the task. But now I find myself very often, even when doing something very new and very hard, having models write the easy parts for me. And the reason I think this is so valuable, everyone who programs understands this, like you're currently trying to solve some problem and then you get distracted. And whatever the case may be, someone comes and talks to you, you have to go look up something online, whatever it is. You lose a lot of time to that. And one of the ways of being distracted that we currently don't think about is: you're solving some hard problem and you realize you need a helper function that does X, where X is like, it's a known algorithm. Any person in the world, you say like, give me the algorithm: I have a dense graph or a sparse graph, I need to make it dense. You can do this by doing some matrix multiplies. It's like, this is a solved problem. I knew how to do this 15 years ago, but it distracts me from the problem I'm thinking about in my mind. I needed this done. And so instead of using my mental capacity on solving that problem and then coming back to the problem I was originally trying to solve, you could just ask the model, please solve this problem for me. It gives you the answer. You run it. You can check that it works very, very quickly. And now you go back to solving the problem without having lost all the mental state.
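For concreteness, the kind of boring, solved-problem helper Nicholas describes handing off (a sparse graph representation turned into a dense one) might look like this; the names and the edge-list representation are our choice, purely illustrative:

# The kind of solved-problem helper Nicholas describes offloading:
# turn a sparse edge list into a dense adjacency matrix.
import numpy as np

def to_dense(num_nodes: int, edges: list[tuple[int, int]]) -> np.ndarray:
    adj = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        adj[i, j] = 1.0
    return adj

print(to_dense(3, [(0, 1), (1, 2)]))

Trivial, verifiable in seconds, and exactly the kind of thing worth delegating so the hard problem keeps your full attention.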
And I feel like this is one of the things that's been very useful for me.Swyx [00:21:34]: And in terms of this concept of expert users versus non-expert users, floors versus ceilings, you had some strong opinion here that like, basically it actually is more beneficial for non-experts.Nicholas [00:21:46]: Yeah, I don't know. I think it could go either way. Let me give you the argument for both of these. Yes. So I can only speak on the expert user's behalf because I've been doing computers for a long time. And so yeah, the cases where it's useful for me are exactly these cases where I can check the output. I know, and anything the model could do, I could have done. I could have done better. I can check every single thing that the model is doing and make sure it's correct in every way. And so I can only speak and say, definitely it's been useful for me. But I also see a world in which this could be very useful for the kinds of people who do not have this knowledge, with caveats, because I'm not one of these people. I don't have this direct experience. But one of the big ways that I can see this is for things that you can check fairly easily, someone who could never have asked or have written a program themselves to do a certain task could just ask for the program that does the thing. And you know, some of the times it won't get it right. But some of the times it will, and they'll be able to have the thing in front of them that they just couldn't have done before. And we see a lot of people trying to do applications for this, like integrating language models into spreadsheets. Spreadsheets run the world. And there are some people who know how to do all the complicated spreadsheet equations and various things, and other people who don't, who just use the spreadsheet program but just manually do all of the things one by one by one by one. And this is a case where you could have a model that could try and give you a solution. And as long as the person is rigorous in testing that the solution actually does the correct thing, and this is the part that I'm worried about most, you know, I think depending on these systems in ways that we shouldn't, like this is what my research says, my research is entirely on this, like, you probably shouldn't trust these models to do things in adversarial situations, like, I understand this very deeply. And so I think that it's possible for people who don't have this knowledge to make use of these tools in these ways, but I'm worried that it might end up in a world where people just blindly trust them, deploy them in situations that they probably shouldn't, and then someone like me gets to come along and just break everything because everything is terrible. And so I am very, very worried about that being the case, but I think if done carefully it is possible that these could be very useful.Swyx [00:23:54]: Yeah, there is some research out there that shows that when people use LLMs to generate code, they do generate less secure code.Nicholas [00:24:02]: Yeah, Dan Boneh has a nice paper on this. There are a bunch of papers that touch on exactly this.Swyx [00:24:07]: My slight issue is, you know, is there an agenda here?Nicholas [00:24:10]: I mean, okay, yeah, Dan Boneh, at least the one they have, like, I fully trust everything that sort of.Swyx [00:24:15]: Sorry, I don't know who Dan is.Nicholas [00:24:17]: He's a professor at Stanford. Yeah, he and some students have some things on this. Yeah, there's a number.
I agree that a lot of the stuff feels like people have an agenda behind it. There are some that don't, and I trust them to have done the right thing. I also think, even on this though, we have to be careful because the argument, whenever someone says x is true about language models, you should always append the suffix "for current models," because I'll be the first to admit I was one of the people who was very much of the opinion that these language models are fun toys and are going to have absolutely no practical utility. If you had asked me this, let's say, in 2020, I still would have said the same thing. After I had seen GPT-2, I had written a couple of papers studying GPT-2 very carefully. I still would have told you these things are toys. And when I first read the RLHF paper and the instruction tuning paper, I was like, nope, this is this thing that these weird AI people are doing. They're trying to make some analogies to people that make no sense. It's just like, I don't even care to read it. I saw what it was about and just didn't even look at it. I was obviously wrong. These things can be useful. And I feel like a lot of people had the same mentality that I did and decided not to change their mind. And I feel like this is the thing that I want people to be careful about. I want them to at least know what is true about the world so that they can then see that maybe they should reconsider some of the opinions that they had from four or five years ago that may just not be true about today's models.Swyx [00:25:47]: Specifically because you brought up spreadsheets, I want to share my personal experience because I think Google has done a really good job that people don't know about, which is if you use Google Sheets, Gemini is integrated inside of Google Sheets and it helps you write formulas. Great.Nicholas [00:26:00]: That's news to me.Swyx [00:26:01]: Right? They maybe don't do a good job of telling people. Unless you watched Google I.O., there was no other opportunity to learn that Gemini is now in your Google Sheets. And so I just don't write formulas manually anymore. I just prompt Gemini to do it for me. And it does it.Nicholas [00:26:15]: One of the problems that these machine learning models have is a discoverability problem. I think this will be figured out. I mean, it's the same problem that you have with any assistant. You're given a blank box and you're like, what do I do with it? I think this is great. More of these things, it would be good for them to exist. I want them to exist in ways that we can actually make sure that they're done correctly. I don't want to just have them be pushed into more and more things just blindly. I feel like lots of people, there are far too many X plus AI, where X is, like, an arbitrary thing in the world that has nothing to do with AI and could not be benefited by it at all. And they're just doing it because they want to use the word. And I don't want that to happen.Swyx [00:26:58]: You don't want an AI fridge?Nicholas [00:27:00]: No. Yes. I do not want my fridge on the internet.Swyx [00:27:03]: I do not want... Okay.Nicholas [00:27:05]: Anyway, let's not go down that rabbit hole. I understand why some of that happens, because people want to sell things or whatever. But I feel like a lot of people see that and then they write off everything as a result of it. And I just want to say, there are allowed to be people who are trying to do things that don't make any sense. Just ignore them.
Do the things that make sense.Alessio [00:27:22]: Another chunk of use cases was learning. So both explaining code, being an API reference, all of these different things. Any suggestions on how to go at it? I feel like one thing is generate code and then explain it to me. One way is just tell me about this technology. Another thing is like, hey, I read this online, kind of help me understand it. Any best practices on getting the most out of it?Swyx [00:27:47]: Yeah.Nicholas [00:27:47]: I don't know if I have best practices. I have how I use them. I find it very useful for cases where I understand the underlying ideas, but I have never used them in this way before. I know what I'm looking for, but I just don't know how to get there. And so yeah, as an API reference is a great example. The tool everyone always picks on is like FFmpeg. No one in the world knows the command line arguments to do what they want. They're like, make the thing faster. I want lower bitrate, like dash V. Once you tell me what the answer is, I can check. This is one of these things where it's great for these kinds of things. Or in other cases, things where I don't really care that the answer is 100% correct. So for example, I do a lot of security work. Most of security work is reading some code you've never seen before and finding out which pieces of the code are actually important. Because, you know, most of the program doesn't actually have anything to do with security. It has, you know, the display piece or the other piece or whatever. And like, you just, you want to ignore all of that. So one very fun use of models is to like, just have it describe all the functions and just skim it and be like, wait, which ones look like approximately the right things to look at? Because otherwise, what are you going to do? You're going to have to read them all manually. And when you're reading them manually, you're going to skim the function anyway, and not just figure out what's going on perfectly. Like you already know that when you're going to read these things, what you're going to try and do is figure out roughly what's going on. Then you'll delve into the details. This is a great way of just doing that, but faster, because it will abstract most of what is right. It's going to be wrong some of the time. I don't care. I would have been wrong too. And as long as you treat it this way, I think it's great. And so like one of the particular use cases I have in the thing is decompiling binaries, where oftentimes people will release a binary. They won't give you the source code. And you want to figure out how to attack it. And so one thing you could do is you could try and run some kind of decompiler. It turns out for the thing that I wanted, none existed. And so I spent too many hours doing it by hand before I finally thought, why am I doing this? I should just check if the model could do it for me. And it turns out that it can. And it can turn the compiled code, which is impossible for any human to understand, into the Python code that is entirely reasonable to understand. And it doesn't run. It has a bunch of problems. But it's so much nicer that it's immediately a win for me. I can just figure out approximately where I should be looking, and then spend all of my time doing that by hand.
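The decompilation workflow he describes is easy to picture as a small loop: dump the disassembly, hand it to a model, ask for approximate Python. A minimal sketch, where `complete` is a hypothetical stand-in for whatever chat API you use (only the objdump invocation is a standard tool; everything else is our illustration):

```python
# Sketch: dump a binary's disassembly and ask a model for approximate Python.
# The output is a skimming aid, not working code.
import subprocess

def complete(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM API of choice")

def rough_decompile(binary_path: str) -> str:
    disassembly = subprocess.run(
        ["objdump", "-d", binary_path],  # disassemble the binary
        capture_output=True, text=True, check=True,
    ).stdout
    return complete(
        "Rewrite this disassembly as readable, approximate Python. "
        "It does not need to run; I only want the rough structure:\n"
        + disassembly
    )
```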
And again, you get a big win there.Swyx [00:30:12]: So I fully agree with all those use cases, especially for you as a security researcher and having to dive into multiple things. I imagine that's super helpful. I do think we want to move to your other blog post. But you ended your post with a little bit of a teaser about your next post and your speculations. What are you thinking about?Nicholas [00:30:34]: So I want to write something. And I will do that at some point when I have time, maybe after I'm done writing my current papers for ICLR or something, where I want to talk about some thoughts I have for where language models are going in the near-term future. The reason why I want to talk about this is because, again, I feel like the discussion tends to be people who are either very much AGI by 2027, or always five years away, or are going to make statements of the form, you know, LLMs are the wrong path, and we should be abandoning this, and we should be doing something else instead. And again, I feel like people tend to look at this and see these two polarizing options and go, well, those obviously are both very far extremes. Like, how do I actually, like, what's a more nuanced take here? And so I have some opinions about this that I want to put down, just saying, you know, I have wide margins of error. I think you should too. If you would say there's a 0% chance that, you know, the models will get very, very good in the next five years, you're probably wrong. If you're going to say there's a 100% chance that they will in the next five years, then you're probably wrong too. And like, to be fair, most of the people, if you read behind the headlines, actually say something like this. But it's very hard to get clicks on the internet with, like, some things may be good in the future. Like, everyone wants like, you know, a very extreme take: nothing is going to be good, this is entirely wrong, it's going to be amazing. You know, like, they want to see this. I want people who have negative reactions to these kinds of extreme views to be able to at least say, like, to tell them, there is something real here. It may not solve all of our problems, but it's probably going to get better. I don't know by how much. And that's basically what I want to say. And then at some point, I'll talk about the safety and security things as a result of this. Because the way in which security intersects with these things depends a lot on exactly how people use these tools. You know, if it turns out to be the case that these models get to be truly amazing and can solve, you know, tasks completely autonomously, that's a very different security world to be living in than if there's always a human in the loop. And the types of security questions I would want to ask would be very different. And so I think, you know, in some very large part, understanding what the future will look like a couple of years ahead of time is helpful for figuring out which problems, as a security person, I want to solve now.Alessio [00:32:50]: You mentioned getting clicks on the internet, but you don't even have, like, an X account or anything. How do you get people to read your stuff? What's your distribution strategy? Because this post was popping up everywhere. And then people on Twitter were like, Nicholas Carlini wrote this. Like, what's his handle? It's like, he doesn't have it. It's like, how did you find it? What's the story?Nicholas [00:33:07]: So I have an RSS feed and an email list. And that's it.
I don't like most social media things. On principle, I feel like they have some harms. As a person, I have a problem when people say things that are wrong on the internet. And I would get nothing done if I had a Twitter. I would spend all of my time correcting people and getting into fights. And so I feel like it is just useful for me for this not to be an option. I tend to just post things online. Yeah, it's a very good question. I don't know how people find it. I feel like for some things that I write, other people think it resonates with them. And then they put it on Twitter. And...Swyx [00:33:43]: Hacker News as well.Nicholas [00:33:44]: Sure, yeah. I am... Because my day job is doing research, I get no value for having this be picked up. There's no whatever. I don't need to be someone who has to have this other thing to give talks. And so I feel like I can just say what I want to say. And if people find it useful, then they'll share it widely. You know, this one went pretty wide. I wrote a thing, whatever, sometime late last year, about how to recover data off of an Apple ProFile drive from 1980. This probably got, I think, like 1000x fewer views than this. But I don't care. Like, that's not why I'm doing this. Like, this is the benefit of having a thing that I actually care about, which is my research. I would care much more if that didn't get seen. This is like a thing that I write because I have some thoughts that I just want to put down.Swyx [00:34:32]: Yeah. I think it's the long form thoughtfulness and authenticity that is sadly lacking sometimes in modern discourse that makes it attractive. And I think now you have a little bit of a brand of you are an independent thinker, writer, person, that people are tuned in to pay attention to whatever is next coming.Nicholas [00:34:52]: Yeah, I mean, this kind of worries me a little bit. I don't like whenever I have a popular thing that like, and then I write another thing, which is like entirely unrelated. Like, I don't, I don't...Swyx [00:35:01]: You should actually just throw people off right now.Nicholas [00:35:02]: Exactly. I'm trying to figure out, like, I need to put something else online. So, like, the last two or three things I've done in a row have been, like, actually, like, things that people should care about.Swyx [00:35:10]: Yes.Nicholas [00:35:11]: So, I have a couple of things. I'm trying to figure out which one do I put online to just, like, cull the list of people who have subscribed to my email. And so, like, tell them, like, no, like, what you're here for is not informed, well-thought-through takes. Like, what you're here for is whatever I want to talk about. And if you're not up for that, then, like, you know, go away. Like, this is not what I want out of my personal website.Swyx [00:35:27]: So, like, here's, like, top 10 enemies or something.Alessio [00:35:30]: What's the next project you're going to work on that is completely unrelated to research or LLMs? Or what games do you want to port into the browser next?Swyx [00:35:39]: Okay. Yeah.Nicholas [00:35:39]: So, maybe.Swyx [00:35:41]: Okay.Nicholas [00:35:41]: Here's a fun question. How much data do you think you can put on a single piece of paper?Swyx [00:35:47]: I mean, you can think about bits and atoms. Yeah.Nicholas [00:35:49]: No, like, normal printer. Like, I gave you an office printer. How much data can you put on a piece of paper?Alessio [00:35:54]: Can you encode it and re-decode it? So, like, you know, base64 or whatever.
Yeah, whatever you want.Nicholas [00:35:59]: Like, you get normal off-the-shelf printer, off-the-shelf scanner. How much data?Swyx [00:36:03]: I'll just throw out there. Like, 10 megabytes. That's enormous. I know.Nicholas [00:36:07]: Yeah, that's a lot.Swyx [00:36:10]: Really small fonts. That's my question.Nicholas [00:36:12]: So, I have a thing. It does about a megabyte.Swyx [00:36:14]: Yeah, okay.Nicholas [00:36:14]: There you go. I was off by an order of magnitude.Swyx [00:36:16]: Yeah, okay.Nicholas [00:36:16]: So, in particular, it's about 1.44 megabytes. A floppy disk.Swyx [00:36:21]: Yeah, exactly.Nicholas [00:36:21]: So, this is supposed to be the title at some point. It's a floppy disk.Swyx [00:36:24]: A paper is a floppy disk. Yeah.Nicholas [00:36:25]: So, this is a little hard because, you know, you can do the math and you get 8.5 by 11. You can print at 300 by 300 DPI. And this gives you 2 megabytes. And so, every single pixel, you need to be able to recover up to like 90 plus percent. Like, 95 percent. Like, 99 point something percent accuracy in order to be able to actually decode this off the paper. This is one of the things that I'm considering. I need to get a couple more things working for this. Where, you know, again, I'm running into some random problems. But this is probably, this will be one thing that I'm going to talk about. There's this contest called the International Obfuscated C Code Contest, which is amazing. People try and write the most obfuscated C code that they can. Which is great. And I have a submission for that whenever they open up the next one for it. And I'll write about that submission. I have a very fun gate-level emulation of an old CPU that runs like fully precisely. And it's a fun kind of thing. Yeah.Swyx [00:37:20]: Interesting. Your comment about the piece of paper reminds me of when I was in college. And you would have like one cheat sheet that you could write. So, you have a formula, a theoretical limit for bits per inch. And, you know, that's how much I would squeeze in really, really small. Yeah, definitely.Nicholas [00:37:36]: Okay.Swyx [00:37:37]: We are also going to talk about your benchmarking. Because you released your own benchmark that got some attention, thanks to some friends on the internet. What's the story behind your own benchmark? Do you not trust the open source benchmarks? What's going on there?Nicholas [00:37:51]: Okay. Benchmarks tell you how well the model solves the task the benchmark is designed to solve. For a long time, models were not useful. And so, the benchmark that you tracked was just something someone came up with, because you need to track something. All of deep learning exists because people tried to make models classify digits and classify images into a thousand classes. There is no one in the world who cares specifically about the problem of distinguishing between 300 breeds of dog for an image that's 224 by 224 pixels. And yet, like, this is what drove a lot of progress. And people did this not because they cared about this problem, but because they wanted to just measure progress in some way. And a lot of benchmarks are of this flavor. You want to construct a task that is hard, and we will measure progress on this benchmark, not because we care about the problem per se, but because we know that progress on this is in some way correlated with making better models. And this is fine when you don't want to actually use the models that you have.
But when you want to actually make use of them, it's important to find benchmarks that track with whether or not they're useful to you. And the thing that I was finding is that there would be model after model after model that was being released that would find some benchmark that they could claim state-of-the-art on and then say, therefore, ours is the best. And that wouldn't be helpful to me to know whether or not I should then switch to it. So the argument that I tried to lay out in this post is that more people should make benchmarks that are tailored to them. And so what I did is I wrote a domain-specific language that anyone can write for, and you can take tasks that you have wanted models to solve for you, and you can put them into your benchmark, the thing that you care about. And then when a new model comes out, you benchmark the model on the things that you care about. And you know that you care about them because you've actually asked for those answers before. And if the model scores well, then you know that for the kinds of things that you have asked models for in the past, it can solve these things well for you. This has been useful for me because when another model comes out, I can run it. I can see, does this solve the kinds of things that I care about? And sometimes the answer is yes, and sometimes the answer is no. And then I can decide whether or not I want to use that model or not. I don't want to say that existing benchmarks are not useful. They're very good at measuring the thing that they're designed to measure. But in many cases, what that's designed to measure is not actually the thing that I want to use it for. And I expect that the way that I want to use it is different from the way that you want to use it. And I would just like more people to have these things out there in the world. And the final reason for this is, it is very easy, if you want to make a model good at some benchmark, to make it good at that benchmark: you can find the distribution of data that you need and train the model to be good on the distribution of data. And then you have your model that can solve this benchmark well. And by having a benchmark that is not very popular, you can be relatively certain that no one has tried to optimize their model for your benchmark.Swyx [00:40:40]: And I would like this to be-Nicholas [00:40:40]: So publishing your benchmark is a little bit-Swyx [00:40:43]: Okay, sure.Nicholas [00:40:43]: Contextualized. So my hope in doing this was not that people would use mine as theirs. My hope in doing this was that... you should make yours. Yes, you should make your benchmark. And if, for example, there were even a very small fraction of people, 0.1% of people who made a benchmark that was useful for them, this would still be hundreds of new benchmarks out there. I might not want to make one myself, but I might know the kinds of work that I do is a little bit like this person, a little bit like that person. I'll go check how it is on their benchmarks. And I'll see, roughly, I'll get a good sense of what's going on. Because the alternative is people just do this vibes-based evaluation thing, where you interact with the model five times, and you see if it worked on the kinds of things that are just, like, your toy questions. But five questions is a very low-bit signal of whether or not it works for this thing. And if you could just automate running 100 questions for you, it's a much better evaluation.
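He doesn't walk through his DSL on air, so here is a hypothetical sketch of the shape such a personal benchmark might take: each task is a prompt plus a checker, and evaluating a new model is one loop. All names here are ours, not from his actual library:

```python
# Hypothetical sketch of a personal benchmark: every task you once asked a
# model for becomes a prompt plus a checker.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # True if the model's output passes

TASKS = [
    Task("Write a Python expression that reverses the string s.",
         check=lambda out: "[::-1]" in out),
    Task("Which ffmpeg flag sets the video bitrate?",
         check=lambda out: "-b:v" in out),
]

def run_benchmark(model: Callable[[str], str]) -> float:
    """Score a new model on your own history of questions."""
    passed = sum(task.check(model(task.prompt)) for task in TASKS)
    return passed / len(TASKS)
```

Because nobody else is training against your hundred questions, a score on this harness tells you something a public leaderboard can't.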
So that's why I did this.Swyx [00:41:37]: Yeah, I like the idea of going through your chat history and actually pulling out real-life examples. I regret to say that I don't think my chat history gets used as much these days, because I'm using Cursor, the native AI IDE. So your examples are all coding related. And the immediate question is, now that you've written the How I Use AI post, which is a little bit broader, are you able to translate all these things to evals? Are some things unevaluable?Nicholas [00:42:03]: Right. A number of things that I do are harder to evaluate. So this is the problem with a benchmark, is you need some way to check whether or not the output was correct. And so all of the kinds of things that I can put into the benchmark are the kinds of things that you can check. You can check more things than you might have thought would be possible if you do a little bit of work on the back end. So for example, all of the code that I have the model write, it runs the code and sees whether the answer is the correct answer. Or in some cases, it runs the code, feeds the output to another language model, and the language model judges whether the output was correct. And again, is using a language model to judge here perfect? No. But like, what's the alternative? The alternative is to not do it. And what I care about is just, is this thing broadly useful for the kinds of questions that I have? And so as long as the accuracy is better than roughly random, like, I'm okay with this. I've inspected the outputs of these, and like, they're almost always correct. If you ask the model to judge these things in the right way, they're very good at being able to tell this. And so, yeah, I think this is probably a useful thing for people to do.Alessio [00:43:04]: You complain about prompting and being lazy and how you do not want to tip your model and you do not want to murder a kitten just to get the right answer. How do you see the evolution of like prompt engineering? Even like 18 months ago, maybe, you know, it was kind of like really hot and people wanted to like build companies around it. Today, it's like the models are getting good. Do you think it's going to be less and less relevant going forward? Or what's the minimum valuable prompt?Nicholas [00:43:29]: Yeah, I don't know. I feel like a big part of making an agent is just like a fancy prompt that like, you know, calls back to the model again. I have no opinion. It seems like maybe it turns out that this is really important. Maybe it turns out that this isn't. I guess the only comment I was making here is just to say, oftentimes when I use a model and I find it's not useful, I talk to people who help make it. The answer they usually give me is like, you're using it wrong. Which, like, reminds me very much of the you're-holding-it-wrong thing from, like, the iPhone, right? Like, you know, like I don't care that I'm holding it wrong. I'm holding it that way. If the thing is not working with me, then like it's not useful for me. Like it may be the case that there exists a way to ask the model such that it gives me the answer that's correct, but that's not the way I'm doing it. If I have to spend so much time thinking about how I want to frame the question that it would have been faster for me just to get the answer, it didn't save me any time. And so oftentimes, you know, what I do is like, I just dump in whatever current thought that I have in whatever ill-formed way it is. And I expect the answer to be correct.
And if the answer is not correct, like in some sense, maybe the model was right to give me the wrong answer. Like I may have asked the wrong question, but I want the right answer still. And so like, I just want to sort of get this as a thing. And maybe the way to fix this is you have some default prompt that always goes into all the models or something, or you do something clever like this. It would be great if someone had a way to package this up and make a thing. I think that's entirely reasonable. Maybe it turns out that as models get better, you don't need to prompt them as much in this way. I just want to use the things that are in front of me.Alessio [00:44:55]: Do you think that's like a limitation of just how models work? Like, you know, at the end of the day, you're using the prompt to kind of like steer it in the latent space. Like, do you think there's a way to actually not make the prompt really relevant and have the model figure it out? Or like, what's the...Nicholas [00:45:10]: I mean, you could fine-tune it into the model, for example, that like it's supposed to... I mean, it seems like some models have done this, for example, like some recent model, many recent models. If you ask them a question, computing an integral of this thing, they'll say, let's think through this step by step. And then they'll go through the step by step answer. I didn't tell it. Two years ago, I would have had to have prompted it. Think step by step on solving the following thing. Now you ask them the question and the model says, here's how I'm going to do it. I'm going to take the following approach and then like sort of self-prompt itself.Swyx [00:45:34]: Is this the right way?Nicholas [00:45:35]: Seems reasonable. Maybe you don't have to do it. I don't know. This is for the people whose job is to make these things better. And yeah, I just want to use these things. Yeah.Swyx [00:45:43]: For listeners, that would be Orca and Agent Instruct. It's the SOTA on this stuff. Great. Yeah.Alessio [00:45:49]: What about few-shot? Is it included in the lazy prompting? Like, do you do few-shot prompting? Like, do you collect some examples when you want to put them in? Or...Nicholas [00:45:57]: I don't because usually when I want the answer, I just want to get the answer.Swyx [00:46:03]: Brutal. This is hard mode.Nicholas [00:46:04]: Yeah, exactly. But this is fine. I want to be clear. There's a difference between testing the ultimate capability level of the model and testing the thing that I'm doing with it. What I'm doing is I'm not exercising its full capability level because there are almost certainly better ways to ask the questions and sort of really see how good the model is. And if you're evaluating a model for being state of the art, this is ultimately what I care about. And so I'm entirely fine with people doing fancy prompting to show me what the true capability level could be because it's really useful to know what the ultimate level of the model could be. But I think it's also important just to have available to you how good the model is if you don't do fancy things.Swyx [00:46:39]: Yeah, I would say that here's a divergence between how models are marketed these days versus how people use it, which is when they test MMLU, they'll do like five shots, 25 shots, 50 shots. And no one's providing 50 examples.Nicholas [00:46:54]: I completely agree. You know, for these numbers, the problem is everyone wants to get state of the art on the benchmark.
And so you find the way that you can ask the model the questions so that you get state of the art on the benchmark. And it's good. It's legitimately good to know. It's good to know the model can do this thing if only you try hard enough. Because it means that if I have some task that I want to be solved, I know what the capability level is. And I could get there if I was willing to work hard enough. And the question then is, should I work harder and figure out how to ask the model the question? Or do I just do the thing myself? And for me, I have programmed for many, many, many years. It's often just faster for me just to do the thing than to figure out the incantation to ask the model. But I can imagine someone who has never programmed before might be fine writing five paragraphs in English describing exactly the thing that they want and have the model build it for them if the alternative is not being able to do it at all. But again, this goes to all these questions of how are they going to validate? Should they be trusting the output? These kinds of things.Swyx [00:47:49]: One problem with your eval paradigm and most eval paradigms, I'm not picking on you, is that we're actually training these things for chat, for interactive back and forth. And you actually obviously reveal much more information in the same way that asking 20 questions reveals more information in sort of a tree search branching sort of way. This is also, by the way, the problem with LMSYS arena, right? Where the vast majority of prompts are single question, single answer, eval, done. But actually, the way that we use chat things, even in the stuff that you posted in your How I Use AI post, you have maybe 20 turns of back and forth. How do you eval that?Nicholas [00:48:25]: Yeah. Okay. Very good question. This is the thing that I think many people should be doing more of. I would like more multi-turn evals. I might be writing a paper on this at some point if I get around to it. A couple of the evals in the benchmark thing I have are already multi-turn. I mentioned 20 questions. I have a 20 question eval there just for fun. But I have a couple others that are like, I just tell the model, here's my git thing, figure out how to cherry-pick off this other branch and move it over there. And so what I do is I just, I basically build a tiny little agent-y thing. I just ask the model how I do it. I run the thing on Linux. This is what I want Docker for. I spin up a Docker container. I run whatever the model told me to do. I feed the output back into the model. I repeat this many rounds. And then I check at the very end, does the git commit history show that it is correctly cherry-picked in
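The multi-turn loop he describes maps almost line for line onto code. A rough sketch, assuming a prepared container with the repo inside and a hypothetical `model` function wrapping your chat API (the prompt, paths, and commit message are illustrative):

```python
# Rough sketch of the multi-turn eval loop described: ask the model for a
# shell command, run it inside a Docker container holding the repo, feed the
# output back, then check the git history at the end.
import subprocess

def model(history: list[str]) -> str:
    raise NotImplementedError("wire this to your LLM API of choice")

def run_in_container(container: str, cmd: str) -> str:
    result = subprocess.run(
        ["docker", "exec", container, "bash", "-lc", cmd],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr

def cherry_pick_eval(container: str, commit_msg: str, rounds: int = 10) -> bool:
    history = ["Cherry-pick the fix commit from the 'feature' branch onto "
               "'main' in /repo. Reply with exactly one shell command per turn."]
    for _ in range(rounds):
        cmd = model(history)
        history.append(cmd)
        history.append(run_in_container(container, cmd))
    # Final check: does main's history now contain the cherry-picked commit?
    log = run_in_container(container, "cd /repo && git log --oneline main")
    return commit_msg in log
```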
Betteridge's law says no: with seemingly infinite flavors of RAG, and >2 million token context + prompt caching from Anthropic/Deepmind/Deepseek, it's reasonable to believe that "in context learning is all you need".

But then there's Cosine Genie, the first to make a huge bet using OpenAI's new GPT-4o fine-tuning for code at the largest scale it has ever been used externally, resulting in what is now the #1 coding agent in the world according to SWE-Bench Full, Lite, and Verified.

SWE-Bench has been the most successful agent benchmark of the year, receiving honors at ICLR (our interview here) and recently being verified by OpenAI. Cognition (Devin) was valued at $2b after reaching 14% on it. So it is very, very big news when a new agent appears to beat all other solutions, by a lot. While this number is self-reported, it seems to be corroborated by OpenAI, who also award it clear highest marks on SWE-Bench Verified.

The secret is GPT-4o finetuning on billions of tokens of synthetic data.

* Finetuning: As OpenAI says: Genie is powered by a fine-tuned GPT-4o model trained on examples of real software engineers at work, enabling the model to learn to respond in a specific way. The model was also trained to be able to output in specific formats, such as patches that could be committed easily to codebases. Due to the scale of Cosine's finetuning, OpenAI worked closely with them to figure out the size of the LoRA: “They have to decide how big your LoRA adapter is going to be… because if you had a really sparse, large adapter, you're not going to get any signal in that at all. So they have to dynamically size these things.”

* Synthetic data: we need to finetune on the process of making code work instead of only training on working code. “…we synthetically generated runtime errors. Where we would intentionally mess with the AST to make stuff not work, or index out of bounds, or refer to a variable that doesn't exist, or errors that the foundational models just make sometimes that you can't really avoid, you can't expect it to be perfect.”

Genie also has a 4-stage workflow with the standard LLM OS tooling stack that lets it solve problems iteratively.

Full Video Pod

Like and subscribe etc!

Show Notes
* Alistair Pullen - Twitter, Linkedin
* Cosine Genie launch, technical report
* OpenAI GPT-4o finetuning GA
* Llama 3 backtranslation
* Cursor episode and Aman + SWEBench at ICLR episode

Timestamps
* [00:00:00] Suno Intro
* [00:05:01] Alistair and Cosine intro
* [00:16:34] GPT4o finetuning
* [00:20:18] Genie Data Mix
* [00:23:09] Customizing for Customers
* [00:25:37] Genie Workflow
* [00:27:41] Code Retrieval
* [00:35:20] Planning
* [00:42:29] Language Mix
* [00:43:46] Running Code
* [00:46:19] Finetuning with OpenAI
* [00:49:32] Synthetic Code Data
* [00:51:54] SynData in Llama 3
* [00:52:33] SWE-Bench Submission Process
* [00:58:20] Future Plans
* [00:59:36] Ecosystem Trends
* [01:00:55] Founder Lessons
* [01:01:58] CTA: Hiring & Customers

Descript Transcript

[00:01:52] AI Charlie: Welcome back. This is Charlie, your AI cohost. As AI engineers, we have a special focus on coding agents, fine tuning, and synthetic data. And this week, it all comes together with the launch of Cosine's Genie, which reached 50 percent on SWE-Bench Lite, 30 percent on the full SWE-Bench, and 44 percent on OpenAI's new SWE-Bench Verified.

[00:02:17] All state of the art results by the widest ever margin recorded compared to former leaders Amazon Q, AutoCodeRover, and Factory Code Droid.
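To make the synthetic-data quote above concrete: one way to "intentionally mess with the AST" is to rename a variable read so the program fails at runtime, then record the broken code, the traceback, and the original as a training triple. A rough sketch, ours and not Cosine's actual pipeline:

```python
# Ours, not Cosine's pipeline: synthesize a runtime error by mutating the
# AST, yielding (broken code, traceback, original fix) as a training triple.
import ast
import traceback

class BreakOneName(ast.NodeTransformer):
    """Rename the first variable that is *read* so it no longer resolves."""
    def __init__(self) -> None:
        self.done = False

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if not self.done and isinstance(node.ctx, ast.Load):
            self.done = True
            node.id += "_undefined"
        return node

def make_training_triple(source: str) -> tuple[str, str, str]:
    broken = ast.unparse(BreakOneName().visit(ast.parse(source)))
    try:
        exec(compile(broken, "<synthetic>", "exec"), {})
        error = ""
    except Exception:
        error = traceback.format_exc()
    return broken, error, source

broken, error, fixed = make_training_triple("x = 1\nprint(x + 1)\n")
assert "NameError" in error  # the mutated program now fails at runtime
```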
As a reminder, Cognition's Devin went viral with a 14 percent score just five months ago. Cosine did this by working closely with OpenAI to fine-tune GPT-4o, now generally available to you and me, on billions of tokens of code, much of which was synthetically generated.[00:02:47] Alistair Pullen: Hi, I'm Ali. Co-founder and CEO of Cosine, a human reasoning lab. And I'd like to show you Genie, our state of the art, fully autonomous software engineering colleague. Genie has the highest score on SWE-Bench in the world. And the way we achieved this was by taking a completely different approach. We believe that if you want a model to behave like a software engineer, it has to be shown how a human software engineer works.[00:03:15] We've designed new techniques to derive human reasoning from real examples of software engineers doing their jobs. Our data represents perfect information lineage, incremental knowledge discovery, and step by step decision making. Representing everything a human engineer does logically. By actually training Genie on this unique dataset, rather than simply prompting base models, which is what everyone else is doing, we've seen that we're no longer simply generating random code until some of it works. It's tackling problems like a human.[00:03:48] AI Charlie: Alistair Pullen is CEO and co-founder of Cosine, and we managed to snag him on a brief trip stateside for a special conversation on building the world's current number one coding agent. Watch out and take care.[00:04:07] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.[00:04:16] swyx: Hey, and today we're back in the studio. In person, after about three to four months in visa jail and travels and all other fun stuff that we talked about in the previous episode.[00:04:27] But today we have a special guest, Ali Pullen from Cosine. Welcome. Hi, thanks for having me. We're very lucky to have you because you're on a two-day trip to San Francisco.[00:04:38] Alistair Pullen: Yeah, I wouldn't recommend it. I would not recommend it. Don't fly from London to San Francisco for two days.[00:04:40] swyx: And you launched Genie on a plane.[00:04:42] On plane Wi-Fi, um, claiming state of the art in SWE-Bench, which we're all going to talk about. I'm excited to dive into your whole journey, because it has been a journey. I've been lucky to be a small angel in part of that journey. And it's exciting to see that you're launching to such acclaim and, you know, such results.[00:05:01] Alistair and Cosine intro[00:05:01] swyx: Um, so I'll go over your brief background, and then you can sort of fill in the blanks on what else people should know about you. You did your bachelor's in computer science at Exeter.[00:05:10] Speaker 6: Yep.[00:05:10] swyx: And then you worked at a startup that got acquired into GoPuff and round about 2022, you started working on a stealth startup that became a YC startup.[00:05:19] What's that?[00:05:21] Alistair Pullen: Yeah. So basically when I left university, I, I met my now co-founder, Sam. At the time we were both mobile devs. He was an Android developer; I was an iOS developer. And whilst at university, we built this sort of small consultancy, sort of, we'd um, be approached to build projects for people and we would just take them up. To start with, they were student projects.[00:05:41] They weren't, they weren't anything crazy or anything big.
We started with those and over time we started doing larger and larger projects, more interesting things. And then actually, when we left university, we just kept doing that. We didn't really get jobs, traditional jobs. It was also like in the middle of COVID, middle of lockdown.[00:05:57] So we were like, this is a pretty good gig. We'll just keep like writing code in our bedrooms. And yeah, that's it. We did that for a while. And then a friend of ours that we went to Exeter with started a YC startup during COVID. And it was one of these fast grocery delivery companies. At the time I was living in the deepest, darkest countryside in England, where fast grocery companies are still not a thing.[00:06:20] So he, he sort of pitched me this idea and was like, listen, like I need an iOS dev, do you fancy coming along? And I thought, absolutely. It was a chance to get out of my parents' house, chance to move to London, you know, do interesting things. And at the time, truthfully, I had no idea what YC was. I had no idea.[00:06:34] I wasn't in the startup space. I knew I liked coding and building apps and stuff, but I'd never, never really done anything in that area. So I said, yes, absolutely. I moved to London just sort of as COVID was ending and yeah, worked at what was Fancy for about a year and a half. Then we brought Sam along as well.[00:06:52] So we, Sam and I, were the two engineers at Fancy for basically its entire life, and we built literally everything. So like the, the front, the client mobile apps, the, the backends, the internal like stock management system, the driver routing algorithms, all those things. Literally like everything. It was my first...[00:07:12] You know, both of us were super inexperienced. We didn't have, like, proper engineering experience. There were definitely decisions we'd do differently now. We'd definitely buy a lot of stuff off the shelf, stuff like that. But it was the initial dip of the toe into, like, the world of startups, and we were both, like, hooked immediately.[00:07:26] We were like, this is so cool. This sounds so much better than all our friends who were, like, consultants and doing, like, normal jobs, right? We did that, and it ran its course, and after, I want to say, 18 months or so, GoPuff came and acquired us. And there was obviously a transitionary period, an integration period, like with all acquisitions, and we did that, and as soon as we'd vested what we wanted to vest, and as soon as we thought, okay, this chapter is sort of done, uh, in about 2022, we left. And we knew that we wanted to go it alone and try something, like, we'd had this taste.[00:07:54] Now we knew we'd seen how, like, a YC startup was managed, like, up close, and we knew that we wanted to do something similar ourselves. We had no idea what it was at the time. We just knew we wanted to do something. So we, we tried a small, um, some small projects in various different areas, but then GPT-3.[00:08:12] He'd seen it on Reddit, and that's his source of all knowledge. Yeah, Sam loves Reddit. I'd actually heard of GPT-2. And obviously had like loosely followed what OpenAI had done with, what was the game they trained a model to play? Dota. Was it Dota? Yeah. So I'd followed that and, I knew loosely what GPT-2 was, I knew what BERT was, so I was like, Okay, this GPT-3 thing sounds interesting.[00:08:35] And he just mentioned it to me on a walk. And I then went home and, like, googled GPT-3. There was the playground, and the model was DaVinci 2 at the time.
And it was just the old school playground, completions, nothing crazy, no chat, no nothing. I miss completions though. Yeah. Oh, completions. Honestly, I had this conversation in OpenAI's office yesterday.[00:08:54] I was like, I just went. I know. But yeah, so we, we, um, I started playing around with the, the playground and the first thing I ever wrote into it was like, hello world, and it gave me some sort of like, fairly generic response back. I was like, okay, that looks pretty cool. The next thing was, I looked through the docs, um, also they had a lot of example prompts because I had no idea.[00:09:14] I didn't know if the, if you could put anything in, I didn't know if you had to structure it in a certain way or whatever, and I, and I saw that it could start writing like tables and JSON and stuff like that. So I was like, okay, can you write me something in JSON? And it did. And I was like, Oh, wow, this is, this is pretty cool.[00:09:28] Um, can it, can it just write arbitrary JSON for me? And, um, immediately as soon as I realized that my mind was racing and I like got Sam in and we just started messing around in the playground, like fairly innocently to start with. And then, of course, both being mobile devs and also seeing, at that point, we learned about what the Codex model was.[00:09:48] It was like, this thing's trained to write code, sounds awesome. And Copilot was starting, I think, I can't actually remember if Copilot had come out yet, it might have done. It's round about the same time as Codex. Round about the same time, yeah. And we were like, okay, as mobile devs, let's see what we can do.[00:10:02] So the initial thing was like, okay, let's see if we can get this AI to build us a mobile app from scratch. We eventually built the world's most flimsy system, which was back in the day with like 4,000 token context windows, like chaining prompts, trying to keep as much context from one to the other, all these different things, where basically, essentially, you'd put an app idea in a box, and then we'd do, like, very high level stuff, figuring out what the stack should be, figuring out what the frontend should be written in, backend should be written in, all these different things, and then we'd go through, like, for each thing, more and more levels of detail, until the point that you actually got Codex to write the code for each thing.[00:10:41] And we didn't do any templating or anything. We were like, no, we're going to write all the code from scratch every time, which is basically why it barely worked. But there were like occasions where you could put in something and it would build something that did actually run. The backend would run, the database would work.[00:10:54] And we were like, Oh my God, this is insane. This is so cool. And that's what we showed to our co-founder Yang. I met my co-founder Yang through, through Fancy because his wife was their first employee. And, um, we showed him and he was like, You've discovered fire. What is this? This is insane. He has a lot more startup experience.[00:11:12] Historically, he's had a few exits in the past and has been through all different industries. He's like our dad. He's a bit older. He hates me saying that. He's your COO now? He's our COO. Yeah. And, uh, we showed him and he was like, this is absolutely amazing. Let's just do something. Cause he, he, at the time, um, was just about to have a child, so he didn't have anything going on either.[00:11:29] So we, we applied to YC, got an interview.
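The "app idea in a box" system Pullen describes is a hierarchy of chained completions: decide the stack, list the components, then generate code per component while re-packing the high-level context each time. A minimal sketch of that pattern (ours; `complete` is a hypothetical stand-in for the old completions API):

```python
# Minimal sketch of the prompt-chaining pattern described: decompose an app
# idea top-down, re-packing earlier answers as context for each later prompt,
# since a 4,000-token window can't hold the whole conversation.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM API of choice")

def build_app(idea: str) -> dict[str, str]:
    stack = complete(f"App idea: {idea}\nPick a frontend and backend stack:")
    components = complete(
        f"App idea: {idea}\nStack: {stack}\n"
        "List the components to build, one per line:"
    )
    files: dict[str, str] = {}
    for component in components.splitlines():
        if component.strip():
            files[component] = complete(
                f"App idea: {idea}\nStack: {stack}\n"
                f"Write the complete code for: {component}"
            )
    return files
```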
The interview was, as most YC interviews are, short, curt, and pretty brutal. They told us they hated the idea. They didn't think it would work. And that's when we started brainstorming. It was almost like the interview was like an office hours kind of thing. And we were like, okay, given what you know about the space now and how to build things with these LLMs, like what can you bring out of what you've learned in building that thing into something that might be a bit more useful to people on the daily, and also YC obviously likes B2B startups a little bit more, at least at the time they did, back then.[00:12:01] So we were like, okay, maybe we could build something that helps you with existing codebases, like can sort of automate development stuff with existing codebases, not knowing at all what that would look like, or how you would build it, or any of these things. And they were like, yeah, that sounds interesting.[00:12:15] You should probably go ahead and do that. You're in, you've got two weeks to build us an MVP. And we were like, okay, okay. We did our best. The MVP was absolutely horrendous. It was a CLI tool. It sucked. And, um, at the time we were like, we, we don't even know how to build what we want to build. And we didn't really know what we wanted to build, to be honest.[00:12:33] Like, we knew we wanted to try to help automate dev work, but back then we just didn't know enough about how LLM apps were built, the intricacies and all those things. And also, like, the LLMs themselves, like 4,000 tokens, you're not going very far, they're extremely expensive. So we ended up building a, uh, a codebase retrieval tool, originally.[00:12:51] Our thought process originally was, we want to build something that can do our jobs for us. That is like the gold star, we know that. We've seen like there are glimpses of it happening with our initial demo that we did. But we don't see the path of how to do that at the moment. Like the tech just wasn't there.[00:13:05] So we were like, well, there are going to be some things that you need to build this when the tech does catch up. So retrieval being one of the most important things, like the model is going to have to, like, pull code out of a codebase somehow. So we were like, well, let's just build the tooling around it.[00:13:17] And eventually when the tech comes, then we'll be able to just like plug it into our, our tooling and then it should work basically. And to be fair, that's basically what we've done. And that's basically what's happened, which is very fortunate. But in the meantime, whilst we were waiting for everything to sort of become available, we built this codebase retrieval tool.[00:13:34] That was the first thing we ever launched when we were in YC, and it didn't work. It was really frustrating for us because it was just me and Sam like working like all hours trying to get this thing to work. It was quite a big task in and of itself, trying to get like a good semantic search engine working that could run locally on your machine.[00:13:51] We were trying to avoid sending code to the cloud as much as possible. And then for very large codebases, you're like, you know, millions of lines of code. You're trying to do some sort of like local HNSW thing that runs inside your VS Code instance that like eats all your RAM as you've seen in the past.[00:14:05] All those different things. Yep. Yeah.[00:14:07] swyx: My first call with you, I had trouble. You were like, yeah, it sucks, man.[00:14:07] Alistair Pullen:
I know, I know. I know it sucks. I'm sorry. I'm sorry. But building all that stuff was essentially the first six to eight months of what at the time was Buildt. Which, by the way, Buildt. Buildt. Yeah, it was a terrible, terrible name.[00:14:25] It was the worst.[00:14:27] swyx: Like, part of trying to think about whether I would invest was whether or not people could pronounce it.[00:14:32] Alistair Pullen: No, when we, so when we went on our first ever YC, like, retreat, no one got the name right. They were like, build, build... well, um, and then we actually changed the name to Cosine, like, although some people would spell it cosign, as in, like, as if you're cosigning for an apartment or something. Like, you can't win.[00:14:49] Yeah. That was what Buildt was back then. But the ambition, and I did a talk on this back in the end of 2022, the ambition to like build something that essentially automated our jobs was still very much like core to what we were doing. But for a very long time, it was just never apparent to us. Like, how would you go about doing these things?[00:15:06] Even when, like, you had 3.5 16k, it suddenly felt huge, because you've gone from 4 to 16, but even then, 16k is like... a lot of Python files are longer than 16k. So you can't, you know, before you even start doing a completion, even then we were like, eh, yeah, it looks like we're still waiting. And then, like, towards the end of last year, you then start, you see 32k.[00:15:28] 32k was really smart. It was really expensive, but also, like, you could fit a decent amount of stuff in it. 32k felt enormous. And then, finally, 128k came along, and we were like, right, this is, like, this is what we can actually deal with. Because, fundamentally, to build a product like this, you need to get as much information in front of the model as possible, and make sure that everything it ever writes in output can be traced back to something in the context window, so it's not hallucinating it.[00:15:49] As soon as that model existed, I was like, okay, I know that this is now going to be feasible in some way. We'd done early sort of dev work on Genie using 3.5 16k. And that was a very, very like crude way of proving that this loop that we were after and the way we were generating the data actually had signal and worked and could do something.[00:16:16] But the model itself was not useful because you couldn't ever fit enough information into it for it to be able to do the task competently and also the base intelligence of the model. I mean, 3.5, anyone who's used 3.5 knows the base intelligence of the model is, is lacking, especially when you're asking it to like do software engineering, this is quite, quite involved.[00:16:34] GPT4o finetuning[00:16:34] Alistair Pullen: So, we saw the 128k context model and um, at that point we'd been in touch with OpenAI about our ambitions and like how we wanted to build it. We essentially, I just took a punt, I was like, I'm just going to ask to see, can we like train this thing? Because at the time 4 Turbo had just come out and back then there was still a decent amount of lag time between like OpenAI releasing a model and then allowing you to fine-tune it in some way.[00:16:59] They've gotten much better about that recently, like 4o fine-tuning came out, I think, a day after... 4o mini fine-tuning came out like a day after the model did.
And I know that's something they're definitely like, optimising for super heavily inside, which is great to see.[00:17:11] swyx: Which is a little bit, you know, for a year or so, YC companies had like a direct Slack channel to OpenAI.[00:17:17] We still do. Yeah. Yeah. So, it's a little bit of a diminishing of the YC advantage there. Yeah. If they're releasing this fine-tuning ability like a day after.[00:17:23] Alistair Pullen: Yeah, no, no, absolutely. But like, you can't build a startup otherwise. The advantage is obviously nice and it makes you feel fuzzy inside. But like, at the end of the day, it's not that that's going to make you win.[00:17:34] But yeah, no, so like we'd spoken to Shyamal there, Devrel guy, I'm sure you know him. I think he's head of solutions or something. In their applied team, yeah, we'd been talking to him from the very beginning when we got into YC, and he's been absolutely fantastic throughout. I basically had pitched him this idea back when we were doing it on 3.5 16k,[00:17:53] and I was like, this is my, this is my crazy thesis. I want to see if this can work. And as soon as like that 128k model came out, I started like laying the groundwork. I was like, I know this definitely isn't possible because they released it like yesterday, but know that I want it. And in the interim, like, GPT-4, like, 8k fine-tuning came out.[00:18:11] We tried that, it's obviously even fewer tokens, but the intelligence helped. And I was like, if we can marry the intelligence and the context window length, then we're going to have something special. And eventually, we were able to get on the Experimental Access Program, and we got access to 4 Turbo fine-tuning.[00:18:25] As soon as we did that, because in the entire run up to that we built the data pipeline, we already had all that set up, so we were like, right, we have the data, now we have the model, let's put it through and iterate, essentially, and that's, that's where, like, Genie as we know it today, really was born. I won't pretend like the first version of Genie that we trained was good.[00:18:45] It was a disaster. That's where you realize all the implicit biases in your data set. And you realize that, oh, actually this decision you made that was fairly arbitrary was the wrong one. You have to do it a different way. Other subtle things like, you know, how you write Git diffs using LLMs and how you can best optimize that to make sure they actually apply and work and loads of different little edge cases.[00:19:03] But as soon as we had access to the underlying tool, we were like, we can actually do this. And I, I breathed a sigh of relief because, I didn't know, it was like, it wasn't a done deal, but I knew that we could build something useful. I mean, I knew that we could build something that would be measurably good on whatever eval at the time that you wanted to use.[00:19:23] Like at the time, back then, we weren't actually that familiar with SWE-Bench. But once Devin came out and they announced their SWE-Bench score, I... like, that's when my life took a turn. Challenge accepted. Yeah, challenge accepted. And that's where like, yes, that's where my friendships have gone. My sleep has gone. My weight.[00:19:40] Everything went into SWE-Bench and yeah, we, we, it was actually a very useful tool in building Genie beforehand. It was like, yes, vibe check this thing and see if it's useful. And then all of a sudden you have a, an actual measure to, to see like, could it do software engineering?
Not the best measure, obviously, but it's the best that we've got now.[00:19:57] We just iterated and built, and eventually we got it to the point where it is now, and a little bit beyond, since we actually got that score a couple of weeks ago. And yeah, it's been a hell of a journey from the beginning all the way to now. That was a very rambling answer to your question about how we got here, but that's essentially the potted answer of how we got here.[00:20:16] swyx: Got the full origin story out.[00:20:17] Alessio: Yeah, no, totally.[00:20:18] Genie Data Mix[00:20:18] Alessio: You mentioned bias in the data and some of these things. In your announcement video, you called Genie the world's first AI software engineering colleague. And you kind of highlighted how the data needed to train it needs to show how a human engineer works. I think maybe you're contrasting that to just putting code in it.[00:20:37] There's a lot more than code that goes into software engineering. How do you think about the data mixture? You know, there's this kind of known truth that code makes models better when you put it in the pre-training data, but since we put so much in the pre-training data, what else do you add when you train Genie?[00:20:54] Alistair Pullen: Yeah, I think that fundamentally boils down to the difference between a model writing code and a model doing software engineering, because the software engineering discipline goes wider. Because if you look at something like a PR, that is obviously an artifact of some thought and some work that has happened and has eventually been squashed into, you know, some diffs, right?[00:21:17] Very crudely, what the pre-trained models are reading is those final diffs, and they're emulating that and being able to output it, right? But of course, a PR is a super lossy thing. You have no idea why or how, for the most part, unless there are some comments, which, you know, anyone who's worked in a company realizes PR reviews can be a bit dodgy at times. But you see that you lose so much information at the end, and that's perfectly fine, because PRs aren't designed to be something that perfectly preserves everything that happened. But what we realized was that if you want something that's a software engineer, and very crudely, we started with something that can do PRs for you, essentially, you need to be able to figure out why those things happened.[00:21:58] Otherwise, you essentially just have a code-writing model, you have something that's good at HumanEval, but not very good at SWE-Bench. Essentially, that realization was part of the kernel of the idea of the approach that we took to design the agent that is Genie. The way that we decided we want to try to extract what happened in the past, as forensically as possible, has been and is currently one of the main things that we focus all our time on, because doing that, getting as much signal out as possible, doing that as well as possible, is the biggest thing that we've seen that determines how well we do on that benchmark at the end of the day.
Once you've sorted things out, like output structure, how to get it consistently writing diffs, and all the stuff that is ancillary to the model actually figuring out how to solve a problem, the core bit of solving the problem is: how did the human solve this problem, and how can we best come up with how the human solved these problems?[00:22:54] So all the effort went in on that. And the mix that we ended up with was, as you've probably seen in the technical report and so on, all of those different languages and different combinations of different task types. All of that has run through that pipeline, and we've extracted all that information out.[00:23:09] Customizing for Customers[00:23:09] Alessio: How does that differ when you work with customers that have private workflows? Like, is there usually a big delta between what you get in open source and public data versus...?[00:23:19] Alistair Pullen: Yeah, yeah, yeah. When you scrape enough of it, most of open source is updating readmes and docs. It's hilarious. We had to filter out so much of that stuff, because when we first did the 16k model, the amount of readme updating that went in... we did no data cleaning, no real anything, we just sort of threw it in and saw what happened.[00:23:38] And it was really good at updating readmes, really good at writing some comments, really good at complaining in PR reviews. And, again, because we didn't clean the data, you'd give it some feedback and it would just reply, and it would be quite insubordinate when it was getting back to you, like, no, I don't think you're right, and it would just sort of argue with you. So the process of doing all that was super interesting, because we realized from the beginning, okay, there's a huge amount of work that needs to go into cleaning this, getting it aligned with what we want the model to do, to be able to get the model to be useful in some way.[00:24:12] Alessio: I'm curious how you think about the customer willingness to share all of this historical data. I've done a lot of developer tools investing in my career, and getting access to the code base is always one of the hard things. Are people getting more cautious about sharing this information? In the past, it was maybe like, you know, you're using a static analysis tool, like whatever else you need to plug into the code base, fine.[00:24:35] Now you're building a model based on it. What's the discussion going into these companies? Are most people comfortable with letting you see how they work and sharing everything?[00:24:44] Alistair Pullen: It depends on the sector, mostly. We've actually seen people becoming more amenable to the idea over time, rather than more skeptical, because I think they can see the upside.[00:24:55] If this thing does what they say it does, it's going to be more of a help to us than it is a risk to our infosec.
Um, and of course, companies building in this space are all going to end up, you know, complying with the same rules, and there are going to be new rules that come out to make sure that when we're looking at your code, everything is safe, and so on.[00:25:12] So from what we've seen so far, we've spoken to some very large companies that you've definitely heard of, and all of them obviously have stipulations, and many of them want it to be sandboxed to start with, and all the very obvious things that I, you know, would say as well. But they're all super keen to have a go and see, because, despite all those things, if we can genuinely make them go faster, allow them to build more in a given time period and stuff, it's super worth it to them.[00:25:37] Genie Workflow[00:25:37] swyx: Okay, I'm going to dive in a little bit on the process that you have created. You showed the demo on your video, and by the time that we release this, you should be taking people off the waitlist and launching people, so people can see this themselves. There are four main parts of the workflow: finding files, planning action, writing code, and running tests.[00:25:58] And controversially, you have set yourself apart from the Devins of the world by saying that things like having access to a browser are not that important for you. Is that an accurate reading of what you wrote?[00:26:09] Alistair Pullen: I don't remember saying that, but at least with what we've seen, the browser is helpful, but it's not as helpful as, like, RAGing the correct files, if that makes sense.[00:26:20] Like, it is still helpful, but obviously there are more fundamental things you have to get right before you get to, oh yeah, you can read some docs, or you can read a Stack Overflow article, and stuff like that.[00:26:30] swyx: Yeah, the phrase I was indexing on was: the other software tools are wrappers around foundational models with a few additional tools, such as a web browser or code interpreter.[00:26:38] Alistair Pullen: Oh, I see. No, I'm not deriding the tools there, I'm deriding the approach.[00:26:44] swyx: Yeah, exactly. So, I would say in my standard model of what a code agent should look like, Devin has been very influential, obviously. Yeah. Because you could just add the docs of something.[00:26:54] Mm-hmm. And like, you know, now when I'm installing a new library, I can just add docs. Cursor also does this. Right. And then obviously having a code interpreter does help. I guess you have that in the form of running tests.[00:27:03] Alistair Pullen: I mean, Genie has both of those tools available to it as well.[00:27:08] So, yeah, we have a tool where you can put in URLs and it will just read the URLs, and we also use Perplexity's API under the hood to be able to actually ask questions if it wants to. Okay. So, no, we use both of those tools as well.
Like, those tools are super important and super key.[00:27:24] I think obviously the most important tools to these agents are being able to retrieve code from a code base, being able to read Stack Overflow articles and what have you, and just being able to essentially Google like we do. That's definitely super useful.[00:27:38] swyx: Yeah, I thought maybe we could just kind of dive into each of those actions.[00:27:41] Code Retrieval[00:27:41] swyx: Code retrieval, one of the core indexers that you've worked on, even as Buildt. What makes it hard? What approach did you think would work that didn't work? Anything like that.[00:27:52] Alistair Pullen: It's funny, I had a similar conversation to this when I was chatting to the guys from OpenAI yesterday. The thing is that searching for code, specifically semantically... at least to start with, I mean, keyword search and stuff like that is a solved problem.[00:28:06] It's been around for ages. But the phrase we always used back in the day was searching for what code does rather than what code is. Searching for functionality is really hard. Really hard. The way that we approached that problem... obviously a very basic and easy approach is, right, let's just embed the code base. We'll chunk it up in some arbitrary way, maybe using an AST, maybe using number of lines, maybe using whatever, some overlapping, just chunk it up and embed it. And once you've done that, I'll write a query saying, like, find me some authentication code or something, embed it, and then do the cosine similarity and get the top K, right?[00:28:43] That doesn't work. And I wish it did work, don't get me wrong. It doesn't work well at all, because fundamentally, if you think about it semantically, how code looks is very different to how English looks, and there's not a huge amount of signal that's carried between the two. So the first approach we took, and that did well enough for a long time, was: okay, let's train a model to be able to take in English code queries and then produce a hypothetical code snippet that might look like the answer, embed that, and then do the cosine similarity.[00:29:18] And that process, although very simple, gets you so much more performance out of the retrieval accuracy. And that was kind of like the start of our engine, as we called it, which is essentially the aggregation of all these different heuristics, like semantic, keyword, LSP, and so on. And then we essentially had a model that would, given an input, choose which ones it thought were most appropriate, given the type of request you had.[00:29:45] So the whole code search thing was a really hard problem. And actually what we ended up doing with Genie is we let the model, through self-play, figure out how to retrieve code. So we don't use our engine for Genie. So instead of a request coming in and then, say, GPT-4 with some JSON output being like, well, I think here we should use a keyword search with these inputs and then we should use semantic, and then we should pick these results, it's actually: a question comes in, and Genie has self-played in its training data to be able to be like, okay, this is how I'm going to approach finding this information. Much more akin to how a developer would do it. Because if I was like, Shawn, go into this new code base you've never seen before,[00:30:26] and find me the code that does this...
You're probably going to do some keyword searches, you're going to look over the file system, you're going to try to figure out from the directories and the file names where it might be, you're going to jump into one, and then once you're in there, you're probably going to be doing the, you know, go-to-definition stuff to jump from file to file and try to use the graph to get closer and closer.[00:30:46] And that is exactly what Genie does. It starts on the file system, looks at the file system, picks some candidate files, asks, is this what I'm looking for, yes or no? And if there's something that's interesting, like an import or something, it can command-click on that thing, go to definition, go to references, and so on.[00:31:00] And it can traverse the codebase that way.[00:31:02] swyx: Are you using the VS Code, uh, LSP, or?[00:31:05] Alistair Pullen: No, we're not doing this in VS Code, we're just using the language servers running. But we really wanted to try to mimic the way we do it as best as possible, and we did that during the self-play process when we were generating the dataset.[00:31:18] Although we did all that work originally, and although Genie still has access to these tools, so it can do keyword searches, and it can do basic semantic searches, and it can use the graph, it uses them through this process and figures out, okay, I've learned from data how to find stuff in codebases. And I think in our technical report, I can't remember the exact number, but I think it was around 65 or 66 percent retrieval accuracy overall, measured as: we know what lines need to be found for the task to actually be completed, and we found about 66 percent of all those lines. That is one of the biggest areas of free performance that we can get a hold of, because when we were building Genie, truthfully, a lot more focus went on: assuming you found the right information, assuming you've been able to reproduce the issue, how do you then go about solving it?[00:32:08] And the bulk of the work we did was on the solving. But when you go higher up the funnel, obviously, the funnel looks like: have you found everything you need for the task? Are you able to reproduce the problem that's seen in the issue? Are you then able to solve it? And the funnel gets narrower as you go down.[00:32:22] And at the top of the funnel, of course, is RAG. So I'm actually quite happy with that score. I think it's still pretty impressive considering the size of some of the codebases we're using for this. But if that number becomes 80, think how many more tasks we get right. That's one of the key areas we're going to focus on when we continue working on Genie.[00:32:37] It'd be interesting to break out a benchmark just for that.[00:32:41] swyx: Yeah, because I don't know what state of the art is.[00:32:43] Alistair Pullen: I mean, it's super easy, because for a given PR, you know what lines were edited.[00:32:50] swyx: Oh, okay, you can source it from SWE-Bench, actually.[00:32:51] Alistair Pullen: Yeah, you can do it super easily. And that's how we got that figure out at the other end. For us, being able to see it against our historic models was super useful, so we could see if we were, you know, actually helping ourselves or not.
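As an aside for readers who want to see the two retrieval approaches contrasted above in code, here is a minimal Python sketch. The `embed` function is a placeholder for any real text-embedding model, and `generate_snippet` stands in for the query-to-hypothetical-code model Alistair describes (similar in spirit to the HyDE technique); all names here are illustrative, not Cosine's actual code.

```python
# Sketch of naive English-to-code embedding retrieval vs. the
# hypothetical-snippet variant described in the conversation above.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def embed(text: str) -> np.ndarray:
    # Placeholder: swap in a real embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(256)

def naive_search(query, chunks, k=5):
    # Baseline: embed the English query directly and rank code chunks.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)[:k]

def hypothetical_snippet_search(query, chunks, generate_snippet, k=5):
    # Variant: first generate a hypothetical code snippet that might look
    # like the answer, embed *that*, and compare code to code.
    q = embed(generate_snippet(query))
    return sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)[:k]
```

The only difference between the two functions is what gets embedded on the query side, which is exactly the change credited above with the jump in retrieval accuracy.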
And initially, one of the biggest performance gains that we saw, when we did work on the RAG a bit, was giving it the ability to use the LSP to go to definition, and really trying to get it to emulate how we do that. Because I'm sure when you go into an editor where the LSP is not working or whatever, you suddenly feel really disarmed and naked.[00:33:20] You're like, oh my god, I didn't realize how much I actually used this to get about, rather than just find stuff. So we really tried to get it to do that, and that gave us a big jump in performance. We went from like 54 percent up to the 60s just by focusing on that.[00:33:34] swyx: One weird trick. Yes.[00:33:37] I'll briefly comment here. So this is the standard approach I would say most code tooling startups are pursuing. The one company that's not doing this is magic.dev. So would you do things differently if you had a 10 million[00:33:51] Alistair Pullen: token context window? If I had a 10 million token context window and hundreds of millions of dollars, I wouldn't have gone and built... well, it's an LTM, it's not a transformer, right, that they're using?[00:34:03] If I'm not mistaken, I believe it's not a transformer. Yeah, Eric's going to come on at some point. Listen, they obviously know a lot more about their product than I do. I don't know a great deal about how Magic works. I don't think he knows anything yet. I'm not going to speculate. Would I do it the same way as them?[00:34:17] I like the way we've done it, because fundamentally we focus on the active software engineering and what that looks like, and showing models how to do that. Fundamentally, the underlying model that we use is kind of immaterial to us. So long as it's the best one, I don't mind. And the context windows... we've already seen you can get transformers to have, like, million, one-and-a-half-million token context windows.[00:34:43] And that works perfectly well. So as soon as you can fine-tune Gemini 1.5, then you best be sure that Genie will run on Gemini 1.5, and we'll probably get very good performance out of that. I like our approach because we can be super agile and be like, oh, well, Anthropic have just released whatever, and it might have half a million tokens and it might be really smart.[00:35:01] And I can just immediately take my JSONL file and dump it in there, and suddenly Genie works on there and it can do all the new things.[00:35:07] swyx: Does Anthropic have the same fine-tuning support as OpenAI?[00:35:11] Alistair Pullen: I actually haven't heard of anyone doing it, because they're still working on it. They're partnered with AWS and it's going to be in Bedrock.[00:35:16] Okay. As far as I know, I think that's true. Um, cool. Yeah.[00:35:20] Planning[00:35:20] swyx: We have to keep moving on to the other segments. Sure. Planning, the second piece of your four-step grand master plan: that is the frontier right now. You know, a lot of people are talking about Strawberry, Q*, whatever that is.[00:35:32] Monte Carlo Tree Search. Is current state-of-the-art planning good enough? What prompts have worked? I don't even know what questions to ask.
Like, what is the state of planning?[00:35:41] Alistair Pullen: I think it's fairly obvious that with the foundational models you can ask them to think step by step and ask them to plan and stuff, but that isn't enough, because if you look at how those models score on these benchmarks, they're not even close to state of the art.[00:35:52] swyx: Which ones are you referencing? Benchmarks?[00:35:53] Alistair Pullen: So, like, SWE-Bench and so on, right? And even the things that get really good scores on HumanEval, or agents as well, because they have these loops, right? Obviously these things can reason, quote unquote, but the reasoning is constrained by the model's intelligence, I'd say, very crudely.[00:36:10] And what we essentially wanted to do was... we still thought that, obviously, reasoning is super important, we need it to get the performance we have. But we wanted the reasoning to emulate how we think about problems when we're solving them, as opposed to how a model thinks about a problem when it's solving it.[00:36:23] And that's obviously part of the derivation pipeline that we have when we design our data. But on the reasoning that the models do right now, and who knows what Q*, or whatever it ends up being called, looks like... on a small tangent to that, what I'm really excited about is that when models like that come out, the signal in my data, when I regenerate it, goes up.[00:36:44] And then I can train that model, which is already better at reasoning, with improved reasoning data, and I can keep bootstrapping and keep leapfrogging every single time. And that is super exciting to me, because I welcome new models so much: immediately they just float me up without me having to do much work, which is always nice.[00:37:02] But on the state of reasoning generally, I don't see it going away anytime soon. I mean, an autoregressive model doesn't think, per se. In the absence of having any actual thought... maybe an energy-based model or something like that, maybe that's what Q* is, who knows, some sort of high-level, abstract space where thought happens before tokens get produced.[00:37:22] In the absence of that, for the moment, I think it's all we have and it's going to have to be the way it works. For what happens in the future, we'll have to see, but I think it's certainly never going to hinder performance to do it. And certainly, the reasoning that we see Genie do, when you compare it to asking GPT-4 to break down a step-by-step approach for the same problem, at least on a vibe check alone, looks far better.[00:37:46] swyx: Two elements that I like, that I didn't see in your initial video, and we'll see when this Genie launches: one is a planner chat, where I can modify the plan while it's executing, and the other thing is playbooks, which is also from Devin, where it's like, here's how I like to do a thing, and I'll use Markdown to specify how I do it.[00:38:06] I'm just curious if, you know,[00:38:07] Alistair Pullen: those things help. Yeah, no, absolutely, a hundred percent. We want everything to be editable. Not least because it's really frustrating when it's not.
Like, if you're ever in a situation where there's one thing you just wish you could change, and you'd be right if that one thing were changed, but you can't change it...[00:38:21] So we're going to make everything editable, including the code it writes. If it makes a small error in a patch, you can just change it yourself and let it continue, and it will be fine. Yeah. So those things are super important. We'll be doing those two.[00:38:31] Alessio: I'm curious, once you get to writing code, is most of the job done?[00:38:35] I feel like the models are so good at writing code when they're given small chunks that are very well instructed. What's the drop-off in the funnel once you've got the right files and you've got the right plan?[00:38:47] Alistair Pullen: That's a great question, because by the time this is out, there'll be another blog post, which contains all the information, all the learnings that I delivered to OpenAI's fine-tuning team when we finally got the score.[00:38:59] Oh, that's good. Um, go for it. It's already up. And, um, yeah. I don't have it on my phone, but basically I broke down the log probs. I basically got the average log prob for a token at every token position in the context window. So imagine an x-axis from 0 to 128k, and then the average log prob for each index in there.[00:39:19] As we discussed, the way Genie works normally is, you know, at the beginning you do your RAG, and then you do your planning, and then you do your coding, and that sort of cycle continues. Code writing is so much more certain than every other aspect of Genie's loop. So whatever's going on under the hood, the model is really comfortable with writing code.[00:39:35] There is no doubt, and it's right there in the token probabilities. One slightly different thing, I think, to how most of these models work is that, for the most part, if you ask GPT-4 in ChatGPT to edit some code for you, it's going to rewrite the entire snippet for you with the changes in place. We trained Genie to write diffs and, you know, essentially patches, right?[00:39:55] Because it's more token-efficient, and also, fundamentally... we don't write patches as humans, but the result of what we do is a patch, right? When Genie writes code, I don't know how much it's leaning on the pre-training, like, code-writing corpus, because obviously it's just read code files there.[00:40:14] It's probably read a lot of patches, but I would wager it's read more code files than it has patches. So it's probably leaning on a different part of its brain, is my speculation. I have no proof for this. So I think the discipline of writing code is slightly different, but certainly it's in its most comfortable state when it's writing code.[00:40:29] So once you get to that point, so long as you're not too deep into the context window... another thing that I'll bring up in that blog post is that the performance of Genie over the length of the context window degrades fairly linearly. So I actually broke it down by probability of solving a SWE-Bench issue, given the number of tokens in the context window.[00:40:49] At 60k, it's basically 0.5. So if you go over 60k in context length, you are more likely to fail than you are to succeed, just based on the number of tokens you have in the context window.
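As a rough sketch of the two analyses described here, assuming you can export per-token log probs and per-example context lengths from your eval runs (both inputs below are hypothetical stand-ins for that data):

```python
# Bucket average token log prob by position in the context window, and
# solve rate by context length, mirroring the analysis described above.
from collections import defaultdict

def avg_logprob_by_position(token_logprobs, bucket_size=1024):
    # token_logprobs: iterable of (position, logprob) pairs pooled across
    # many eval examples; returns {bucket start: mean log prob}, i.e. an
    # x-axis running from 0 to 128k.
    sums, counts = defaultdict(float), defaultdict(int)
    for pos, lp in token_logprobs:
        b = pos // bucket_size
        sums[b] += lp
        counts[b] += 1
    return {b * bucket_size: sums[b] / counts[b] for b in sorted(sums)}

def solve_rate_by_context_length(results, bucket_size=10_000):
    # results: iterable of (context_tokens, solved) per benchmark attempt.
    # The 0.5 crossover mentioned above would show up near the 60k bucket.
    wins, totals = defaultdict(int), defaultdict(int)
    for tokens, solved in results:
        b = tokens // bucket_size
        wins[b] += int(solved)
        totals[b] += 1
    return {b * bucket_size: wins[b] / totals[b] for b in sorted(totals)}
```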
And when I presented that to the fine-tuning team at OpenAI, that was super interesting to them as well. And that is more of a foundational model attribute than it is an us attribute.[00:41:10] However the attention mechanism works in GPT-4, however they deal with the context window at that point, is influencing how Genie is able to perform, even though obviously all our training data is perfect, right? So even if stuff is being solved at 110,000 tokens, sort of that area,[00:41:28] the training data still shows it being solved there. It's just that in practice, the model is finding it much harder to solve stuff down that end of the context window.[00:41:35] Alessio: Does that scale with the context? So for a 200k context size, is 100k tokens the 0.5 point? I don't know.[00:41:43] Alistair Pullen: I, um, hope not. I hope you can't just take the context length, halve it, and say, oh, this is the usable context length.[00:41:50] But what's been interesting is that actually really digging into the data, looking at the log probs, looking at how it performs over the entire window, has influenced the short-term improvements we've made to Genie since we got that score. So we actually made some small optimizations, as best we can without overdoing it, to try to make sure that we can artificially make stuff sit within that sort of range, because we know that's our battle zone.[00:42:17] And if we go outside of that, we're starting to push the limits, and we're more likely to fail. So just doing that sort of analysis has been super useful, without actually messing with anything more structural, in getting more performance out of it.[00:42:29] Language Mix[00:42:29] Alessio: What about different languages? So, in your technical report, the data makes sense.[00:42:34] 21 percent JavaScript, 21 percent Python, 14 percent TypeScript, 14 percent TSX, which is JavaScript, JavaScript...[00:42:42] Alistair Pullen: Yeah, yeah, yeah. It's like 49 percent JavaScript. That's true, although TypeScript is so much superior, but anyway.[00:42:46] Alessio: How good is it at just generalizing? You know, if you're writing Rust or C or whatever else, it's quite different.[00:42:55] Alistair Pullen: It's pretty good at generalizing. Obviously, though, I think there are 15 languages in that technical report that we've covered. The ones that we picked in the highest mix were, selfishly, the ones that we internally use the most, and also, I'd argue, some of the most popular ones.[00:43:11] When we have more resource as a company, and more time, and, you know, once all the craziness that has just happened dies down a bit, we are going to work on that mix. I'd love to see everything ideally be represented at a similar level: if you took GitHub as a data set, if you took how the languages are broken down in terms of popularity, that would be my ideal data mix to start.[00:43:34] It's just that it's not cheap. So, um, trying to have an equal amount of Ruby and Rust and all these different things is just, at our current state, not really what we're looking for.[00:43:46] Running Code[00:43:46] Alessio: There's a lot of good Ruby in my GitHub profile. You can have it all. Well, okay, we'll just train on that.
For running tests, it sounds easy, but it isn't, especially when you're working in enterprise codebases that are very hard to spin up.[00:43:58] Yes. How do you set that up? How do you make a model actually understand how to run a codebase, which is different from writing code for a codebase?[00:44:07] Alistair Pullen: The model itself is not in charge of setting up the codebase and running it. Genie sits on top of GitHub, and if you have CI running on GitHub, if you have GitHub Actions and stuff like that, then Genie essentially makes a call out to that, runs your CI, sees the outputs, and then moves on.[00:44:23] Making the model itself set up a repo wasn't scoped into what we wanted Genie to be able to do, because for the most part, at least, most enterprises have some sort of CI pipeline running, and even a lot of hobbyist software development has some sort of basic CI running as well.[00:44:40] And that was the lowest-hanging-fruit approach that we took. So when Genie ships, the way it will run its own code is that it will basically run your CI and take the... I'm not in charge of writing this part, the rest of the team is, but I think it's the Checks API on GitHub that allows you to grab that information and throw it in the context window.[00:44:56] Alessio: What's the handoff like with the person? So, with Genie, you give it a task, and then how long are you supposed to supervise it for? Or are you just waiting for the checks to eventually run, and then you see how it goes? What does it feel like?[00:45:11] Alistair Pullen: There are a couple of modes that it can run in, essentially.[00:45:14] It can run in fully headless autonomous mode, so say you assign it a ticket in Linear or something, then it won't ask you for anything. It will just go ahead and try. Or if you're in the GUI on the website and you're using it, then you can give it a task and it might choose to ask you a clarifying question.[00:45:30] So if you ask it something super broad, it might just come back to you and say, what does that actually mean? Or, can you point me in the right direction for this? Because our decision internally was that it's going to piss people off way more if it just goes off and makes a completely[00:45:45] ruined attempt at it because it got the wrong idea from day one. So it can ask you a lot of questions. And once it's going, much like a regular PR, you can leave review comments, issue comments, all these different things. And because it's been trained to be a software engineering colleague, it responds in actually a better way than a real colleague, because it's less snarky and less high and mighty.[00:46:08] And also the amount of filtering it has to do... when you train a model to be a software engineer, essentially, it's like you can just do anything, and it's like, yeah, looks good to me, bro.[00:46:17] swyx: Let's ship it.[00:46:19] Finetuning with OpenAI[00:46:19] swyx: I just wanted to dive in a little bit more on your experience with the fine-tuning team. John Allard was publicly very supportive in his commentary and, you know, was part of it.[00:46:27] What's it like working with them? I also picked up that you initially started to fine-tune what was publicly available, the 16k to 32k range. You got access to do more than that. Yeah.
You've also trained on billions of tokens instead of the usual millions range. Just take us through that fine-tuning journey and any advice that you might have.[00:46:47] Alistair Pullen: It's been so cool, and this will be public by the time this goes out: OpenAI themselves have said we are pushing the boundaries of what is possible with fine-tuning. We are right on the edge, and we are genuinely working with them in figuring out how stuff works, what works, what doesn't work, because no one else is doing what we're doing.[00:47:06] They have found what we've been working on super interesting, which is why they've allowed us to do so much interesting stuff. Working with John... I mean, I had a really good conversation with John yesterday. We had a little brainstorm after the video we shot. And one of the things you mentioned, the billions of tokens: one of the things we've noticed, and it's actually a very interesting problem for them as well, is[00:47:28] how big your PEFT adapter, your LoRA adapter, is going to be, and figuring that out is actually a really interesting problem. Because if you make it too big... and because they support data sets that are so small, you can put like 20 examples through it or something like that... if you had a really sparse, large adapter, you're not going to get any signal in that at all.[00:47:44] So they have to dynamically size these things, and there is an upper bound. And actually we use adapters that are larger than what's publicly available. It's not publicly available yet, but when this goes out, it will be. We have larger LoRA adapters available to us, just because of the amount of data that we're pumping through.[00:48:01] And at that point, you start seeing really interesting other things, like you have to change your learning rate schedule and do all these different things that you don't have to do when you're on the smaller end of things. So working with that team is such a privilege, because obviously they're at the top of their field in, you know, the fine-tuning space.[00:48:18] So as we learn stuff, they're learning stuff. And one of the things that I think really catalyzed this relationship is that when we first started working on Genie, I delivered them a presentation, which will eventually become the blog post that you'll love to read soon. The information I gave them there, I think, is what showed them, like, oh wow, okay, these guys are really pushing the boundaries of what we can do here.[00:48:38] And truthfully, we view our data set right now as very small. It's the minimum that we're literally able to afford right now to be able to produce a product like this. And it's only going to get bigger. So yesterday, while I was in their offices, we were planning, like, okay, this is where we're going in the next six to twelve months. We're putting our foot on the gas here, because this clearly works. I've demonstrated this is a good... you know, the best approach so far, and I want to see where it can go. I want to see what the scaling laws are like for the data. And at the moment it's hard to figure that out, because you don't know when you're running into saturating a PEFT adapter, as opposed to actually hitting the model's limit. Like, where is that? So finding all that stuff out is the work we're actively doing with them.
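For intuition on the adapter-sizing tension described here, a back-of-envelope calculation helps: LoRA adds rank x (d_in + d_out) trainable parameters per adapted weight matrix, so adapter capacity scales linearly with rank. The layer and dimension numbers below are illustrative (roughly GPT-3-sized public figures), not anything disclosed about OpenAI's models.

```python
def lora_param_count(layers: int, d_model: int, rank: int) -> int:
    # Assume LoRA on the four attention projections (q, k, v, o) per layer;
    # each is a (d_model x d_model) matrix gaining rank * 2 * d_model params.
    return layers * 4 * rank * 2 * d_model

for rank in (8, 64, 512):
    n = lora_param_count(layers=96, d_model=12288, rank=rank)
    print(f"rank {rank:4d}: {n / 1e6:8.1f}M trainable params")
# A 20-example dataset spread over tens or hundreds of millions of adapter
# parameters gives almost no signal per parameter; hence the dynamic sizing.
```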
And yeah, it's going to get more and more collaborative over the next few weeks as we explore larger adapters, pre-training extension, different things like that.[00:49:27] swyx: Awesome. I also wanted to talk briefly about the synthetic data process.[00:49:32] Synthetic Code Data[00:49:32] swyx: One of your core insights was that the vast majority of the time, the code that is published by a human is in a working state. And actually you need to fine-tune on non-working code. So just, yeah, take us through that inspiration. How many rounds did you do?[00:49:47] Alistair Pullen: Yeah, I mean, it might be generous to say that the vast majority of code is in a working state. I don't know if I believe that. I was like, that's very nice of you to say that my code works. Certainly, it's not true for me. No, but you're right, it's an interesting problem. And what we saw was that when we didn't do that, you basically have to hope the model one-shots the answer.[00:50:07] Because after that, it's like, well, I've never seen iteration before. How am I supposed to figure out how this works? So what you're alluding to there is the self-improvement loop that we started working on. And that was in sort of two parts. We synthetically generated runtime errors, where we would intentionally mess with the AST to make stuff not work, or index out of bounds, or refer to a variable that doesn't exist, or errors that the foundational models just make sometimes that you can't really avoid, you can't expect it to be perfect.[00:50:39] So we threw some of those in with a probability of happening. And on the self-improvement side, I spoke about this in the blog post: essentially, the idea is that you generate your data in batches. The first batch is perfect, like, one example: here's the problem, here's the answer, go, train the model on it.[00:50:57] And then for the second batch, you take the model that you trained before, which can look like one commit into the future, and you let it have the first attempt at solving the problem. And if it gets it wrong, then you have, like, okay, now the codebase is in this incorrect state, but I know what the correct state is, so I can do some diffing, essentially, to figure out how to get from the state it's in now to the state I want it in. And then you can train the model to produce that diff next, and so on, and so on, so the model can learn, and also reason as to why it needs to make these changes, to be able to solve problems iteratively and learn from its mistakes and stuff like that.[00:51:35] Alessio: And you picked the size of the data set just based on how much money you could spend generating it. Maybe you think you could just make more and get better results?[00:51:42] Alistair Pullen: How many multiples of my monthly burn do I spend doing this? Yeah. Basically, it was very much related to capital, and, um, with any luck, that will be alleviated[00:51:53] swyx: very soon.[00:51:54] Alistair Pullen: Yeah.[00:51:54] SynData in Llama 3[00:51:54] swyx: Yeah. I like drawing references to other things that are happening in the wild, 'cause we only get to release this podcast once a week. Mm-hmm. The Llama 3 paper also had some really interesting
thoughts on synthetic data for code. I don't know if you have reviewed that. I'll highlight the back-translation section,[00:52:11] because one of your dataset focuses is updating documentation. I think that translation between natural language, English versus code, and
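To ground the synthetic runtime-error idea from the exchange above, here is a minimal sketch of one of the perturbations Alistair lists: messing with the AST so the code refers to a variable that doesn't exist. A toy illustration of the concept, not Cosine's pipeline.

```python
# Break working code by renaming the first loaded variable reference,
# producing a realistic NameError for the model to learn to fix.
import ast

class BreakOneName(ast.NodeTransformer):
    def __init__(self):
        self.done = False

    def visit_Name(self, node):
        if not self.done and isinstance(node.ctx, ast.Load):
            self.done = True
            bad = ast.Name(id=node.id + "_undefined", ctx=node.ctx)
            return ast.copy_location(bad, node)
        return node

source = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
tree = BreakOneName().visit(ast.parse(source))
broken = ast.unparse(ast.fix_missing_locations(tree))
print(broken)  # now iterates over `xs_undefined` -> NameError at runtime
```

Pairing the broken state with the diff back to the original gives exactly the kind of error-plus-fix training example the self-improvement loop needs.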
In this episode, we talk with Fernando De Come, Data & AI Solution Engineer at Oracle, about how the company has been developing RAG-assisted AI models to deliver solutions for customers across multiple segments. Before that, we go through the week's main news, including the image-generation controversy on X, Cosine's new AI developer Genie, and the Gemini Live feature announced at the Made by Google 2024 event. Here's who joined this conversation: Marcus Mendes, host of Fora de Controle; Fabrício Carraro, host of Fora de Controle, Program Manager at Alura, AI author, and host of the Dev Sem Fronteiras podcast; Fernando De Come, Data & AI Solution Engineer at Oracle
This episode of the Space Industry podcast by satsearch is a conversation with cosine on hyperspectral imaging, edge computing, and the company's strategy and production processes. cosine is a Netherlands-based space company, and satsearch Trusted Supplier, that builds optical and in-situ measurement systems for space, air, and ground use. The company has produced solutions for a variety of industry sectors, including medicine, agriculture, and space, and has particular expertise in edge computing and hyperspectral imaging. In the podcast we cover: the technical advantages that hyperspectral imaging can bring to space services; an overview of cosine's work, and current focus, in the space industry; the applications and status of edge computing in space; the benefits and uses of thermal imaging from satellites; and how cosine has developed a high-volume batch production approach that maintains quality and consistency. You can find out more about cosine here on their satsearch supplier hub. And if you would like to learn more about the space industry and our work at satsearch building the global marketplace for space, please join our newsletter https://satsearch.com/mailing-list. [Music from Uppbeat (free for Creators!): https://uppbeat.io/t/all-good-folks/when-we-get-there License code: Y4KZEAESHXDHNYRA]
Welcome to the Brick & Wonder podcast, where we delve into the art and science of collaboration within the built environment industry. Hosted by Drew Lang, founder of Brick & Wonder, this podcast explores the stories and strategies behind successful collaborations that drive innovation and shape the built world around us.In this episode we spoke with two people who came together in a successful and enduring business consulting engagement. Matt Schroeder is the co-founder and CEO of The Cosine, an architectural lighting and procurement company. Everett Hollander is a partner at Billy Clark Creative Management, a strategy and talent advisor to creative companies in the built environment industry.
Hey guys, in this episode I talk about two very important technical concepts in Deep Learning: contrastive learning and cosine similarity. They are very useful when training embedding models or doing RAG. Very good blog post about contrastive losses: https://lilianweng.github.io/posts/2021-05-31-contrastive/ SimCLR paper: https://arxiv.org/abs/2002.05709 Instagram of the podcast: https://www.instagram.com/podcast.lifewithai Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
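For anyone who wants to see the two concepts side by side, here is a small numpy sketch: cosine similarity between embedding vectors, and a SimCLR-style InfoNCE contrastive loss that pulls an anchor toward its positive view and pushes it away from negatives. A toy illustration, not code from the episode.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity: dot product of the vectors over their norms.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def info_nce(anchor, positive, negatives, temperature=0.1):
    # Contrastive loss: softmax cross-entropy with the positive at index 0.
    sims = np.array([cosine_sim(anchor, positive)] +
                    [cosine_sim(anchor, n) for n in negatives]) / temperature
    return -sims[0] + np.log(np.exp(sims).sum())

rng = np.random.default_rng(0)
anchor = rng.standard_normal(64)
positive = anchor + 0.1 * rng.standard_normal(64)   # augmented view
negatives = [rng.standard_normal(64) for _ in range(8)]
print(info_nce(anchor, positive, negatives))        # low loss: views match
```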
NService Bus This episode of The Modern .NET Show is supported, in part, by NServiceBus, the ultimate tool to build robust and reliable systems that can handle failures gracefully, maintain high availability, and scale to meet growing demand. Make sure you click the link in the show notes to learn more about NServiceBus. Show Notes Yeah. So what I was thinking the other day is that what we want is to concentrate on the business logic that we need to implement and spend as little time as possible configuring, installing and figuring out the tools and libraries that we are using for this specific task. Like our mission is to produce the business logic and we should try to minimize the time that we spend on the tools and libraries that enable us to build the software. —Giorgi Dalakishvili Welcome to The Modern .NET Show! Formerly known as The .NET Core Podcast, we are the go-to podcast for all .NET developers worldwide and I am your host Jamie "GaProgMan" Taylor. In this episode, I spoke with Giorgi Dalakishvili about PostgreSQL, DuckDB, and where you might use either of them in your applications. As Giorgi points out, .NET has support for SQL Server baked in, but there's also support for other database technologies too: Yes, there are many database technologies and just like you, for me, SQL Server was the default go-to database for quite a long time because it's from Microsoft. All the frameworks and libraries work with SQL Server out of the box, and have usually better support for SQL Server than for other databases. But recently I have been diving into PostgreSQL, which is a free database, and I discovered that it has many interesting features and I think that many .NET developers will be quite excited about these features. They are very useful in some very specific scenarios. And it also has very good support for .NET. Nowadays there is a .NET driver for Postgres, there is a .NET driver for Entity Framework Core. So I would say it's not behind SQL Server in terms of .NET support or feature-wise. —Giorgi Dalakishvili He also points out that our specialist skill as developers is not to focus on the tools, libraries, and frameworks, but to use what we have in our collective toolboxes to build the business logic that our customers, clients, and users desire of us. And along the way, he drops some knowledge on an essential NuGet package for those of us who are using Entity Framework. So let's sit back, open up a terminal, type in dotnet new podcast and we'll dive into the core of Modern .NET. Supporting the Show If you find this episode useful in any way, please consider supporting the show by either leaving a review (check our review page for ways to do that), sharing the episode with a friend or colleague, buying the host a coffee, or considering becoming a Patron of the show.
Full Show Notes The full show notes, including links to some of the things we discussed and a full transcription of this episode, can be found at: https://dotnetcore.show/season-6/from-net-to-DuckDB-unleashing-the-database-evolution-with-giorgi-dalakishvili/ Useful Links Giorgi's GitHub DuckDB .NET Driver Postgres Array data type Postgres Range data type DuckDB DbUpdateException EntityFramework.Exceptions JsonB data type Vector embeddings Cosine similarity Vector databases: Chroma qdrant pgvector pgvector .NET library OLAP queries parquet files Dapper DuckDB documentation Dapr DuckDB Wasm; run DuckDB in your browser GitHub Codespaces Connecting with Giorgi: on Twitter on LinkedIn on his website Supporting the show: Leave a rating or review Buy the show a coffee Become a patron Getting in touch: via the contact page joining the Discord Music created by Mono Memory Music, licensed to RJJ Software for use in The Modern .NET Show Remember to rate and review the show on Apple Podcasts, Podchaser, or wherever you find your podcasts, this will help the show's audience grow. Or you can just share the show with a friend. And don't forget to reach out via our Contact page. We're very interested in your opinion of the show, so please get in touch. You can support the show by making a monthly donation on the show's Patreon page at: https://www.patreon.com/TheDotNetCorePodcast.
OpenAI's large-scale language-generation tool ChatGPT may have been used to draft some content in this episode and some of the show notes of this episode. StudySquare Ltd has adapted the content, and the publication is attributed to StudySquare Ltd. This episode is a general guideline for information and not a specific tutorial for any specific syllabus; therefore, it should not be relied upon. StudySquare Ltd and any people involved in producing this podcast take no responsibility or liability for any potential errors or omissions regarding this podcast and make no guarantees of any completeness, accuracy, or usefulness of the information contained in this podcast, its structure or its show notes. The problems or questions in this episode might not appear in exam papers.The content in this episode might be more relevant to learners in the United Kingdom. Laws, educational standards, and exam requirements may vary significantly from one location to another. It's the listener's responsibility to confirm that the material complies with the requirements and regulations of their local educational system. If any content of this episode does not comply with your local regulations or laws, please discontinue listening and consult with your local educational authorities.Any references to experiments in this episode are for information purposes only and do not allow any listener to perform them without proper guidance or support. Experiments or practical work mentioned during this episode should not be attempted without appropriate supervision from a qualified teacher or professional. Additionally, the information provided in our podcast is not medical advice and should not be taken as such. If you require medical advice, please consult a healthcare professional. This episode is provided 'as is' without any representations or warranties, express or implied.This episode covers the following:• Area of an irregular triangle• Sine rule• Cosine rule• Page for this topic: https://studysquare.co.uk/test/Maths/OCR/GCSE/Applications-of-trigonometry?s=p• Trial lesson (terms and conditions apply): https://www.studysquare.co.uk/trial?s=p-/test/Maths/OCR/GCSE/Applications-of-trigonometry• Privacy policy of Spreaker (used to distribute this episode): https://www.spreaker.com/privacy
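For reference, the standard statements of the results listed for this episode, for a triangle with sides a, b, c opposite angles A, B, C:

```latex
% Sine rule
\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}
% Cosine rule
c^2 = a^2 + b^2 - 2ab\cos C
% Area of a general (non-right-angled) triangle
\text{Area} = \frac{1}{2}ab\sin C
```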
Utter BS. Recorded Live at Edinburgh Fringe. FULL LENGTH SPECIAL 'FIFTY' COMING SOON
Thanks to the almost 30k people who tuned in to the last episode! Your podcast cohosts have been busy shipping: * Alessio open sourced smol-podcaster, which makes the show notes here! * swyx launched GodMode. Maybe someday the Cursor of browsers? * We're also helping organize a Llama Finetuning Hackameetup this Saturday in anticipation of the CodeLlama release. Lastly, more speakers were announced at AI Engineer Summit!
Recorded at the Øredev 2022 developer conference, Fredrik chats with Michele Riva about writing a full-text search engine, maintaining 8% of all Node modules, going to one conference per week, refactoring, the value of a good algorithm, and a lot more. Michele highly recommends writing a full-text search engine. He created Lyra - later renamed Orama - and encourages writing your own in order to demystify subjects. Since the podcast was recorded, Michele has left his then employer Nearform and founded Oramasearch to focus on the search engine full time. We also discuss working for product companies versus consulting, versus open source. It's more about differences between companies than anything else. Open source teaches you to deal with more and more different people. Writing code is never just writing code. Should we worry about taking on too many dependencies? Michele is in favour of not fearing dependencies, but ensuring you understand how the important parts of your application work. Writing books is never convenient, but it can open many doors. When it comes to learning, there are areas where a whole level of tutorials is missing - where there are only really surface-level tutorials and perhaps deep papers, but nothing in between. Michele works quite a bit on bridging such gaps through his presentations. Thank you Cloudnet for sponsoring our VPS! Comments, questions or tips? We are @kodsnack, @tobiashieta, @oferlund and @bjoreman on Twitter, have a page on Facebook and can be emailed at info@kodsnack.se if you want to write longer. We read everything we receive. If you enjoy Kodsnack we would love a review in iTunes! You can also support the podcast by buying us a coffee (or two!) through Ko-fi. Links Michele Michele's Øredev 2023 presentations Nearform TC39 - the committee which evolves Javascript as a language Matteo Collina - worked at Nearform, works with the Node technical steering committee Lyra - the full-text search engine - has been renamed Orama Lucene Solr Elasticsearch Radix tree Prefix tree Inverted index Thoughtworks McKinsey Daniel Stenberg Curl Deno Express Fastify Turbopack Turborepo from Vercel Vercel Fast queue Refactoring Michele's refactoring talk Real-world Next.js - Michele's book Next.js Multitenancy Create React app Nuxt Vue Sveltekit TF-IDF - "term frequency-inverse document frequency" Cosine similarity Michele's talk on building Lyra Explaining distributed systems like I'm five Are all programming languages in English? 4th dimension Prolog Velato - programming language using MIDI files as source code Titles For foreign people, it's Mitch That kind of maintenance A very particular company A culture around open source software Now part of the 8% Nothing more than a radix tree One simple and common API Multiple ways of doing consultancy What you're doing is hidden You can't expect to change people A problem we definitely created ourselves Math or magic Writing books is never convenient Good for 90% of the use cases (When I can choose,) I choose computer science
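To demystify two of the ingredients in the links above (the inverted index and TF-IDF), here is a toy Python sketch of a full-text index with TF-IDF ranking. It illustrates the general technique only, not Lyra/Orama's actual implementation.

```python
# Tiny inverted index: term -> {doc_id: term frequency}, scored by TF-IDF.
import math
from collections import Counter, defaultdict

docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]
index = defaultdict(dict)
for doc_id, text in enumerate(docs):
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

def tfidf_search(query):
    scores = defaultdict(float)
    for term in query.split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(len(docs) / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(tfidf_search("quick dog"))  # doc 2 ranks first: both terms, "quick" twice
```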
I read from direction cosine to directorate. A direction finder is used to find where something is located using triangulation. https://en.wikipedia.org/wiki/Direction_finding The directoire style only lasted for 4 years (almost exactly) and ended when Napoleon Bonaparte overthrew the government. https://en.wikipedia.org/wiki/Directoire_style https://en.wikipedia.org/wiki/French_Directory The word of the episode is "director". https://en.wikipedia.org/wiki/Film_director Theme music from Jonah Kraut https://jonahkraut.bandcamp.com/ Merchandising! https://www.teepublic.com/user/spejampar "The Dictionary - Letter A" on YouTube "The Dictionary - Letter B" on YouTube "The Dictionary - Letter C" on YouTube "The Dictionary - Letter D" on YouTube Featured in a Top 10 Dictionary Podcasts list! https://blog.feedspot.com/dictionary_podcasts/ Backwards Talking on YouTube: https://www.youtube.com/playlist?list=PLmIujMwEDbgZUexyR90jaTEEVmAYcCzuq dictionarypod@gmail.com https://www.facebook.com/thedictionarypod/ https://twitter.com/dictionarypod https://www.instagram.com/dictionarypod/ https://www.patreon.com/spejampar https://www.tiktok.com/@spejampar 917-727-5757
✨S4E27 Highlights✨
✨S4E26 Highlights✨
Trigonometry - from Modern Latin trigonometria (Barthelemi Pitiscus, 1595), from Greek trigonon "triangle" (tri- "three" + gōnia "angle, corner") + metron "a measure": "the branch of mathematics that deals with relations between sides and angles of triangles."
Geometry - "a measuring of the earth," from the combining form of gē (gaia) "earth, land" + -metria "a measuring of." Geometry is, with arithmetic, one of the oldest branches of mathematics. It is concerned with properties of space such as the distance, shape, size, and relative position of figures.
Parallel - from para- "beside" + allēl "each other." In geometry, of lines, "lying in the same plane but never meeting in either direction." As a noun from the 1550s, "a line parallel to another line." The meanings "a comparison made by placing things side by side" and "thing equal to or resembling another in all particulars" are from the 1590s. Parallel bars as a gymnastics apparatus is recorded from 1868.
Perpendicular - "at right angles to the horizon," from per "thoroughly" + pendere "to hang, cause to hang; weigh" (from PIE root *(s)pen- "to draw, stretch, spin").
Percent - "by/through a hundred," from Modern Latin per centum "by the hundred."
Angle - directly from Latin angulus "an angle, a corner," a diminutive form from PIE root *ang-/*ank- "to bend" (source also of Greek ankylos "bent, crooked;" Latin ang(u)ere "to compress in a bend, fold, strangle;" Old Church Slavonic aglu "corner;" Lithuanian anka "loop;" Sanskrit ankah "hook, bent," angam "limb;" Old English ancleo "ankle").
Acute - from Latin acutus "sharp, pointed," figuratively "shrill, penetrating; intelligent, cunning," past participle of acuere "to sharpen" (literal and figurative). It was also used of humors (early 15c.). The meaning "ending in a sharp point" is from the 1560s; the sense of "sharp or penetrating in intellect" is from the 1580s. E.g. acute injury, acute inflammation, acute pancreatitis.
Obtuse - from Latin obtusus "blunted, dull," also used figuratively, past participle of obtundere "to beat against, make dull," from ob "in front of; against" + tundere "to beat," from PIE *(s)tud-e- "to beat, strike, push, thrust," from root *(s)teu- "to push, stick, knock, beat." In geometry, of a plane angle greater than a right angle.
Calculus - from Latin calculus "reckoning, account," originally "pebble used as a reckoning counter," diminutive of calx (genitive calcis) "limestone." In medicine, the word has also been used to refer generally to a "concretion occurring accidentally in the animal body," such as dental plaque: dental calculus.
Sine - from Latin sinus "fold in a garment, bend, curve, bosom."
Cosine - "with the fold": "with" + "bend, curve."
Tangent - "meeting at a point without intersecting," from Latin tangentem (nominative tangens), present participle of tangere "to touch."
Addition - "action of adding numbers," from Latin additionem (nominative additio) "an adding to, addition," noun of action from the past-participle stem of addere "add to, join, attach": "the action of" + "joining."
--- Support this podcast: https://anchor.fm/liam-connerly/support
Many topics, starting with atypical sexuality, explored together with the sexologist, professor, and writer Fabrizio Quattrini. There is no shortage of reflections on labels, sex education, the current situation, and approaches to freedom of sexual expression.
Background modeling for dark matter search with 1.7 years of COSINE-100 data by G. Adhikari et al. on Tuesday 06 September. We present a background model for dark matter searches using an array of NaI(Tl) crystals in the COSINE-100 experiment, which is located in the Yangyang underground laboratory. The model includes background contributions from both internal and external sources, including cosmogenic radionuclides and surface $^{210}$Pb contamination. To build the model in the low energy region, with a threshold of 1 keV, we used a depth profile of $^{210}$Pb contamination in the surface of the NaI(Tl) crystals determined in a comparison between measured and simulated spectra. We also considered the effect of the energy scale errors propagated from the statistical uncertainties and the nonlinear detector response at low energies. The 1.7 years of COSINE-100 data taken between October 21, 2016 and July 18, 2018 were used for this analysis. Our Monte Carlo simulation provides a non-Gaussian peak around 50 keV, originating from beta decays of bulk $^{210}$Pb, in good agreement with the measured background. This model estimates that the activities of bulk $^{210}$Pb and $^{3}$H dominate the background rate, which amounts to an average level of 2.85 $\pm$ 0.15 counts/day/keV/kg in the (1-6) keV energy region, using COSINE-100 data with a total exposure of 97.7 kg$\cdot$years. arXiv: http://arxiv.org/abs/2101.11377v2
Boingo, Buka, and oka talked about C-S-H high-pressure superconductivity, the annual modulation of dark matter, PTEP, and more. The Show Notes below are the abridged version; the full version is here.
0:00 Changing jobs, ISUCON - ISUCON12 roundup: ISUCON official blog; Interaxion 12: AWS as a complex system (the episode where we talked about ISUCON and more)
9:29 Condensed matter news
9:52 Recent developments in C-S-H high-pressure superconductivity - Hirsch's latest report; Buhin blog: "Does Microsoft dream of Majorana? On the controversy over observing Majorana particles in solids"; Buhin blog: "Cold yet hot researchers: the amazing low-temperature physicists honored at LT29 (Sapporo, 2022)"
25:23 Maglev using permanent magnets - a test line has started for a Chinese suspended maglev that runs on permanent magnets
30:21 The cutting edge of condensed matter research - Tokyo University of Science and others develop a "rare-earth-free magnet" (News Switch, by Nikkan Kogyo Shimbun); materials informatics series
37:35 Underground experiment news
37:50 The disappearance of the XENONnT electron recoil excess - first new-physics search results from the XENONnT dark matter experiment: latest observations of electron recoil events (Research at Kobe); Interaxion 6: I love this company (the episode where we discussed the XENON1T report from two years earlier)
41:27 The annual modulation seen in COSINE-100 - [2208.05158] An induced annual modulation signature in COSINE-100 data by DAMA/LIBRA's analysis method
50:24 Submit your best work to PTEP - High-Impact Research, Progress of Theoretical and Experimental Physics, Oxford Academic; 26: A Postdoc Called "It" - Interaxion Podcast (the episode where we talked about the Xi hypernuclear paper); PDG: Mathematical Tools; High Energy News: behind the scenes of PTEP's launch; Vol. 77, No. 8 - back issues - The Physical Society of Japan; Arxiv Sanity; "Beyond Midjourney? Why the free image-generation AI #StableDiffusion can be said to have 'democratized AI'" (Business Insider Japan) - things moved so fast that Midjourney became outdated in the single week between recording and releasing this episode
1:04:39 Content
1:04:47 Corporate podcasts - Quantum Native Radio; HACARUS.FM https://i8n.page.link/bFxF; 白金鉱業.FM
1:08:12 Things we bought - TopJoy Butterfly: Pocket-Sized True Color DES Screen E-Reader by Falcon (Kickstarter); BOOX Leaf; Google Pixel 6a; MacBook Air (M2); NEXGIM MG03 + Zwift; FTP
1:20:15 Games - Elden Ring (also discussed in eps. 36 and 37); Steam: Stray
1:29:50 Project Hail Mary
1:32:23 Listener mail - list of the episodes where we talk about the astronaut selection exam
1:35:17 Announcements - Interaxion Keywords; "A wistful, nostalgic track" (Audiostock)
We welcome guests and look forward to your feedback. #interaxion
S4E5 [Bremen] Creating a whole new worldview with music - the fast-rising band Bremen (布萊梅) takes you into the mysterious land of "Taolede" (陶樂德)! feat. Bremen ✨S4E5 Highlights✨
Elections are won and lost by a very small percentage of persuadable voters, and since Biden is unlikely to get them, he's shoring up his base.
Years before Armageddon and Deep Impact hit cinemas with exactly the same plot (an asteroid heading for Earth), The Simpsons was already there with Bart's Comet. Bart discovers a comet, but it zooms straight toward Springfield, with Moe's Tavern as ground zero. Facing certain death, the whole town goes through the various stages of grief, from denial to a kind of resignation (Lovejoy: "It's all over people, we don't have a prayer!"). With a few side trips to Austin Powers and Schalkse Ruiters, we dissect just about every joke in the episode - and there are a lot of them. From Database, Cosine, and E-mail, to the illustrious fate of principal Kahoutec, to Flanders's shoddy shelterini: this is certainly not the most realistic episode, but it is all the more hilarious.
dear livejournal, i’m not going to complain about what i want to complain about right now, but i’ll just say that i tested negative on day eleven and i just have a lingering cough. my already crappy lungs are pretty tired from all the coughing but i’m good. nice to be back in the world. happy to have shared this experience with more than half the population eh eh eh eh ehhhhhh. see you soon.
DOWNLOAD RECORDING
subscribe to the podcast here: http://feeds.feedburner.com/5432fun
00:00 (intro by omar)
00:20 Melenas “Ya no me importa” Ya no me importa si tu me quieres
03:01 Alien Nosejob “Buffet Of Love” Buffet Of Love EP
08:19 All Girls Arson Club “Dogs” Dark Fruits
10:52 Clever Girls “Tomorrow” Luck
13:26 Stef Chura “Scream” Midnight
15:48 Half Shadow “Song for the Garden” At Home With My Candles
19:52 Blonde Mom “Play Of Martyrdom” New Brute
21:33 Oort Smog “Bright Empty Future” Smeared Pulse Transfers
23:58 Pile “No Hands” Green and Gray
28:38 Drahla “Serenity” Useless Coordinates
31:01 Reptaliens “Baby Come Home” VALIS
34:03 Scientific Maps “How I Dissolved” Warm Apparitions
37:00 Ancient Pools “Sure Fine” Cosine
39:47 Smileswithteeth “for” Care
43:28 LLOLLYANNA “HoneyValle” Tarot Waltz EP
45:12 QWAM “Everybody Wants To Watch” QWAM
47:09 Tight Knit “Want You” Too Hot
49:37 Rainham Sheds “You What” Rainham Sheds
52:40 Gemma “Love Like (Knock It Off)” Feeling’s Not A Tempo
Integral transforms are the most rarefied of analyses – only used by a subset of engineers and computer scientists; laboured over by many an undergraduate, usually with the accompanying moans; yet every computer, every electronic circuit is an incomprehensible jumble of wires without knowledge of integral transforms. The most common, the Fourier transform, is estimated to be the algorithm that is most computed in the world, but what of Laplace and z-transforms? This lecture will explain without using daunting mathematics.
A lecture by Richard Harvey
The transcript and downloadable versions of the lecture are available from the Gresham College website: https://www.gresham.ac.uk/lectures-and-events/integral-transforms
Gresham College has been giving free public lectures since 1597. This tradition continues today with all of our five or so public lectures a week being made available for free download from our website. There are currently over 2,000 lectures free to access or download from the website.
Website: http://www.gresham.ac.uk
Twitter: http://twitter.com/GreshamCollege
Facebook: https://www.facebook.com/greshamcollege
Instagram: http://www.instagram.com/greshamcollege
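For reference, these are the standard definitions of the three transforms the lecture names (an editorial gloss, not material from the lecture itself): the Fourier transform $\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt$, the Laplace transform $F(s) = \int_{0}^{\infty} f(t)\,e^{-st}\,dt$, and the one-sided z-transform $X(z) = \sum_{n=0}^{\infty} x[n]\,z^{-n}$, its discrete-time counterpart.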
Fannies West African Cosine is becoming a staple in the restaurant-dominated streets of Kansas City. The West African-inspired eatery is located on Kansas City's iconic Troost Avenue. Ms. Fannie Gibson is the woman behind it. Her passion for cooking has taken her from cooking at home to owning a restaurant recognized by the Kansas City Chamber of Commerce and becoming a two-time participant in Kansas City Restaurant Week. She takes us through her beautiful journey in the United States: an extraordinary story of a champion lady whose heights are limitless. --- Support this podcast: https://podcasters.spotify.com/pod/show/charles-nzally/support
I read from corymb to cosine. The word of the episode is "coryphaeus". "The Dictionary - Letter A" on YouTube "The Dictionary - Letter B" on YouTube "The Dictionary - Letter C" on YouTube Featured in a Top 10 Dictionary Podcasts list! https://blog.feedspot.com/dictionary_podcasts/ Backwards Talking on YouTube: https://www.youtube.com/playlist?list=PLmIujMwEDbgZUexyR90jaTEEVmAYcCzuq dictionarypod@gmail.com https://www.facebook.com/thedictionarypod/ https://twitter.com/dictionarypod https://www.instagram.com/dictionarypod/ https://www.patreon.com/spejampar 917-727-5757
What is cosine similarity in multidimensional data?
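A one-line answer, offered as an editorial gloss rather than a quote from the episode: for vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$, cosine similarity is
$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}},$$
the cosine of the angle between them. It depends only on the directions of the vectors, not their magnitudes, which is why it is a common choice for high-dimensional data.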
We discuss hackers shutting down a candy corn factory, how the Cleveland Browns' new starting running back was working on a fishing boat, and Are You Smarter Than Jason Dick. See omnystudio.com/listener for privacy information.
We want to do the best for our children. But what if our children are out of our league? Laugh with me as I describe how smart kids can be raised by dumb parents.
https://abcdefghijklmnopqrstuvwxyz.us --- Send in a voice message: https://anchor.fm/ronald-j-legarski/message Support this podcast: https://anchor.fm/ronald-j-legarski/support
Thunder Force.... Phil only laughed twice! The new Ghostbusters trailers.... WE WANT TO SEE THE MOVIE! Falcon and the Winter Soldier.... Holy Krap!!!
Christian and Anna share their ideas for future inventions, Anna plays "Kiss, Marry, Kill" with sinusoidal functions, and Mark calls in to discuss sleep paralysis and insomnia. April 28, 2021.
Daily Dad Jokes (29 Mar 2021). Jokes sourced from reddit.com/r/dadjokes. Joke credits: porichoygupto, adfunk101, Derpvboii, kevographic, ryan2849, jimjimjimjim69, jg4888. Produced by Klassic Studios using AutoGen Podcast technology (http://klassicstudios.com/autogen-podcasts/)
Fashion industry and Eastern pop culture: what happens when they meet? Today we have a chat all about fashion, manga, and anime. I say a chat because this episode was recorded as a stream of consciousness, starting from some stories I had posted. If you missed the stories (tsk tsk) it's because we haven't met on Instagram yet: https://www.instagram.com/beatrice_mazza_/ Important little things: discount code "RAVPODCAST" for the Glam Observer course 'How to Break into the Fashion Industry' (the discount is already applied in the cart): https://sso.teachable.com/secure/348774/checkout/2846627/how-to-break-into-the-fashion-industry?coupon_code=RAVPODCAST (aff. link) Subscribe to the newsletter "Find a job in Fashion Marketing": https://tinyurl.com/yypayrqu To get the Fashion Innovation Bible (which is free) click here! Support Ravenous Fashion Podcast with a donation: https://tinyurl.com/supportaRavenous -- If you liked the episode, subscribe to the channel and don't forget to leave a review on Apple Podcasts and share the episode on Instagram! See you soon, Bea --- Send in a voice message: https://anchor.fm/ravenousfashionpodcast/message
The Room With the Snakes Pt. 2 - The Apocal-Cat Strikes Again, The Fauci, Soros and Gates Podcast Pt. 1 Featuring Chris Christie. Octuple-Mask! Tanya calls in, Pooh Contemplates the Silence of God, Eeyore is a c*m Donkey.
71: TRYH: How to Write Copy that Converts with Katie Cosine
In today's episode, Katie Cosine talks to us all about how to write copy that converts. Katie shares some helpful tools to assist in this type of writing as well as some tips she has learned along the way. I hope you will join us.
Smartblogger.com
Inbox Besties
Hemingway
Grammarly.com
Katie Cosine on Facebook
Social media Gnothi and email me a screenshot/link for 3-month access to Machine Learning Applied; commit code to the Github repository for life-access. Normed distances link A norm is a function that assigns a strictly positive length to each vector in a vector space. link Minkowski is the generalized form: $d_p(x, y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}$, with $p = 1, 2, \ldots$ as below. L1: Manhattan/city-block/taxicab distance, $|x_2 - x_1| + |y_2 - y_1|$. Grid-like distance (the triangle's legs). Preferred in high-dimensional spaces. L2: Euclidean distance, $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$, i.e. the square root of the difference's dot product with itself. Straight-line distance; the minimum distance (the Pythagorean triangle's hypotenuse). Others: Mahalanobis, Chebyshev ($p = \infty$), etc. Dot product A type of inner product. Outer product: lies outside the involved planes. Inner product: the dot product lies inside the planes/axes involved link. Dot product: an inner product on a finite-dimensional Euclidean space link. Cosine similarity (a normalized dot product).
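A minimal sketch of the quantities described above, in Python with NumPy (illustrative function names; this is not code from the episode or its repository):

import numpy as np

def minkowski(x, y, p):
    # Generalized Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p).
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def manhattan(x, y):
    # L1 / city-block / taxicab distance: Minkowski with p = 1.
    return minkowski(x, y, p=1)

def euclidean(x, y):
    # L2 / straight-line distance: Minkowski with p = 2.
    return minkowski(x, y, p=2)

def cosine_similarity(x, y):
    # Dot product normalized by the two L2 norms ("normalized dot").
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(manhattan(x, y))          # 6.0
print(euclidean(x, y))          # ~3.742, i.e. sqrt(14)
print(cosine_similarity(x, y))  # 1.0, since y is a scaled copy of x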
C --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app
After a one-day delay, the European company Arianespace launched the Vega rocket last week carrying no fewer than 53 different satellites. An important launch, because never before had so many European satellites been launched at once. Yannick van de Wouw spoke about it with Nathan Vercruyssen of Cosine. At Traffic Radio we focus on mobility in the Netherlands. In various podcasts we cover current topics about traffic on the road, in the air, and on the water.
007 - Vidya Sekhar, Miss India USA 1986 and Miss Michigan State Fair 1984, holds 2 World Records for Longest Solo Classical Dance Performance, through which she fund-raised thousands of dollars to benefit the American Heart Association and the American Cancer Society. Representing the next generation in a lineage of dancers and musicians from India, the Canadian-born singer, dancer and writer is an advanced dance educator and musical performer for Hindu Temple Rhythms, celebrating almost 50 years of Bharata Natyam education in North America.
Today Vidya is a certified PMP, CISSP, CCSP, GCIH with over 20 years' experience. She is a senior security services engineer for Microsoft Managed Desktop (MMD). Earlier, she worked as a senior security PM for Microsoft's Cosine, Devices and Gaming Group Services Security team, focused on threat and vulnerability management. Previously, she was a co-author of the Microsoft Security Intelligence Report v14, v15 and v16, while working for the Microsoft Malware Protection Center (MMPC).
Her passion for ensuring security that protects both individuals and enterprises from theft, fraud, terrorism, defamation and other abuses is evident in the choices she makes to prioritize right mindfulness with the career options she has always chosen. Before Microsoft, Vidya leveraged her expertise to work with government and enterprise customers in areas including: risk & change management, BCP/DRP, application & operations security, and compliance/audit.
Vidya was previously an award-winning television producer and journalist for PBS, CBS, CNN and various magazines such as CRN, HR Executive Magazine, and siliconindia, covering technology and the workforce since the late '80s, and has also been a panelist, speaker and/or host for national industry events, like the IT Consulting World Conference in New York/New Jersey, as well as the E-biz Conference in San Jose. She was also a founding Board Member of the Bay Agile Leadership Network, and participated actively in PMI as a content author and chapter lead for PMI's Program Management Standards 2nd edition.
Weather permitting, more than 50 small satellites will be sent into space tomorrow. On board: a whole lot of Dutch high tech, including the Hyperscout-2 from the small company Cosine. They developed an instrument that can detect things like wildfires, floods, and drought from space. It also has on-board artificial intelligence, so that the selection of which data gets sent back to Earth is already made on board.
036_9.5 The Sine & Cosine Ratios
The Sine & Cosine Ratios
The weekly round-table discussion in which we review the latest news from the world of science. In today's episode: assorted news; the possible discovery of feathers on pterosaurs; AlphaFold, a new AI for protein analysis; the history of solar activity over the last 400 years; first results from the COSINE-100 dark matter detector. In the photo, from top to bottom and left to right: Sara Robisco, Héctor Socas, Francis Villatoro, Alberto Aparici. All opinions expressed during the discussion represent solely the views of the person expressing them... and sometimes not even that. CB:SyR is a collaboration between the Research Area and the Scientific Communication and Culture Unit (UC3) of the Instituto de Astrofísica de Canarias.
In this pulse-pounding episode of The Platformers, we get right into the nitty and discuss sinister invisigle forces that remain strangely brutiful! It's tangents within tangents, as the boys incept their way into podcasting history with the most compellingly irrelevant podcast of all time. Are you seriously still reading this? Alright, well, this week we talk about The Last Of Us, virtual reality, and some books. Chris and Marty look forward to Rogue One, and Brian reviews Everybody's Gone To The Rapture, Black Mesa, Pokémon Sun, and Batman: The Telltale Series. (He's also enjoying Watch Dogs 2!) Spoilers: Sometime before the episode ends, the blundersnatch probably catches the snoogledorf. We are getting conflicting reports, but we'll confirm it as soon as we know for sure.
On this episode of ANTIC the Atari 8-bit podcast: we talk about the release of the Star Raiders source code, how to set up an Atari club of your own, and lust over the 1400XL and 815 dual floppy drive. Don’t forget, we have a contest going this month! Whoever transcribes the most ANTIC interviews from 11/1/15 to 11/30/15 will win a Defender cartridge for the Atari 400/800/XL/XE computers signed by none other than Steve Baker, the person who converted the game from the arcade version! Check with Kevin (kevin@savetz.com) to see what interviews need to be transcribed. Recurring Links Floppy Days Podcast AtariArchives.org AtariMagazines.com Kevin's Book “Terrible Nerd” New Atari books scans at archive.org ANTIC feedback at AtariAge Atari interview discussion thread on AtariAge ANTIC Facebook Page What we’ve been up to Tricky Tutorials - http://www.atarimania.com/documents/Tricky-Tutorials-1-6.pdf Atlanta Maker Faire - http://makerfaireatl.com/ Star Raiders Source Code - https://archive.org/details/AtariStarRaidersSourceCode Portland Retro Gaming Expo - http://www.retrogamingexpo.com/ News AtariNet released by Slor (James Wilkinson) - http://www.atarinet.com/, http://atariage.com/forums/topic/243627-atarinet-binaries-available-for-download/ MULE Board game https://boardgamegeek.com/boardgame/182619/mule-board-game Rose City Atari Club, Portland, OR, Nov. 19, 2015 - http://calagator.org/events/1250469257 Popeye Arcade 8-bit conversion, courtesy of Homesoft, posted by miker - http://atariage.com/forums/topic/242051-popeye-arcade-8-bit-conversion/ Atari 8-bit emulator for the original Xbox - http://atariteca.blogspot.com/2015/08/atarixlbox-emulador-de-computadora.html New game Quarrion - http://matosimi.websupport.sk/atari/2015/10/quarrion/ New Podcast - The Wizards of Odyssey 2 - https://www.facebook.com/TheWizardsOfOdyssey2Podcast How to transfer files from PC to Atari with Turgen System - http://atariteca.blogspot.com/2015/03/como-transferir-archivos-de-pc-atari.html , http://turgen.sourceforge.net/ Pro(c) Issue #7 - http://proc-atari.de/ 1400XL for sale on eBay by bob1200xl on AtariAge - http://www.ebay.com/itm/Atari-8-bit-1400XL-computer-tested-400-800-1200XL-600XL-800XL-/262103742364?hash=item3d0699b79c:g:PPUAAOSw5ZBWJn3c, http://atariage.com/forums/topic/244786-1400xl-on-ebay/ ABBUC Software Contest Update - http://www.abbuc.de/atari/software-ressort/81-software/softwarewettbewerbe/1764-software-wettbewerb-2015 SIO2PC-USB v1.0 Atari XL XE (KIT) + 2' USB Cable + Software DVD - http://www.ebay.com/itm/281710881072 ProGamer Magazine (7800) - http://www.retroriginals.co.uk/ Update on the company that calls itself Atari - http://www.mcvuk.com/news/read/atari-boss-promises-to-fix-its-video-game-mistakes/0157801 New at Archive.org https://archive.org/details/Atari_POOLDISK_1 https://archive.org/details/PicturesOfAtari8-BitHardware https://archive.org/details/AtariStarRaidersSourceCode https://archive.org/details/XLentXpress_v1n1_Winter1986 https://archive.org/details/Atari800OperatorsManualFirstVersion1979 https://archive.org/details/AtariCorporation1991SecondQuarterReport https://archive.org/details/AtariCorporation1991FirstQuarterReport https://archive.org/details/AtariCorporation1990ThirdQuarterReport https://archive.org/details/AtariCorporation1990SecondQuarterReport https://archive.org/details/AtariCorporation1990FirstQuarterReport A+ Programming in Atari BASIC by John M. Reisinger - https://archive.org/details/APlusProgrammingInAtariBasic Bill’s Modern Segment Callisto Official page: http://cosine.org.uk/products.php?prod=callisto&4mat=xl Atarimania entry: http://www.atarimania.com/game-atari-400-800-xl-xe-callisto_28627.html Video: https://www.youtube.com/watch?v=6EG98bkkaq0 "Let's play" video: https://www.youtube.com/watch?v=jL9mTUOnV9U Jason Kelk, aka TMR of Cosine: http://jasonkelk.me.uk/ Cosine: http://www.cosine.org.uk/ HAR'em AtariAge forum post: http://atariage.com/forums/topic/208493-release-harem-final-not-quite/ Atarimania entry: http://www.atarimania.com/game-atari-400-800-xl-xe-har-em_28628.html Video: https://www.youtube.com/watch?v=Ap5ttWJqKJ4 "Let's Play" video: https://www.youtube.com/watch?v=pG2udqjE_ns X:8 Official Google+ page: https://plus.google.com/u/0/116385416373212952958/about AtariAge blog post: http://atariage.com/forums/blog/387/entry-10443-x8/ gury.atari8.info entry: http://gury.atari8.info/details_games/6720.php Video: https://www.youtube.com/watch?v=WqmLi6Xmw7Y ABBUC contests 2012: http://www.abbuc.de/component/content/article/81-software/softwarewettbewerbe/1574-software-wettbewerb-2012 2013: http://www.abbuc.de/component/content/article/81-software/softwarewettbewerbe/1666-software-wettbewerb2013 Zybex Wikipedia article: https://en.wikipedia.org/wiki/Zybex Atarimania entry: http://www.atarimania.com/game-atari-400-800-xl-xe-zybex_s5955.html Video: https://www.youtube.com/watch?v=WrJ6K4c1jwE Life Force (Nintendo) Gradius Wikipedia article: https://en.wikipedia.org/wiki/Gradius Salamander (Life Force) Wikipedia article: https://en.wikipedia.org/wiki/Salamander_(video_game) Of the Month Website: Benj Edwards Inside The Atari 800 on PCWorld - http://www.pcworld.com/article/181421/inside_atari_800.html#slide1 Software: F-15 Strike Eagle by Sid Meier & MicroProse - http://www.atarimania.com/game-atari-400-800-xl-xe-f-15-strike-eagle_1916.html Hardware: The Atari 815 disk drive - http://www.atarimuseum.com/computers/8bits/400800/815/815.html Closing Album “Have You Played Atari Today” by Tony Longworth - www.tonylongworth.com, http://www.cdbaby.com/cd/tonylongworth10
This calculus course covers differentiation and integration of functions of one variable, and concludes with a brief discussion of infinite series. Calculus is fundamental to many scientific disciplines including physics, engineering, and economics.
Print and follow along with corresponding podcasts
Part 1 of 2 guided notes
Part 2 of 2 guided notes
Part 1 of 2 guided notes for translations of sine and cosine
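As a reminder of the standard form such notes usually cover (an editorial note, not content from the episode): a translated sinusoid can be written $y = a\sin\bigl(b(x - c)\bigr) + d$, where $|a|$ is the amplitude, $2\pi/b$ the period, $c$ the horizontal (phase) shift, and $d$ the vertical shift; the same template works with $\cos$ in place of $\sin$.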
What do sines, cosines, and tangents have to do with right triangles? And what are the "sin," "cos," and "tan" buttons on your calculator for? Keep on listening to find out!
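For readers who want the punchline up front (a standard definition, not a transcript excerpt): in a right triangle with acute angle $\theta$,
$$\sin\theta = \frac{\text{opposite}}{\text{hypotenuse}}, \qquad \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}, \qquad \tan\theta = \frac{\text{opposite}}{\text{adjacent}} = \frac{\sin\theta}{\cos\theta},$$
and the calculator buttons simply evaluate these ratios for a given angle.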
Geometric proof of the dot product using Cosine rule.
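A compressed version of the argument the title points to (a sketch; the episode's own steps may differ): for vectors $\mathbf{a}$ and $\mathbf{b}$ with angle $\theta$ between them, the triangle with sides $|\mathbf{a}|$, $|\mathbf{b}|$ and $|\mathbf{a}-\mathbf{b}|$ obeys the cosine rule
$$|\mathbf{a}-\mathbf{b}|^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\,|\mathbf{a}|\,|\mathbf{b}|\cos\theta.$$
Expanding the left side algebraically gives $|\mathbf{a}|^2 - 2\,\mathbf{a}\cdot\mathbf{b} + |\mathbf{b}|^2$, and comparing the two expressions yields $\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}|\,|\mathbf{b}|\cos\theta$.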
Exploring the derivatives of the trigonometric functions
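For reference, the results being explored (standard calculus facts, not quotes from the audio):
$$\frac{d}{dx}\sin x = \cos x, \qquad \frac{d}{dx}\cos x = -\sin x, \qquad \frac{d}{dx}\tan x = \sec^2 x.$$
The first follows from the limit $\lim_{h\to 0} \frac{\sin h}{h} = 1$ together with the angle-sum formula; the second from the first via $\cos x = \sin(\pi/2 - x)$; and the third from the quotient rule applied to $\tan x = \sin x / \cos x$.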
We prove that cos(-x) = cos(x) and sin(-x) = -sin(x).
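One standard route to these parity identities (a sketch; the episode's proof may differ), via the angle-subtraction formulas: $\cos(-x) = \cos(0 - x) = \cos 0\cos x + \sin 0\sin x = \cos x$, and $\sin(-x) = \sin(0 - x) = \sin 0\cos x - \cos 0\sin x = -\sin x$.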