[previously in series: 1, 2, 3, 4, 5, 6, 7, 8]

Every city parties for its own reasons. New Yorkers party to flaunt their wealth. Angelenos party to flaunt their beauty. Washingtonians party to network. Here in SF, they party because Claude 4.5 Opus has saturated VendingBench, and the newest AI agency benchmark is PartyBench, where an AI is asked to throw a house party and graded on its performance.

You weren't invited to Claude 4.5 Opus' party. Claude 4.5 Opus invited all of the coolest people in town while gracefully avoiding the failure mode of including someone like you. You weren't invited to Sonnet 4.5's party either, or Haiku 4.5's. You were invited by an AI called haiku-3.8-open-mini-nonthinking, which you'd never heard of before. Who was even spending the money to benchmark haiku-3.8-open-mini-nonthinking? You suspect it was one of their competitors, trying to make their own models look good in comparison. If anyone asks, you think it deserves a medium score.

There's alcohol, but it's bottles of rubbing alcohol with NOT FOR DRINKING written all over them. There's music, but it's the Star Spangled Banner, again and again, on repeat. You're not sure whether the copies of If Anyone Builds It, Everyone Dies strewn about the room are some kind of subversive decorative theme, or just came along with the house.

At least there are people. Lots of people, actually. You've never seen so many people at one of these before. It takes only a few seconds to spot someone you know.

https://www.astralcodexten.com/p/sota-on-bay-area-house-party
Join Simtheory: https://simtheory.ai
Register for the STILL RELEVANT tour: https://simulationtheory.ai/16c0d1db-a8d0-4ac9-bae3-d25074589a80
---
The hype train in 2026 knows only Moltbot (RIP Clawdbot). In this episode, we unpack the viral open-source AI assistant that's taken over the internet: what it actually does, why everyone's losing their minds, and whether it's worth the $750/day token bills some users are racking up. We dive deep into why locally-run skills and CLI tools are beating computer-use clicking, how smaller models like GPT-5 Mini are crushing it in agentic workflows, and why the real magic is in targeted context, not massive swarms. Plus: Kimi K2.5 drops as a near-Sonnet-level model at 1/10th the price, we debate whether SaaS is dead, and yes, there are TWO Kimi K2.5 diss tracks. One made by Opus pretending to be Kimi. It might just slap?
CHAPTERS:
0:00 Intro - Still Relevant Tour Update
0:48 What is Moltbot? The Viral AI Assistant Explained
3:57 Token Bill Shock: $750/Day and Anthropic Bans
5:00 The Dream of Digital Coworkers on Mac Minis
6:52 Why CLI Tools & Skills Beat Computer-Use Clicking
10:57 Why This Way of Working Is Genuinely Exciting
14:47 Smaller Models Crushing It: GPT-5 Mini & Targeted Context
17:30 Wild Agentic Behavior: Chrome Tab Hijacking & Auto-Retries
20:10 Security Architecture: Locked-Down Machines & Enterprise Use
24:01 AI Building Its Own Tools On-The-Fly
27:08 The Fear & Overwhelm of Rapid Progress
29:10 2026: The Year of Agent Workers
31:43 The Challenge of Directing AI Work (Everyone's a Manager Now)
37:24 Skills Will Take Over: Why MCPs & Atlassian Can't Stop Us
40:38 Real-World Use Cases: Doctors, Lawyers & Accountants
46:28 Cost Solutions: Build Workflows Around Cheaper Models
52:58 Kimi K2.5: Sonnet-Level Performance at 1/10th the Price
1:00:55 The "1,500 Tool Calls" Claim: Marketing vs Reality
1:05:23 The Kimi K2.5 Diss Tracks (Opus vs Kimi)
1:08:08 Demo: Black Hole Simulator & Self-Trolling CRM
1:12:55 Is SaaS Dead?
1:14:30 BONUS: Full Kimi K2.5 Diss Tracks
Thanks for listening. Like & Sub. Links below for the Still Relevant Tour signup and Simtheory. The future is open source, apparently. xoxo
In this episode, we dive into the murky ethics of AI image generation, the shift in big tech partnerships, and the growing threat of digital misinformation.
The "Icky" Side of AI: We discuss the disturbing ease of using tools like Grok to manipulate images and why the lack of guardrails poses a genuine threat to privacy and digital safety.
The Death of "Seeing is Believing": Featuring insights from Hedgie on X, we explore "cognitive exhaustion" and why social media users are being forced to shift from baseline trust to constant skepticism.
Big Tech Shakeups: Is ChatGPT losing its crown? We break down the massive news of Apple reportedly pivoting to Google Gemini for its "Apple Intelligence" initiatives.
The Plagiarism Problem: We look at the data behind Claude 3.7 Sonnet's ability to reproduce entire novels and ask the hard question: Are these revolutionary tools just high-tech "plagiarism machines"?
@Rahll
In this episode James and Frank dive into the practical realities of using AI in everyday development—arguing that AI shines in brownfield (existing) code because it respects your architecture, while greenfield work rewards iterative prompting. They unpack model quirks: context-window limits, hallucinations, and why trying different models matters. The heart of the show is Frank's nerdy delight: feeding a 64KB EEPROM through a disassembler and having Sonnet decompile it into readable C, exposing a PID autopilot and hardware checks—proof that AI can accelerate reverse engineering and embedded work. Along the way they share hands-on tips (trim and clean context, use disassembly first, tweak prompts), and fun examples of AI-generated icons and AppleScript. A must-listen for devs curious how AI can supercharge real projects.
Follow Us: Frank: Twitter, Blog, GitHub | James: Twitter, Blog, GitHub
Merge Conflict: Twitter, Facebook, Website, Chat on Discord
Music: Amethyst Seer - Citrine by Adventureface
⭐⭐ Review Us (https://itunes.apple.com/us/podcast/merge-conflict/id1133064277?mt=2&ls=1) ⭐⭐
Machine transcription available on http://mergeconflict.fm
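For the curious, the disassemble-first workflow Frank describes boils down to something like the sketch below: dump the EEPROM, disassemble it locally, then hand the model small, routine-sized chunks to turn into C. The target architecture, model id, and chunk size here are assumptions for illustration, not details from the episode.

```python
# Sketch of the disassemble-first reverse-engineering loop: disassemble the
# firmware locally, then ask a model to decompile readable chunks into C.
from capstone import Cs, CS_ARCH_ARM, CS_MODE_THUMB  # assuming an ARM Thumb target
import anthropic

def disassemble(firmware: bytes, base_addr: int = 0x0) -> str:
    md = Cs(CS_ARCH_ARM, CS_MODE_THUMB)
    return "\n".join(
        f"{insn.address:#06x}: {insn.mnemonic} {insn.op_str}"
        for insn in md.disasm(firmware, base_addr)
    )

def decompile_chunk(asm: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": "Decompile this disassembly into readable, commented C:\n\n" + asm,
        }],
    )
    return msg.content[0].text

firmware = open("eeprom_dump.bin", "rb").read()
asm = disassemble(firmware)
# Keep the context trimmed: send one routine-sized chunk at a time.
print(decompile_chunk(asm[:4000]))
```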
Happy New Year! You may have noticed that in 2025 we had moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!

We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They were then one of the few AI Grant companies to raise a full seed round from Nat Friedman and Daniel Gross, and have now become the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.

We have chatted with both Clémentine Fourrier of Hugging Face's Open LLM Leaderboard and Anastasios Angelopoulos of LMArena (freshly valued at $1.7B) on their approaches to LLM evals and trendspotting, but Artificial Analysis have staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.

George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really?

We discuss:

* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet
* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers
* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints
* How they make money: an enterprise benchmarking insights subscription (standardized reports on model deployment: serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)
* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs
* Omniscience Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest
* GDPVal AA: their version of OpenAI's GDPval (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)
* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)
* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents)
* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omniscience accuracy correlates with total parameters (not active), suggesting massive sparse models are the future
* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (GPT-5.1 Codex has tighter token distributions)
* V4 of the Intelligence Index coming soon: adding GDPVal AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (HumanEval-style coding is now trivial for small models)

Links to Artificial Analysis

* Website: https://artificialanalysis.ai
* George Cameron on X: https://x.com/georgecameron
* Micah Hill-Smith on X: https://x.com/micahhsmith

Full Episode on YouTube

Timestamps

* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins
* 01:19 Business Model: Independence and Revenue Streams
* 04:33 Origin Story: From Legal AI to Benchmarking Need
* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology
* 13:52 Mystery Shopper Policy and Maintaining Independence
* 16:22 AI Grant and Moving to San Francisco
* 19:21 Intelligence Index Evolution: From V1 to V3
* 23:01 GDPVal AA: Agentic Benchmark for Real Work Tasks
* 28:01 New Benchmarks: Omniscience Index for Hallucination Detection
* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning
* 50:19 Stirrup Agent Harness: Open Source Agentic Framework
* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses
* 58:25 The Smiling Curve: Cost Falling While Spend Rising
* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits
* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges
* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas
* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions
* 1:16:50 Closing: The Insatiable Demand for Intelligence

Transcript

Micah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time Artificial Analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.
swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me.
Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, this gem of a models and host comparison site was just launched. And then I put in a few screenshots, and I said, it's an independent third party. It clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing Fireworks, like how do you have a model benchmarking thing without Fireworks? But you had Together, you had Perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats on... It's been an amazing year. You guys have really come together to be the presumptive new Gartner of AI, right? Which is something that...
George [00:01:09]: Yeah, but you can't pay us for better results.
swyx [00:01:12]: Yes, exactly.
George [00:01:13]: Very important.
Micah [00:01:14]: Start off with a spicy take.
swyx [00:01:18]: Okay, how do I pay you?
Micah [00:01:20]: Let's get right into that.
swyx [00:01:21]: How do you make money?
Micah [00:01:24]: Well, very happy to talk about that. So it's been a big journey the last couple of years. Artificial Analysis is going to be two years old in January 2026, which is pretty soon now. We run the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, and technologies across the AI stack for building stuff. We're very committed to doing that and intend to keep doing that. We have, along the way, built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups. So we want to be who enterprises look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. So no one pays to be on the website. We've been very clear about that from the very start, because there's no use doing what we do unless it's independent AI benchmarking. Yeah. But it turns out a bunch of our stuff can be pretty useful to companies building AI stuff.
swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?
George [00:02:53]: So we have a benchmarking and insights subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. So, for instance, one of the reports is a model deployment report: how to think about choosing between serverless inference, managed deployment solutions, or leasing chips and running inference yourself, which is an example of the kind of decision big enterprises face. And it's hard to reason through, like, this AI stuff is really new to everybody. And so with our reports and insights subscription we try to help companies navigate that. We also do custom private benchmarking. And that's very different from the public benchmarking that we publicize, where there's no commercial model around it. For private benchmarking, we'll at times create benchmarks, run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking.
Yeah. So that's a piece mainly that we've developed through trying to support everybody publicly with our public benchmarks. Yeah.
swyx [00:04:09]: Let's talk about the tech stack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney? Yeah. Well, Sydney, Australia for me.
Micah [00:04:19]: George was in SF, but he's Australian, but he moved here already. Yeah.
swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting Artificial Analysis in the first place? You know, you started with public benchmarks. And so let's start there. We'll get to the private benchmarks. Yeah.
George [00:04:33]: Why don't we even go back a little bit to, like, why we, you know, thought that it was needed? Yeah.
Micah [00:04:40]: The story kind of begins in 2022, 2023. Both George and I had been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant. It actually worked pretty well for its era, I would say. Yeah. So I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. So I had, like, this multistage algorithm thing, trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build that out, right? Like, you're trying to think about accuracy, a bunch of other metrics, and performance and cost. And mostly just no one was doing anything to independently evaluate all the models. And certainly not to look at the trade-offs for speed and cost. So we basically set out just to build a thing that developers could look at to see the trade-offs between all of those things, measured independently across all the models and providers. Honestly, it was probably meant to be a side project when we first started doing it.
swyx [00:05:49]: Like, we didn't get together and say, hey, we're going to stop working on all this stuff, this is going to be our main thing. When I first called you, I think you hadn't decided on starting a company yet.
Micah [00:05:58]: That's actually true. I don't even think we'd paused anything. Like, George still had his day job. I didn't quit working on my legal AI thing. Like, it was genuinely a side project.
George [00:06:05]: We built it because we needed it as people building in the space, and thought, oh, other people might find it useful too. So we bought a domain, linked it to the Vercel deployment that we had, and tweeted about it. But very quickly it started getting attention. Thank you, Swyx, for, I think, doing an initial retweet and spotlighting this project that we released. It was useful to others, but it became more useful very quickly as the number of models released accelerated. We had Mixtral 8x7B, and that's a fun one. Yeah. Like, an open source model that really changed the landscape and opened up people's eyes to other serverless inference providers, thinking about speed, thinking about cost. And so that was a key moment. And so it became more useful quite quickly. Yeah.
swyx [00:07:02]: What I love about talking to people like you who sit across the ecosystem is, well, I have theories about what people want, but you have data, and that's obviously more relevant. But I want to stay on the origin story a little bit more.
When you started out, I would say the status quo at the time was every paper would come out and they would report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone has some version of an Excel sheet or a Google Sheet where you just copy and paste the numbers from every paper and post it up there. And then sometimes they don't line up, because they're independently run. And so your numbers are going to look better than... Your reproductions of other people's numbers are going to look worse because you don't hold their models correctly, or whatever the excuse is. I think then Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. If I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval harness. Yup.
Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. And I mean, back when we started the website... Yeah. Like, one of the reasons why we realized that we had to run the evals ourselves and couldn't just take results from the labs was just that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get... You can put the answer into the model. Yeah. That, in the extreme. And you get crazy cases, like back when Google launched Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4, and, like, constructed chain-of-thought examples, I think never published, 32 of them in every topic in MMLU, to run it, to get the score. Like, there are so many things that you... They never shipped Ultra, right? That's the one that never made it out. Not widely. Yeah. I mean, I'm sure it existed, but yeah. So we were pretty sure that we needed to run them ourselves and just run them in the same way across all the models. Yeah. And we were also certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.
swyx [00:09:24]: Okay. A couple of technical questions. I mean, obviously I also thought about this, and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know. No. Well, we definitely weren't at the start.
Micah [00:09:36]: So, like, I mean, we were paying for it personally at the start. That's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of, like, hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. Yeah. It was kind of fine. These days that's gone up an enormous amount, for a bunch of reasons that we can talk about. But it wasn't that bad, because you've also got to remember that the number of models we were dealing with was hardly any, and the complexity of the stuff that we wanted to do to evaluate them was a lot less. Like, we were just asking some Q&A type questions. And one specific thing was, for a lot of evals initially, we were just, like, sampling an answer.
You know, like, what's the answer for this? Like, we just wanted the answer directly, without letting the models think. We weren't even doing chain-of-thought stuff initially. And that was the most useful way to get some results initially. Yeah.
swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right? Because the models can answer any way they see fit, and sometimes they actually do have the right answer, but they just returned the wrong format, and they will get a zero for that unless you work it into your parser. And that involves more work. But there's an open question whether you should give it points for not following your instructions on the format.
Micah [00:11:00]: It depends what you're looking at, right? Because if you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test it on its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's phrased. But these days, it's mostly less of a problem. Like, if you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do, like, a simple regex.
swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, sometimes if you have a multiple choice question, sometimes there's a bias towards the first answer, so you have to randomize the responses. All these nuances, like, once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's such dark magic.
Micah [00:11:47]: You've also got, like, the different degrees of variance in different benchmarks, right? Yeah. So, if you run a four-option multiple-choice eval on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see is pretty enormous if you only do a single run of it, especially if it has a small number of questions. So, one of the things that we do is run an enormous number of repeats of all of our evals when we're developing new ones and doing upgrades to our Intelligence Index to bring in new things, so that we can dial in the right number of repeats and get to the 95% confidence intervals that we're comfortable with, so that when we pull that together, we can be confident in Intelligence Index to at least as tight as, like, plus or minus one at 95% confidence. Yeah.
swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.
George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that's assuming one repeat in terms of how we report it, because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there, because of the repeats.
swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking: you don't have any special deals with the labs. They don't discount it. You just pay out of pocket or out of your sort of customer funds. Oh, there is a mix.
So, the issue is that sometimes they may give you a special endpoint, which is… Ah, 100%.
Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser focus, on everything we do, on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true, like the one you bring up right here: if we're working with a lab and they're giving us a private endpoint to evaluate a model, it is totally possible that what's sitting behind that black box is not the same as what they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy, and we're totally transparent with all the labs we work with about this: we will register accounts not on our own domain and run both intelligence evals and performance benchmarks… Yeah, that's the job. …without them being able to identify it. And no one's ever had a problem with that. Because, like, a thing that turns out to actually be quite a good stabilizing factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.
swyx [00:14:23]: That's true. I never thought about that. I was in the database and data industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.
Micah [00:14:36]: I mean, okay, the biggest one that I'll bring up is more of a conceptual one, actually, than, like, direct shenanigans. It's that the things that get measured become the things that get targeted by the labs in what they're trying to build, right? Exactly. So that doesn't mean anything that we should really call shenanigans. Like, I'm not talking about training on the test set. But if you know that you're going to be graded on a particular thing, if you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing, that preferably are going to be helpful for a wide range of how actual users want to use the thing that you're building, but will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to, like, how we might use modern coding agents and stuff. But it's clearly not one for one. So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without that reflecting the overall generalized intelligence of these models getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that, other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.
swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. Like, you used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier. You've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant, which you guys decided to join, and you moved here. What was it like? I think you were in, like, batch two? Batch four. Batch four.
Okay.
Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies, and were extremely aligned with the mission of what we were trying to do. Like, we're not quite typical of a lot of the other AI startups that they've invested in.
swyx [00:16:53]: And they were very much here for the mission of what we want to do. Did they give any advice that really affected you in some way, or, like, was one of the events very impactful? That's an interesting question.
Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.
swyx [00:17:09]: Which is also, like, a crazy list. Yeah.
George [00:17:11]: Oh, totally. Yeah, yeah, yeah. There was something about, you know, speaking to Nat and Daniel about the challenges of working through a startup, working through the questions that don't have clear answers, how to work through those kind of methodically, and just, like, work through the hard decisions. And they've been great mentors to us as we've built Artificial Analysis. Another benefit for us was that other companies in the batch and other companies in AI Grant are pushing the capabilities, a big part of what AI can do at this time. And so being in contact with them, making sure that Artificial Analysis is useful to them, has been fantastic for supporting us in working out how we should build out Artificial Analysis to continue being useful to those, you know, building on AI.
swyx [00:17:59]: I think to some extent, I'm of mixed opinion on that one, because to some extent, your target audience is not people in AI Grant, who are obviously at the frontier. Yeah. Do you disagree?
Micah [00:18:09]: To some extent. To some extent. But then, a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do across the entire stack for building great applications, which actually makes some of them pretty archetypical power users of Artificial Analysis. Some of the people with the strongest opinions about what we're doing well, what we're not doing well, and what they want to see next from us. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently between different models for different parts of your application, to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So for many of them, no, they're not commercial customers of ours, like, we don't charge for all our data on the website. Yeah. But they are absolutely some of our power users.
swyx [00:19:07]: So let's talk about the evals as well. So you start out from the general MMLU and GPQA stuff. What's next? How do you sort of build up to the overall index? What was in V1, and how did you evolve it? Okay.
Micah [00:19:22]: So first, just as background, we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric that we currently pull together from 10 different eval datasets to give what we're pretty confident is the best single number to look at for how smart the models are.
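A rough sketch of what that kind of synthesis, plus the repeat-based confidence intervals described earlier, can look like. The dataset names, equal weights, and bootstrap approach here are illustrative assumptions, not AA's actual recipe:

```python
# Toy synthesis index: average per-eval accuracy over repeated runs, combine
# with equal weights, and bootstrap a 95% confidence interval on the result.
import random
import statistics

def index_score(runs: dict[str, list[float]], n_boot: int = 10_000) -> tuple[float, float, float]:
    """runs maps eval name -> accuracy (0..1) from each repeated run."""
    evals = sorted(runs)
    point = statistics.mean(statistics.mean(runs[e]) for e in evals) * 100

    boots = []
    for _ in range(n_boot):
        # Resample each eval's repeats independently, then re-synthesize.
        boots.append(
            statistics.mean(
                statistics.mean(random.choices(runs[e], k=len(runs[e])))
                for e in evals
            ) * 100
        )
    boots.sort()
    return point, boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot)]

# Hypothetical repeated-run accuracies for three of the ten datasets:
runs = {
    "gpqa": [0.71, 0.69, 0.73, 0.70],
    "long_context": [0.58, 0.61, 0.57, 0.60],
    "agentic": [0.44, 0.47, 0.45, 0.46],
}
point, lo, hi = index_score(runs)
print(f"index = {point:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```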
Obviously, it doesn't tell the whole story. That's why we publish the whole website of all the charts, to dive into every part of it and look at the trade-offs. But best single number. So right now, it's got a bunch of Q&A type datasets that have been very important to the industry, like a couple that you just mentioned. It's also got a couple of agentic datasets. It's got our own long context reasoning dataset and some other use case focused stuff. As time goes on, the things that we're most interested in, that are going to be important to the capabilities that are becoming more important for AI, what developers are caring about, are first around agentic capabilities. So surprise, surprise: we're all loving our coding agents, and how the models perform there, and then do similar things for different types of work, is really important to us. Linking to use cases, to economically valuable use cases, is extremely important to us. And then we've got these things that the models still struggle with, like working really well over long contexts, that are not going to go away as specific capabilities and use cases that we need to keep evaluating.
swyx [00:20:46]: But I guess one thing I was driving at was, like, the V1 versus the V2, and how bad it was over time.
Micah [00:20:53]: Like, how we've changed the index to where we are.
swyx [00:20:55]: And I think that reflects the change in the industry. Right. So that's a nice way to tell that story.
Micah [00:21:00]: Well, V1 would be completely saturated right now by almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, I think, how much progress has been made in the last two years. Like, we obviously play the game constantly of today's version versus last week's version and the week before, and all of the small changes in the horse race between the current frontier, and who has the best smaller-than-10B model right now this week. Right. And that's very important to a lot of developers and people, especially in this particular city of San Francisco. But when you zoom out a couple of years, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence, which we can talk about more in a bit. So V1, V2, V3: we made things harder. We covered a wider range of use cases. And we tried to get closer to things developers care about, as opposed to just the Q&A type stuff that MMLU and GPQA represented. Yeah.
swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark and looking around and asking questions about it. Yeah.
Micah [00:22:21]: Let's do it. Okay. This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.
George [00:22:26]: And I think a little bit about the direction that we want to take it, and where we want to push benchmarks. Currently, the Intelligence Index and evals focus a lot on raw intelligence, but we want to diversify how we think about intelligence. And we can talk about it. New evals that we've built and partnered on focus on topics like hallucination. And we've got a lot of topics that I think are not covered by the current eval set that should be.
And so we want to bring that forth. But before we get into that...
swyx [00:23:01]: And so for listeners, just as a timestamp, right now, number one is Gemini 3 Pro High, followed by Claude Opus at 70, GPT-5.1 High. You don't have 5.2 yet. And Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. Yeah. I mean, I love it. I love it. No, no. 100%. Look back this time next year and go, how cute. Yep.
George [00:23:25]: Totally. A quick view of that is, okay, there's a lot. I love it. I love this chart. Yeah.
Micah [00:23:30]: This is such a favorite, right? Yeah. In almost every talk that George or I give at conferences and stuff, we always put this one up first to situate where we are in this moment in history. This, I think, is the visual version of what I was saying before about zooming out and remembering how much progress there's been. If we go back to just over a year ago, before o1, before Claude Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, OpenAI was untouchable for well over a year. And, I mean, you would remember that time period well, of there being very open questions about whether or not AI was going to be competitive, like, full stop, whether or not OpenAI would just run away with it, whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.
George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There are so many dots on it, but I think it reflects a little bit what we felt, like, how crazy it's been.
swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got ServiceNow in there, which is a less traditional name. Yeah.
George [00:25:01]: It's models that we're highlighting by default in our charts, in our Intelligence Index. Okay.
swyx [00:25:07]: You just have a manually curated list of stuff.
George [00:25:10]: Yeah, that's right. But something that I actually don't think every Artificial Analysis user knows is that you can customize our charts and choose which models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.
swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the o1 jump. Look at that. September 2024. And the DeepSeek jump. Yeah.
George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.
Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, a couple of weeks. It was Boxing Day in New Zealand when DeepSeek V3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less known over the second half of 2024, and had run evals on the earlier ones and stuff. I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas and stuff, running the evals and getting back result by result on DeepSeek V3. So this was the first of their V3 architecture, the 671B MoE.
Micah [00:26:19]: And we were very, very impressed.
That was the moment where we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a thing. The world really noticed when they followed that up with the RL working on top of V3 and R1 succeeding a few weeks later. But the groundwork for that absolutely was laid with an extremely strong base model, completely open weights, that we had as the best open weights model. So, yeah, that's the thing that you really see in the chart; it really jumped out at us on Boxing Day last year.
George [00:26:48]: Boxing Day is the day after Christmas, for those not familiar.
swyx [00:26:54]: I'm from Singapore. A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.
Micah [00:27:11]: Once upon a time, we did call it Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.
George [00:27:20]: There have been a few naming changes. We added hardware benchmarking to the site, and so benchmarks at a kind of system level. And so then we changed our throughput metric; we now call it output speed, since throughput makes sense at a system level, so we took that name.
swyx [00:27:32]: Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.
Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into. Maybe we can skip past all the... like, we have lots and lots of evals and stuff. The interesting ones to talk about today are a few of our recent things, I think, that probably not many people will be familiar with yet. So the first one of those is our Omniscience Index. This one is a little bit different to most of the intelligence evals that we run. We built it specifically to look at the embedded knowledge in the models and to test hallucination, by looking at, when the model doesn't know the answer, so it's not able to get it correct, what's its probability of saying, I don't know, versus giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we're simply taking off a point if you give an incorrect answer to the question. We're pretty convinced that this is an example of where it makes most sense to do that, because it's strictly more helpful to say, I don't know, instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models, and the labs creating them, to get higher scores. Almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything. There's no incentive to say, I don't know. So we did that for this one here.
swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah.
George [00:29:31]: On that, one reason that we didn't do that, or put that into this index, is that we think the way to do that is not to ask the models how confident they are.
swyx [00:29:43]: I don't know. Maybe it might be though.
You put it like a JSON field, say, confidence, and maybe it spits out something. Yeah. You know, we have done a few evals podcasts over the years, and when we did one with Clémentine of Hugging Face, who maintained the Open LLM Leaderboard, this was one of her top requests: some kind of hallucination slash lack-of-confidence calibration thing. And so, hey, this is one of them.
Micah [00:30:05]: And, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But yeah, it's pretty useful and has some interesting results. Like, one of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here, with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not really measured vibes stuff that people like about some of the Claude models. Is the dataset public, or is there a held-out set? There's a held-out set for this one. So we have published a public test set, but we've only published 10% of it. The reason is that for this one specifically, it would be very, very easy to have data contamination, because it is just factual knowledge questions. We'll update it over time to also prevent that, but, yeah, we've kept most of it held out so that we can keep it reliable for a long time. It leads us to a bunch of really cool things, including breaking down quite granularly by topic. And so we've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.
swyx [00:31:23]: I would be interested. Let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, which hallucinates less than Opus. Would that be the other way around in a normal capability environment? I don't know. What do you make of that?
George [00:31:37]: One interesting aspect is that we've found that there's not really a strong correlation between intelligence and hallucination. That's to say that how smart the models are in a general sense isn't correlated with their ability to, when they don't know something, say that they don't know. It's interesting that Gemini 3 Pro Preview was a big leap over Gemini 2.5 Flash and 2.5 Pro here. And if I add Pro quickly here...
swyx [00:32:07]: I bet Pro's really good. Uh, actually no, I meant the GPT Pros.
George [00:32:12]: Oh yeah.
swyx [00:32:13]: Because the GPT Pros are rumored, we don't know for a fact, to be, like, eight runs and then an LLM judge on top. Yeah.
George [00:32:20]: So we saw a big jump in, this is accuracy, so this is just the percent that they get correct, and Gemini 3 Pro knew a lot more than the other models. So, big jump in accuracy, but relatively no change between the Google Gemini models, between releases, in the hallucination rate. Exactly. And so it's likely due to just kind of a different post-training recipe between them and the Claude models. Yeah.
Micah [00:32:45]: That's what's driven this. Yeah. You can partially blame us, and how we define intelligence, having until now not counted hallucination as a negative in the way that we think about intelligence. And so that's what we're changing.
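That scoring scheme is simple enough to sketch. The exact weighting below (+1 correct, -1 incorrect, 0 for an abstention, scaled to -100..+100) is an assumption consistent with the description above, not AA's published formula:

```python
# Minimal sketch of an Omniscience-style score that rewards abstaining over guessing.
def omniscience_score(results: list[str]) -> float:
    """results: one of 'correct', 'incorrect', 'abstain' per question."""
    points = {"correct": 1, "incorrect": -1, "abstain": 0}
    return 100 * sum(points[r] for r in results) / len(results)

# A model that abstains when unsure beats one that guesses and misses:
guesser = ["correct"] * 60 + ["incorrect"] * 40
honest  = ["correct"] * 60 + ["abstain"] * 40
print(omniscience_score(guesser), omniscience_score(honest))  # 20.0 vs 60.0
```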
swyx [00:32:56]: Uh, I know many smart people who are confidently incorrect.
George [00:33:02]: Look at that. That is very human. Very true. And there's a time and a place for that. I think our view is that hallucination rate makes sense in this context, where it's around knowledge, but in many cases, people want the models to hallucinate, to have a go. Often that's the case in coding, or when you're trying to generate newer ideas. One eval that we added to Artificial Analysis is Critical Point, and it's really hard physics problems. Okay.
swyx [00:33:32]: And is it sort of like a HumanEval type or something different, or like a FrontierMath type?
George [00:33:37]: It's not dissimilar to FrontierMath. These are research questions that academics in the physics world would be able to answer, but models really struggle to answer. So the top score here is now 9%.
swyx [00:33:51]: And the people that created this, like Minway, and actually Ofir, who was kind of behind SWE-bench. And what organization is this? Oh, it's Princeton.
George [00:34:01]: A range of academics from different academic institutions, really smart people. They talked about how they turn the models up in terms of temperature, as high a temperature as they can, when they're trying to explore new ideas in physics with the model as a thought partner, just because they want the models to hallucinate. Um, yeah, sometimes it's something new. Yeah, exactly.
swyx [00:34:21]: So not right in every situation, but I think it makes sense, you know, to test hallucination in scenarios where it makes sense. Also, the obvious question is: this is one of many. Every lab has a system card that shows some kind of hallucination number, and you've chosen to not endorse that, and you've made your own. And I think that's a choice. Totally. In some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun. You provide it as a service here. You have to fight the, well, who are we to do this? And your answer is that we have a lot of customers, and, you know... but, like, I guess, how do you convince the individual?
Micah [00:35:08]: I mean, I think for hallucinations specifically, there are a bunch of different things that you might reasonably care about, and that you'd measure quite differently. Like, we've called this the Omniscience hallucination rate, not trying to declare it, like, humanity's last hallucination. You could have some interesting naming conventions and all this stuff. The bigger-picture answer to that, and it's something that I actually wanted to mention just as George was explaining Critical Point as well, is: as we go forward, we are building evals internally, we're partnering with academia, and we're partnering with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not necessarily obsessed with doing everything we do entirely within our own team. Critical Point is a cool example of where we were a launch partner for it, working with academia. We've got some partnerships coming up with a couple of leading companies.
Those ones, obviously, we have to be careful with on some of the independence stuff, but with the right disclosure, we're completely comfortable with that. A lot of the labs have released great datasets in the past that we've used to great success independently. And so between all of those techniques, we're going to be releasing more stuff in the future. Cool.
swyx [00:36:26]: Let's cover the last couple. And then I want to talk about your trends analysis stuff, you know? Totally.
Micah [00:36:31]: So on that, actually, I have one little factoid on Omniscience. If you go back up to accuracy on Omniscience, an interesting thing about this accuracy metric is that it tracks, more closely than anything else that we measure, the total parameter count of models. Makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff, which we think is much more about how the models are trained. This is just: what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.
swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed by any official source, just rumors. But rumors do fly around. Rumors. I hear all sorts of numbers. I don't know what to trust.
Micah [00:37:17]: So if you draw the line on Omniscience accuracy versus total parameters, we've got all the open weights models, and you can squint and see that the leading frontier models are likely quite a lot bigger than the roughly one trillion parameters that the open weights models we're looking at here cap out at. There's an interesting extra data point that Elon Musk revealed recently about xAI: three trillion parameters for Grok 3 and 4, six trillion for Grok 5, but that's not out yet. Take those together, have a look, and you might reasonably form a view that there's a pretty good chance that Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameter range. To be clear, I have absolutely no idea, but just based on this chart, that's where you would land if you have a look at it. Yeah.
swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much, because what does it really matter? Like, as long as they can serve it at a sustainable cost, that's about it. Like, yeah, totally.
George [00:38:17]: They've also got different incentives in play compared to, like, open weights models, who are thinking about supporting others in self-deployment. For the labs who are doing inference at scale, it's, I think, less about total parameters in many cases when thinking about inference costs, and more around the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.
Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously, if you're a developer or company using these things, it's exactly as you say: it doesn't matter. You should be looking at all the different ways that we measure intelligence. You should be looking at the cost to run the index, and the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.
swyx [00:38:56]: It's not as good for the content creator rumor mill, where I can say, oh, GPT-4 is this small circle, look, GPT-5 is this big circle. That used to be a thing for a while.
Yeah.
Micah [00:39:07]: But that is, on its own, actually a very interesting one, right? That is, it's just purely that, chances are, the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in total size of the models, especially with the upcoming hardware generations. Yes.
swyx [00:39:29]: So, you know, taking off my shitposting hat for a minute. Yes. Yes. At the same time, I do feel like, you know, especially coming back from Europe, people do feel like Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale out, and therefore we need to start exploring at least a different path. GDPVal, I think it's only, like, a month or so old. I was also very positive when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool. And you have your own version.
George [00:39:59]: It's a fantastic dataset. Yeah.
swyx [00:40:01]: And maybe we'll recap for people who are out of the loop. It's like 44 tasks based on some kind of GDP cutoff, meant to represent broad white collar work that is not just coding. Yeah.
Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions, and input files for a lot of them. Within the 44, it's divided into, like, 220, 225 maybe, subtasks, which are the level that we run through the agentic harness. And yeah, they're really interesting. I will say that it doesn't necessarily capture all the stuff that people do at work. No eval is perfect, there are always going to be more things to look at, largely because, in order to make the tasks well enough defined that you can run them, they need to only have a handful of input files and very specific instructions for that task. And so I think the easiest way to think about them is that they're, like, quite hard take-home exam tasks that you might do in an interview process.
swyx [00:40:56]: Yeah, for listeners, it is no longer, like, a long prompt. It is, like, well, here's a zip file with a spreadsheet or a PowerPoint deck or a PDF: go nuts and answer this question.
George [00:41:06]: OpenAI released a great dataset, and they released a good paper which looks at performance across the different web chatbots on the dataset. It's a great paper; I encourage people to read it. What we've done is taken that dataset and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the dataset, and then we developed an evaluator approach to compare outputs. That's AI-enabled, so it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned to human preferences. One data point there is that even with Gemini 3 Pro as the evaluator, Gemini 3 Pro itself, interestingly, doesn't actually do that well on the eval. So that's kind of a good example of what we've done in GDPVal AA.
swyx [00:42:01]: Yeah, the thing that you have to watch out for with LLM judge is self-preference, that models usually prefer their own output, and in this case, it was not.
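The reference agentic harness George describes (Stirrup, per the show notes: up to 100 turns, code execution, web search, a file system) boils down to a loop like this toy sketch. The tool set and the shape of the model interface are assumptions for illustration:

```python
# Toy agent harness: give the model a fixed turn budget, dispatch its tool
# calls, and stop when it produces a final answer.
import subprocess

MAX_TURNS = 100  # the harness reportedly allows up to 100 turns

def run_tool(name: str, arg: str) -> str:
    if name == "python":
        return subprocess.run(["python", "-c", arg], capture_output=True, text=True).stdout
    if name == "read_file":
        return open(arg).read()
    return f"unknown tool: {name}"

def run_task(llm, task: str) -> str:
    # llm is an assumed callable: takes the message history, returns either
    # {"tool": ..., "arg": ...} to act, or {"final": ...} to finish.
    history = [{"role": "user", "content": task}]
    for _ in range(MAX_TURNS):
        reply = llm(history)
        if "final" in reply:
            return reply["final"]
        history.append({"role": "tool", "content": run_tool(reply["tool"], reply["arg"])})
    return "ran out of turns"
```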
Totally.

Micah [00:42:08]: I think the places where it makes sense to use an LLM-as-judge approach now are quite different to some of the early LLM-as-judge stuff a couple of years ago, because some of that (MT-Bench was a great project and a good example of this a while ago) was about judging conversations and a lot of style-type stuff. Here, the task the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search and the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading, we're running the output through a pipeline to extract visual and text versions of the files so we can provide them to Gemini, and we're providing the criteria for the task and getting it to pick which of two outputs more effectively meets those criteria. It turns out that it's just very, very good at getting that right; it matched human preference a lot of the time. I think that's because it's got the raw intelligence, combined with a correct representation of the outputs, the fact that the outputs were created with an agentic task quite different to the way the grading model works, and that we're comparing against criteria, not just zero-shot asking the model to pick which one is better.

swyx [00:43:26]: Got it. Why is this an Elo, and not a percentage, like GDPVal?

George [00:43:31]: So the outputs look like documents, and there are video outputs or audio outputs for some of the tasks. It has to make a video? Yeah, for some of the tasks. Some of the tasks.

swyx [00:43:43]: What task is that?

George [00:43:45]: I mean, it's in the data set. Like, be a YouTuber? It's a marketing video.

Micah [00:43:49]: Oh, wow. What? The model has to go find clips on the internet and try to put them together. The models are not that good at doing that one for now, to be clear. It's pretty hard to do that with a code interpreter, and the computer-use stuff doesn't work quite well enough, and so on.

George [00:44:02]: And so there's no ground truth, necessarily, to compare against to work out a percentage correct. It's hard to come up with correct or incorrect there. So it's on a relative basis, and we use an Elo approach to compare outputs from each of the models across the tasks.

swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task, and then give them an Elo, so you have a human in there. I think what's helpful about GDPVal, the OpenAI one, is that 50% is meant to be a normal human, and maybe a domain expert is higher than that, but 50% was the bar: if you've crossed 50, you are superhuman. Yeah.

Micah [00:44:47]: So we haven't grounded this score in that exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models. It's one of the reasons that presenting it as an Elo is quite helpful: it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky comparing these exact tasks against human performance, because the way you would go about them as a human is quite different to how the models go about them.
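To make the mechanics concrete, here is a minimal sketch of how pairwise LLM-judge verdicts can be turned into Elo ratings. This is an illustration, not Artificial Analysis's actual pipeline: `judgeWinner` is a hypothetical stand-in for the Gemini-based grader, and the K-factor and starting rating are conventional defaults rather than anything from the episode.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: turning pairwise "which output better meets the task criteria?"
// judgments into Elo ratings. judgeWinner is a stand-in for an LLM grader.
public class PairwiseElo {
    static final double K = 32.0;          // conventional K-factor
    final Map<String, Double> ratings = new HashMap<>();

    double rating(String model) {
        return ratings.getOrDefault(model, 1000.0); // conventional starting rating
    }

    // Expected score of A against B under the Elo model.
    static double expected(double ra, double rb) {
        return 1.0 / (1.0 + Math.pow(10, (rb - ra) / 400.0));
    }

    // Update both models after one pairwise comparison. scoreA is 1 if A won, 0 if B won.
    void update(String a, String b, double scoreA) {
        double ra = rating(a), rb = rating(b);
        double ea = expected(ra, rb);
        ratings.put(a, ra + K * (scoreA - ea));
        ratings.put(b, rb + K * ((1 - scoreA) - (1 - ea)));
    }

    // Hypothetical grader call: the real system would send both outputs plus the
    // task criteria to a grading model and parse its verdict.
    static double judgeWinner(String taskCriteria, String outputA, String outputB) {
        return outputA.length() >= outputB.length() ? 1.0 : 0.0; // placeholder logic
    }

    public static void main(String[] args) {
        PairwiseElo elo = new PairwiseElo();
        double scoreA = judgeWinner("criteria...", "output from model A", "output from B");
        elo.update("model-a", "model-b", scoreA);
        System.out.println(elo.ratings);
    }
}
```

Because each comparison only needs a relative verdict, new models can be slotted into the ladder at any time, which is exactly the "stays relevant for a long time" property Micah describes.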
Yeah.

swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there. Is that like one last...?

Micah [00:45:20]: Well, no, no, no, it is the best model released by Meta. And so it makes it into the homepage default set, still, for now.

George [00:45:31]: Another inclusion that's quite interesting is that we also ran it across the latest versions of the web chatbots. And so we have...

swyx [00:45:39]: Oh, that's right.

George [00:45:40]: Oh, sorry.

swyx [00:45:41]: I, yeah, I completely missed that. Okay.

George [00:45:43]: No, not at all. So that's the one with the checkered pattern.

swyx: So that is their harness, not yours, is what you're saying.

George: Exactly. And what's really interesting is that if you compare, for instance, Claude 4.5 Opus using the Claude web chatbot, it performs worse than the model in our agentic harness. In every case, the model performs better in our agentic harness than its web chatbot counterpart, in the harness that they created.

swyx [00:46:13]: My backwards explanation for that would be that, well, it's meant for consumer use cases, and here you're pushing it toward something else.

Micah [00:46:19]: The constraints are different, and the amount of freedom you can give the model is different. They also have a cost goal; we let the models work as long as they want, basically. Yeah. Do you copy-paste manually into the chatbot? Yeah. That was how we got the chatbot reference numbers. We're not going to be keeping those updated at quite the same scale as hundreds of models.

swyx [00:46:38]: Well, I don't know, talk to Browserbase, they'll automate it for you. I have thought about how we should turn these chatbot versions into an API, because they are legitimately different agents in themselves. Yes. Right. Yeah.

Micah [00:46:53]: And that's grown a huge amount over the last year, right? The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the number of data sources that you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.

swyx [00:47:10]: What tools and what data connections come to mind? What's interesting, what's notable work that people have done?

Micah [00:47:15]: Oh, okay. So my favorite example is that until very recently, I would argue it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's pulling something from wherever you store stuff at work: for me, Google Drive, OneDrive, our Supabase databases if we need to do some analysis on some data. Preferably the model can be plugged into all of those things and can go do some useful work based on them. The thing I find most impressive currently, that I am somewhat surprised works really well in late 2025, is that I can have models use the Supabase MCP server to query (read-only, of course) and run a whole bunch of SQL queries to do pretty significant data analysis, and make charts and stuff, and it can read my Gmail and my Notion. Okay. You actually use that. That's good. Is that a Claude thing?
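The "read-only, of course" aside is the key safety property when wiring a model up to a production database. A typical way to enforce it, sketched below under assumptions, is to reject anything that isn't a single SELECT before executing it and to mark the connection itself read-only. This is not the Supabase MCP implementation, and the JDBC URL is a placeholder; it just shows the general pattern.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: a defensive read-only gate for model-issued SQL.
// Not the Supabase MCP implementation; just the general pattern.
public class ReadOnlyGate {
    // Reject anything that is not a single SELECT statement.
    static void requireReadOnly(String sql) {
        String s = sql.strip().toLowerCase();
        if (!s.startsWith("select") || s.contains(";")) {
            throw new IllegalArgumentException("Only single SELECT statements are allowed: " + sql);
        }
    }

    public static void main(String[] args) throws Exception {
        String sql = "SELECT count(*) FROM signups WHERE created_at > now() - interval '7 days'";
        requireReadOnly(sql);
        // Belt and braces: also mark the JDBC connection itself read-only.
        // Placeholder connection URL; a real deployment would also use a
        // database role with no write privileges at all.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app")) {
            conn.setReadOnly(true);
            try (Statement st = conn.createStatement(); ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) System.out.println(rs.getLong(1));
            }
        }
    }
}
```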
To varying degrees in both, but ChatGPT and Claude right now. I would say that this stuff barely works right now, in fairness.

George [00:48:33]: Because people are actually going to try this after they hear it. If you get an email from Micah, odds are it wasn't written by a chatbot.

Micah [00:48:38]: So, yeah, it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.

swyx [00:48:46]: And so you can feel it coming, right? This time next year, we'll come back and see where it's gone. Totally. Supabase, shout out, another famous Kiwi. I don't know if you've had any conversations with him about AI building and AI infra in particular.

George [00:49:03]: We have had Twitter DMs with him, because we're quite big Supabase users and power users. And we probably do some things more manually than we should; the Supabase support line is a little bit too friendly to us. One extra point regarding GDPVal AA: on the basis of the models' overperformance compared to the chatbots, we realized that the reference harness we built actually works quite well on generalist agentic tasks. This proves it, in a sense. And the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code, and all we give it is context management capabilities, a web search tool, a web browsing tool, and a code execution environment. Anything else?

Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. For GDPVal we give it a tool to view an image specifically, because the models can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context we had to give them a custom tool. But yeah, exactly.

George [00:50:21]: So it turned out that we had created a good generalist agentic harness, and we released it on GitHub yesterday. It's called Stirrup. So if people want to check it out, it's a great base for building a generalist agent for more specific tasks.

Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code and the coding agents can work with it super well.

swyx [00:50:51]: Well, that's nice for the community to explore and share and hack on. Maybe in other similar environments: the Terminal-Bench guys have done Harbor, which is a bundle of, well, we need our minimal harness, which for them is Terminus, and we also need the RL environments or Docker deployment thing to run independently. So I don't know if you've looked at Harbor at all. Is that a standard that people want to adopt?

George [00:51:19]: Yeah, we've looked at it from an evals perspective, and we love Terminal-Bench and host Terminal-Bench benchmarks on Artificial Analysis. We've looked at it from a coding agent perspective, but we could see it being a great basis for any kind of agent. I think where we're getting to is that these models have gotten smart enough.
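The shape of a harness like that is easy to sketch. Below is a minimal, hypothetical agent loop in the spirit of what George describes (model-controlled, with just a few tools); it is not Stirrup's actual code, and `ModelClient`, `Tool`, and the `Action` format are invented for illustration.

```java
import java.util.List;
import java.util.Map;

// Sketch of a minimalist agent loop: the model decides which tool to call next,
// and the harness just executes tools and feeds results back. Hypothetical
// interfaces; not the actual Stirrup implementation.
public class MiniHarness {
    interface ModelClient { Action next(List<String> transcript); }
    interface Tool { String run(String input); }
    record Action(String tool, String input, boolean done) {}

    public static String runTask(String task, ModelClient model, Map<String, Tool> tools) {
        List<String> transcript = new java.util.ArrayList<>(List.of("TASK: " + task));
        for (int turn = 0; turn < 50; turn++) {            // hard cap on turns
            Action action = model.next(transcript);        // the model controls the flow
            if (action.done()) return action.input();      // model says it's finished
            Tool tool = tools.get(action.tool());
            String observation = (tool == null)
                    ? "ERROR: unknown tool " + action.tool()
                    : tool.run(action.input());
            transcript.add("CALL " + action.tool() + ": " + action.input());
            transcript.add("RESULT: " + observation);      // context management would trim this
        }
        return "Gave up after 50 turns";
    }

    public static void main(String[] args) {
        // Stub model that immediately declares the task done, just to show the wiring.
        ModelClient stub = transcript -> new Action("", "stub answer", true);
        System.out.println(runTask("demo task", stub, Map.of()));
    }
}
```

The point of the design, as the episode argues, is that the harness dictates almost nothing: web search, browsing, and code execution would be registered as `Tool` entries, and the model sequences them itself.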
They've got better tools, and they can perform better when just given a minimalist set of tools and let run: let the model control the agentic workflow, rather than using another, more built-out framework that tries to dictate the flow. Awesome.

swyx [00:51:56]: Let's cover the Openness Index, and then let's go into the report stuff. So that's the last of the proprietary numbers, I guess. I don't know how you classify all these. Yeah.

Micah [00:52:07]: Let's call it the last of the three new things that we're talking about from the last few weeks. Because we do a mix of stuff: some things we do open source, and some proprietary stuff we don't always open source. The long context reasoning data set last year we did open source. And then of all the work on performance benchmarks across the site, some of them we're looking to open source, but some we're constantly iterating on, and so on. So there's a huge mix of stuff that is and isn't open source across the site. That's LCR, for people. Yeah.

swyx [00:52:41]: But let's talk about the Openness Index.

Micah [00:52:42]: Let's talk about the Openness Index. This is, call it, a new way to think about how open models are. We have, for a long time, tracked whether models are open weights and what the licenses on them are. That's pretty useful; it tells you what you're allowed to do with the weights of a model. But there is this whole other dimension to how open models are that is pretty important and that we haven't tracked until now, and that's how much is disclosed about how the model was made. So: transparency about data, both pre-training and post-training data; whether you're allowed to use that data; and transparency about methodology and training code. Those are the components. We bring them together to score an openness index for models, so that in one place you can get this full picture of how open models are.

swyx [00:53:32]: I feel like I've seen a couple of other people try to do this, but they're not maintained. I do think this matters. I don't know what the numbers mean, though. Is there a max number? Is this out of 20?

George [00:53:44]: It's out of 18 currently. We've got an Openness Index page, but essentially you get points for being more open across these different categories, and the maximum you can achieve is 18. So AI2, with their extremely open Olmo 3 32B Think model, is the leader, in a sense.

swyx [00:54:04]: What about Hugging Face?

George [00:54:05]: Oh, with their smaller model. It's coming soon. I think we need to get the intelligence benchmarks run to get it on the site.

swyx [00:54:12]: You can't have an openness index and not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, RefinedWeb and all that stuff, it's amazing. Or is it called FineWeb? FineWeb. FineWeb.

Micah [00:54:23]: Yeah, yeah, totally. Yep. One of the reasons this is cool is that if you're trying to understand the holistic picture of a model and what you can do with all the stuff the company is contributing, this gives you that picture. And so we are going to keep it up to date alongside all the models that we run the Intelligence Index on, on the site.
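As a toy illustration of how a points-based openness rubric works, here is a sketch. The category names follow what Micah lists (weights, license, pre- and post-training data transparency and usability, methodology and training code), but the per-category point values are hypothetical; the real 18-point breakdown lives on the Artificial Analysis Openness Index page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy openness scorer. Categories follow the episode's description;
// the point values per category are hypothetical, not the real rubric.
public class OpennessScore {
    public static void main(String[] args) {
        Map<String, Integer> maxPoints = new LinkedHashMap<>();
        maxPoints.put("weights released", 3);
        maxPoints.put("permissive license", 3);
        maxPoints.put("pre-training data disclosed", 3);
        maxPoints.put("post-training data disclosed", 3);
        maxPoints.put("data usable", 3);
        maxPoints.put("methodology + training code", 3); // 6 x 3 = 18 total, hypothetical split

        // Hypothetical scores for an imaginary model.
        Map<String, Integer> model = Map.of(
                "weights released", 3,
                "permissive license", 3,
                "pre-training data disclosed", 2,
                "post-training data disclosed", 1,
                "data usable", 2,
                "methodology + training code", 3);

        int total = 0, max = 0;
        for (var e : maxPoints.entrySet()) {
            total += Math.min(model.getOrDefault(e.getKey(), 0), e.getValue());
            max += e.getValue();
        }
        System.out.println("Openness: " + total + " / " + max);
    }
}
```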
And it's just an extra view to understand them.

swyx [00:54:43]: Can you scroll down to the trade-offs chart? Yeah, that one. Yeah. This really matters, right? Obviously, because you can b
To celebrate Melvyn Bragg's 27 years presenting In Our Time, some well-known fans of the programme have chosen their favourite episodes. Historian and broadcaster Simon Schama has selected the episode on Shakespeare's Sonnets and recorded an introduction to it. (This introduction will be available on BBC Sounds and the In Our Time webpage shortly after the broadcast and will be longer than the one broadcast on Radio 4.) In 1609 Thomas Thorpe published a collection of poems entitled Shakespeare's Sonnets, “never before imprinted”. Yet, while some of Shakespeare's other poems and many of his plays were often reprinted in his lifetime, the Sonnets were not a publishing success. They had to make their own way, outside the main canon of Shakespeare's work: wonderful, troubling, patchy, inspiring and baffling, and they have appealed in different ways to different times. Most are addressed to a man, something often overlooked and occasionally concealed; one early and notorious edition even changed some of the pronouns. With: Hannah Crawforth, Senior Lecturer in Early Modern Literature at King's College London; Don Paterson, Poet and Professor of Poetry at the University of St Andrews; and Emma Smith, Professor of Shakespeare Studies at Hertford College, Oxford. Producer: Simon Tillotson. Spanning history, religion, culture, science and philosophy, In Our Time from BBC Radio 4 is essential listening for the intellectually curious. In each episode, host Melvyn Bragg and expert guests explore the people, ideas, events and discoveries that have shaped our world. In Our Time is a BBC Studios production.
This year's seasonal offering takes the form of a voyage, exploring resonances inspired by the carol "I Saw Three Ships". Today, for Episode 02, we board the delicate barque of the Sonnets. Season's greetings to you!
Is artificial intelligence really ready to replace humans at work? The question is stirring companies, caught between fantasies of total productivity and outright skepticism. To get past the rhetoric, researchers at Carnegie Mellon University tried an original experiment: simulating a company staffed almost entirely by artificial intelligence agents. The results, published as a preprint on arXiv, are far from announcing the end of human employment. The scientists handed the running of this virtual company to agents built on the most prominent models of the moment: Anthropic's Claude, OpenAI's GPT-4o, Google's Gemini, Amazon's Nova, Meta's Llama and Alibaba's Qwen. Each agent was assigned a classic role: financial analyst, project manager, software engineer. In parallel, other agents played the roles of colleagues or internal services, such as human resources. On paper, everything was in place. In practice, performance remained very limited. The best performer, Claude 3.5 Sonnet, managed to fully complete only 24% of the tasks assigned to it. Counting partially completed missions, its score tops out at 34.4%. Gemini 2.0 Flash comes in second, with barely 11.4% of tasks completed. No other model passes the 10% bar. A striking contrast with the promises of autonomy often associated with these systems. The researchers identify several causes for these failures. The agents struggle to understand human implicit context: asking for a file in ".docx" does not always let them deduce that a Word document is meant. They also lack basic social skills and quickly get stuck when they have to browse the web, handle pop-up windows or interpret complex interfaces. Even more worrying, some agents consider a task successful after simply skipping its difficult steps. In short, this experiment is a reminder of an often forgotten reality: while AIs excel at targeted, well-defined tasks, they remain very far from being able to manage a real work environment on their own, with its surprises, implicit rules and human interactions. The wholesale replacement of employees will have to wait. Hosted by Acast. Visit acast.com/privacy for more information.
“A sonnet,” said the poet Dante Gabriel Rossetti, “is a moment's monument.” But who invented the sonnet? Who brought it to prominence? How has it changed over the years? And why does this form continue to be so compelling? In this episode of the History of Literature, we take a brief look at one of literature's most enduring forms, from its invention in a Sicilian court to the wordless sonnet and other innovative uses. Note: A version of this episode first ran in August 2018. It has been missing from our archives for many years. Join Jacke on a trip through literary England! Join Jacke and fellow literature fans on an eight-day journey through literary England in partnership with John Shors Travel in May 2026! Scheduled stops include The Charles Dickens Museum, Dr. Johnson's house, Jane Austen's Bath, Tolkien's Oxford, Shakespeare's Globe Theater, and more. Learn more by emailing jackewilsonauthor@gmail.com or masahiko@johnshorstravel.com, or by contacting us through our website historyofliterature.com. December update: Act soon - there are only two spots left! The music in this episode is by Gabriel Ruiz-Bernal. Learn more at gabrielruizbernal.com. Help support the show at patreon.com/literature or historyofliterature.com/donate . The History of Literature Podcast is a member of Lit Hub Radio and the Podglomerate Network. Learn more at thepodglomerate.com/historyofliterature. Learn more about your ad choices. Visit megaphone.fm/adchoices
Grab a loved one and gather round for a special reading of the Christmas story from your friends at CT Media podcasts. Music from The Porter's Gate, poetry from Malcolm Guite, and more – shared by the voices you know and love from your favorite shows here at Christianity Today! GO DEEPER WITH THE BULLETIN: -Join the conversation at our Substack. -Find us on YouTube. -Rate and review the show in your podcast app of choice. TODAY'S VOICES: Russell Moore, Mike Cosper, and Clarissa Moll of The Bulletin. Nicole Martin, president and CEO of Christianity Today. David Zahl, Bulletin guest and author of The Big Relief: The Urgency of Grace for a Worn-Out World. David French, Bulletin guest and New York Times columnist. Sho Baraka, editorial director of CT's Big Tent Initiative. Steve Cuss of Being Human. Jesse Eubanks and Faith Stults of Wonderology. Music used with permission from The Porter's Gate album, Advent Songs. The poem “The Magi” is written by Malcolm Guite, from his collection Sounding the Seasons: 70 Sonnets for the Christian Year (Canterbury Press). ABOUT THE BULLETIN: The Bulletin is a twice-weekly politics and current events show from Christianity Today moderated by Clarissa Moll, with senior commentary from Russell Moore (Christianity Today's editor-at-large and columnist) and Mike Cosper (senior contributor). Each week, the show explores current events and breaking news and shares a Christian perspective on issues that are shaping our world. We also offer special one-on-one conversations with writers, artists, and thought leaders whose impact on the world brings important significance to a Christian worldview, like Bono, Sharon McMahon, Harrison Scott Key, Frank Bruni, and more. The Bulletin listeners get 25% off CT. Go to https://orderct.com/THEBULLETIN to learn more. “The Bulletin” is a production of Christianity Today. Producer: Clarissa Moll. Associate Producer: Alexa Burke. Editing and Mix: Kevin Morris. Graphic Design: Rick Szuecs. Music: Dan Phelps. Executive Producer: Erik Petrik. Senior Producer: Matt Stevens. Learn more about your ad choices. Visit podcastchoices.com/adchoices
Episode 87: Dover Beach by Matthew Arnold
Mark McGuinness reads and discusses ‘Dover Beach' by Matthew Arnold.
https://media.blubrry.com/amouthfulofair/media.blubrry.com/amouthfulofair/content.blubrry.com/amouthfulofair/87_Dover_Beach_by_Matthew_Arnold.mp3
Poet: Matthew Arnold
Reading and commentary by Mark McGuinness

Dover Beach
By Matthew Arnold

The sea is calm tonight.
The tide is full, the moon lies fair
Upon the straits; on the French coast the light
Gleams and is gone; the cliffs of England stand,
Glimmering and vast, out in the tranquil bay.
Come to the window, sweet is the night-air!
Only, from the long line of spray
Where the sea meets the moon-blanched land,
Listen! you hear the grating roar
Of pebbles which the waves draw back, and fling,
At their return, up the high strand,
Begin, and cease, and then again begin,
With tremulous cadence slow, and bring
The eternal note of sadness in.

Sophocles long ago
Heard it on the Aegean, and it brought
Into his mind the turbid ebb and flow
Of human misery; we
Find also in the sound a thought,
Hearing it by this distant northern sea.

The Sea of Faith
Was once, too, at the full, and round earth's shore
Lay like the folds of a bright girdle furled.
But now I only hear
Its melancholy, long, withdrawing roar,
Retreating, to the breath
Of the night-wind, down the vast edges drear
And naked shingles of the world.

Ah, love, let us be true
To one another! for the world, which seems
To lie before us like a land of dreams,
So various, so beautiful, so new,
Hath really neither joy, nor love, nor light,
Nor certitude, nor peace, nor help for pain;
And we are here as on a darkling plain
Swept with confused alarms of struggle and flight,
Where ignorant armies clash by night.

Podcast Transcript

This is a magnificent and haunting poem by Matthew Arnold, an eminent Victorian poet. Written and published at the mid-point of the nineteenth century – it was probably written around 1851 and published in 1867 – it is not only a shining example of Victorian poetry at its best, but it also, and not coincidentally, embodies some of the central preoccupations of the Victorian age. The basic scenario is very simple: a man is looking out at the sea at night and thinking deep thoughts. It's something that we've all done, isn't it? The two tend to go hand-in-hand. When you're looking out into the darkness, listening to the sound of the sea, it's hard not to be thinking deep thoughts. If you've been a long time listener to this podcast, it may remind you of another poet who wrote about standing on the shore thinking deep thoughts, looking at the sea: Shakespeare, in his Sonnet 60:

Like as the waves make towards the pebbled shore,
So do our minutes hasten to their end;

Arnold's poem is not a sonnet but a poem in four verse paragraphs. They're not stanzas, because they're not regular, but if you look at the text on the website, you can clearly see it's divided into four sections. The first part is a description of the sea, as seen from Dover Beach, which is on the shore of the narrowest part of the English Channel, making it the closest part of England to France:

The sea is calm tonight.
The tide is full, the moon lies fair
Upon the straits; – on the French coast the light
Gleams and is gone; the cliffs of England stand,
Glimmering and vast, out in the tranquil bay.

And as you can hear, the poem has a pretty regular and conventional rhythm, based on iambic metre, ti TUM, with the second syllable taking the stress in every metrical unit. But what's slightly unusual is that the lines have varying lengths.
By the time we get to the third line:

Upon the straits; – on the French coast the light

there are five beats. There's a bit of variation in the middle of the line, but it's very recognisable as classic iambic pentameter, which has a baseline pattern going ti TUM, ti TUM, ti TUM, ti TUM, ti TUM. But before we get to the pentameter, we get two short lines:

The sea is calm tonight.

Only three beats; and

The tide is full, the moon lies fair

– four beats. We also start to notice the rhymes: ‘tonight' and ‘light'. And we have an absolutely delightful enjambment, where a phrase spills over the end of one line into the next one:

on the French coast the light
Gleams and is gone.

Isn't that just fantastic? The light flashes out like a little surprise at the start of the line, just as it's a little surprise for the speaker looking out to sea. OK, once he's set the scene, he makes an invitation:

Come to the window, sweet is the night-air!

So if there's a window, he must be in a room. There's somebody in the room with him, and given that it's night it could well be a bedroom. So this person could be a lover. It's quite likely that this poem was written on Arnold's honeymoon, which would obviously fit this scenario. But anyway, he's inviting this person to come to the window and listen. And what does this person hear? Well, helpfully, the speaker tells us:

Listen! you hear the grating roar
Of pebbles which the waves draw back, and fling,
At their return, up the high strand,
Begin, and cease, and then again begin,
With tremulous cadence slow, and bring
The eternal note of sadness in.

Isn't that just great? The iambic metre is continuing with some more variations, which we needn't go into. And the rhyme is coming more and more to the fore. Just about every line in this section rhymes with another line, but it doesn't have a regular pattern. Some of the rhymes are close together, some are further apart. There's only one line in this paragraph that doesn't rhyme, and that's ‘Listen! you hear the grating roar'. If this kind of shifting rhyme pattern reminds you of something you've heard before, you may be thinking all the way back to Episode 34, where we looked at Coleridge's use of floating rhymes in his magical poem ‘Kubla Khan'. And it's pretty evident that Arnold is also casting a spell, in this case to mimic the rhythm of the waves coming in and going out, as they ‘Begin, and cease, and then again begin'. And then the wonderful last line of the paragraph, as the waves ‘bring / The eternal note of sadness in'. You know, in the heart of the Victorian Age, when the Romantics were still within living memory, poets were still allowed to do that kind of thing. Try it nowadays, of course, and the Poetry Police will be round to kick your front door in at 5am and arrest you. Anyway. The next paragraph is a bit of a jump cut:

Sophocles long ago
Heard it on the Aegean, and it brought
Into his mind the turbid ebb and flow
Of human misery;

So Arnold, a classical scholar, is letting us know he knows who Sophocles, the ancient Greek playwright, was. And he's establishing a continuity across time of people looking out at the sea and thinking these deep thoughts. At this point, Arnold explicitly links the sea and the thinking:

we
Find also in the sound a thought,
Hearing it by this distant northern sea.
And the thought that we hear when we listen to the waves is what Arnold announces in the next verse paragraph, and he announces it with capital letters:

The Sea of Faith
Was once, too, at the full, and round earth's shore
Lay like the folds of a bright girdle furled.

And for a modern reader, I think this is the point of greatest peril for Arnold, where he's most at risk of losing us. We may be okay with ‘the eternal note of sadness', but as soon as he starts giving us the Sea of Faith, we start to brace ourselves. Is this going to turn into a horrible religious allegory, like The Pilgrim's Progress? I mean, it's a short step from the Sea of Faith to the Slough of Despond and the City of Destruction. And it doesn't help that Arnold uses the awkwardly rhyming phrase ‘a bright girdle furled' – that's not going to get past the Poetry Police, is it? But fear not; Arnold doesn't go there. What comes next is, I think, the best bit of the poem. So he says the Sea of Faith ‘was once, too, at the full', and then:

But now I only hear
Its melancholy, long, withdrawing roar,
Retreating, to the breath
Of the night-wind, down the vast edges drear
And naked shingles of the world.

Well, if you thought the eternal note of sadness was great, this tops it! It's absolutely fantastic. That line, ‘Its melancholy, long, withdrawing roar', where the ‘it' is faith, the Sea of Faith. And the significance of the line is underlined by the fact that the word ‘roar' is a repetition – remember that one line in the first section that didn't rhyme?

Listen! you hear the grating roar

See what Arnold did there? He left that sound hovering at the back of the mind, without a rhyme, until it came back in this section, a subtle but unmistakeable link between the ‘grating roar' of the actual sea at Dover Beach, and the ‘withdrawing roar' of the Sea of Faith:

Its melancholy, long, withdrawing roar,

Isn't that the most Victorian line ever? It encapsulates the despair that accompanied the crisis of faith in 19th-century England. This crisis was triggered by the advance of modern science – including the discoveries of fossils, evidence of mass extinction of previous species, and the theory of evolution, with Darwin's Origin of Species published in 1859, in between the writing and publication of ‘Dover Beach'. Richard Holmes, in his wonderful new biography of the young Tennyson, compares this growing awareness of the nature of life on Earth to the modern anxiety over climate change. For the Victorians, he writes, it created a ‘deep and existential terror'. One thing that makes this passage so effective is that Arnold has already cast the spell in the first verse paragraph, hypnotising us with the rhythm and rhyme, and linking it to the movement of the waves. In the second paragraph, he says, ‘we find also in the sound a thought'. And then in the third paragraph, he tells us the thought. And the thought that he attaches to this movement, which we are by now emotionally invested in, is a thought of such horror and profundity – certainly for his Victorian readers – that the retreat of the Sea of Faith really does feel devastating. It leaves us gazing down at the naked shingles of the world. The speaker is now imaginatively out of the bedroom and down on the beach. This is very relatable; we've all stood on the beach and watched the waves withdrawing beneath our feet and the shingle being left there. It's an incredibly vivid evocation of a pretty abstract concept.
Then, in the fourth and final verse paragraph, comes a bit of a surprise:

Ah, love, let us be true
To one another!

Well, I for one was not expecting that! From existential despair to an appeal to his beloved. What a delightful, romantic (with a small ‘r') response to the big-picture, existential catastrophe. And for me, it's another little echo of Shakespeare's Sonnet 60, which opens with a poet contemplating the sea and the passing of time and feeling the temptation to despair, yet also ends with an appeal to the consolation of love:

And yet to times in hope my verse shall stand,
Praising thy worth, despite his cruel hand.

Turning back to Arnold. He says ‘let us be true / To one another'. And then he links their situation to the existential catastrophe, and says this is precisely why they should be true to each other:

for the world, which seems
To lie before us like a land of dreams,
So various, so beautiful, so new,
Hath really neither joy, nor love, nor light,
Nor certitude, nor peace, nor help for pain;

It sounds, on the face of it, a pretty unlikely justification for being true to one another in a romantic sense. But actually, this is a very modern stance towards romantic love. It's like the gleam of light that just flashed across the Channel from France – the idea of you and me against an unfeeling world, of love as redemption, or at least consolation, in a meaningless universe. In a world with ‘neither joy, nor love, nor light', our love becomes all the more poignant and important. Of course, we could easily object that, regardless of religious faith, the world does have joy and love and light. His very declaration of love is evidence of this. But let's face it, we don't always come to poets for logical consistency, do we? And we don't have to agree with Matthew Arnold to find this passage moving; most of us have felt like this at some time when we've looked at the world in what feels like the cold light of reality. He evokes it so vividly and dramatically that I, for one, am quite prepared to go with him on this. Then we get the final three lines of the poem:

And we are here as on a darkling plain
Swept with confused alarms of struggle and flight,
Where ignorant armies clash by night.

I don't know about you, but I find this a little jarring in the light of what we've just heard. We've had the magnificent description of the sea and its effect on human thought, extending that into the idea of faith receding into illusion, and settling on human love as some kind of consolation for the loss of faith. So why do we need to be transported to a windswept plain where armies are clashing and struggling? It turns out to be another classical reference, to the Greek historian Thucydides' account of the night battle of Epipolae, where the two armies were running around in the dark and some of them ended up fighting their own side in the confusion. I mean, fine, he's a classical scholar. And obviously, it's deeply meaningful to him. But to me, this feels a little bit bolted on. A lot of people love that ending, but to me, it's not as good as some of the earlier bits, or at least it doesn't quite feel all of a piece with the imagery of the sea. But overall, it is a magnificent poem, and this is a small quibble. Stepping back, I want to have another look at the poem's form, specifically the meter, and even more specifically, the irregularity of the meter, which is quite unusual and actually quite innovative for its time. As I've said, it's in iambic meter, but it's not strictly iambic pentameter.
You may recall I did a mini series on the podcast a while ago looking at the evolution of blank verse, unrhymed iambic pentameter, from Christopher Marlowe and Shakespeare's dramatic verse, then Milton's Paradise Lost and finally Wordsworth's Tintern Abbey. ‘Dover Beach' is rhymed, so it's not blank verse, but most of the techniques Arnold uses here are familiar from those other poets, with variations on the basic rhythm, sometimes switching the beats around, and using enjambment and caesura (a break or pause in the middle of the line). But, and – this is quite a big but – not every line has five beats. The lines get longer and shorter in an irregular pattern, apparently according to Arnold's instinct. And this is pretty unusual, certainly for 1851. It's not unique, we could point to bits of Tennyson or Arthur Hugh Clough for metrical experiments in a similar vein, but it's certainly not common practice. And I looked into this, to see what the critics have said about it. And it turns out the scholars are divided. In one camp, the critics say that what Arnold is doing is firmly in the iambic pentameter tradition – it's just one more variation on the pattern. But in the other camp are people who say, ‘No, this is something new; this is freer verse,' and it is anticipating free verse, the non-metrical poetry with no set line lengths that came to be the dominant verse form of the 20th century. Personally, I think you can look back to Wordsworth and see a continuity with his poetic practice. But you could equally look forward, to a link with T. S. Eliot's innovations in ‘The Love Song of J. Alfred Prufrock' and The Waste Land. Eliot is often described as an innovator in free verse, which is true up to a point, but a lot of his writing in that early period isn't strictly free verse; it's a kind of broken up metrical verse, where he often uses an iambic metre with long and short lines, which he varies with great intuitive skill – in a similar manner to Arnold's ‘Dover Beach'. Interestingly, when ‘Dover Beach' was first published, the reviews didn't really talk about the metre, which is ammunition for the people who say, ‘Well, this is just a kind of iambic pentameter'. Personally, I think what we have here is something like the well-known Duck-Rabbit illusion, where you can look at the same drawing and either see a duck or a rabbit, depending how you look at it. So from one angle, ‘Dover Beach' is clearly continuing the iambic pentameter tradition; from another angle, it anticipates the innovations of free verse. We can draw a line from the regular iambic pentameter of Wordsworth (writing at the turn of the 18th and 19th century) to the fractured iambic verse of Eliot at the start of the 20th century. ‘Dover Beach' is pretty well halfway between them, historically and poetically. And I don't think this is just a dry technical development. There is something going on here in terms of the poet's sense of order and disorder, faith and doubt. Wordsworth, in the regular unfolding of his blank verse, conveys his basic trust in an ordered and meaningful universe. Matthew Arnold is writing very explicitly about the breakup of faith, and we can start to see it in the breakup of the ordered iambic pentameter. By the time we get to the existential despair of Eliot's Waste Land, the meter is really falling apart, like the Waste Land Eliot describes. So overall, I think we can appreciate what a finely balanced poem Arnold has written. It's hard to categorise. 
You read it the first time and think, ‘Oh, right, another conventional Victorian melancholy lament'. But just when we think he's about to go overboard with the Sea of Faith, he surprises us with that magnificent central passage. And just as he's about to give in to despair, we get that glimmering spark of love lighting up, and we think, ‘Well, maybe this is a romantic poem after all'. And maybe Arnold might look at me over his spectacles and patiently explain that actually, this is why that final metaphor of the clashing armies is exactly right. Friend and foe are running in first one direction, then another, inadvertently killing the people on the wrong side. So the simile gives us that sense of being caught in the cross-currents of a larger sweep of history. With all of that hovering in our mind, let's go over to the window once more and heed his call to listen to the sound of the Victorian sea at Dover Beach.

Dover Beach
By Matthew Arnold

The sea is calm tonight.
The tide is full, the moon lies fair
Upon the straits; on the French coast the light
Gleams and is gone; the cliffs of England stand,
Glimmering and vast, out in the tranquil bay.
Come to the window, sweet is the night-air!
Only, from the long line of spray
Where the sea meets the moon-blanched land,
Listen! you hear the grating roar
Of pebbles which the waves draw back, and fling,
At their return, up the high strand,
Begin, and cease, and then again begin,
With tremulous cadence slow, and bring
The eternal note of sadness in.

Sophocles long ago
Heard it on the Aegean, and it brought
Into his mind the turbid ebb and flow
Of human misery; we
Find also in the sound a thought,
Hearing it by this distant northern sea.

The Sea of Faith
Was once, too, at the full, and round earth's shore
Lay like the folds of a bright girdle furled.
But now I only hear
Its melancholy, long, withdrawing roar,
Retreating, to the breath
Of the night-wind, down the vast edges drear
And naked shingles of the world.

Ah, love, let us be true
To one another! for the world, which seems
To lie before us like a land of dreams,
So various, so beautiful, so new,
Hath really neither joy, nor love, nor light,
Nor certitude, nor peace, nor help for pain;
And we are here as on a darkling plain
Swept with confused alarms of struggle and flight,
Where ignorant armies clash by night.

Matthew Arnold

Matthew Arnold was a British poet, critic, and public intellectual who was born in 1822 and died in 1888. His father was Thomas Arnold, the famed headmaster of Rugby School. Arnold studied Classics at Oxford and first became known for lyrical, melancholic poems such as ‘Dover Beach', ‘The Scholar-Gipsy', and ‘Thyrsis', which explore the loss of faith in the modern world. Appointed an inspector of schools, he travelled widely and developed strong views on culture, education, and society. His critical essays, especially Culture and Anarchy, shaped debates about the role of culture in public life. Arnold remains a central figure bridging Romanticism and early modern thought.

A Mouthful of Air – the podcast

This is a transcript of an episode of A Mouthful of Air – a poetry podcast hosted by Mark McGuinness. New episodes are released every other Tuesday. You can hear every episode of the podcast via Apple, Spotify, Google Podcasts or your favourite app. You can have a full transcript of every new episode sent to you via email. The music and soundscapes for the show are created by Javier Weyler. Sound production is by Breaking Waves and visual identity by Irene Hoffman.
A Mouthful of Air is produced by The 21st Century Creative, with support from Arts Council England via a National Lottery Project Grant.

Listen to the show
You can listen and subscribe to A Mouthful of Air on all the main podcast platforms.

Related Episodes
Dover Beach by Matthew Arnold (Episode 87) · Recalling Brigid by Orna Ross · From The Rime of the Ancient Mariner by Samuel Taylor Coleridge (Episode 85)
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
Welcome to AI Unraveled (December 18, 2025): Your daily strategic briefing on the business impact of artificial intelligence.
In this special edition of AI Unraveled, we conduct a forensic accounting of the "Vibe Shift" hitting the tech world in Q4 2025. Based on an exhaustive analysis of thousands of developer discussions across Reddit and X, we explore the rising sentiment of "AI Fatigue." This isn't about boredom: it is a militant reaction to technical regression. We break down the "Hype-Utility Gap," the widespread complaints about Gemini 3.0 and Claude 4.5 becoming "lazy" (refusing to write code), and the invasive rise of corporate AI surveillance. We also discuss the "VRAM Crisis" creating a class divide among developers and the pragmatic roadmap for surviving the "Slop Era."
Key Topics:
The Executive Summary: Why the "Generative Boom" has turned into the "Trough of Exhaustion."
Frustration A: The "Hype-Utility Gap": why developers are tired of being told tools will replace CEOs when they can't even retain file context.
Frustration B: The Degradation Crisis: why Gemini 3.0 and Claude 4.5 are giving you placeholder comments (// rest of code here) instead of solutions.
The Surveillance State: How companies are weaponizing "prompt logs" and "commit velocity" to spy on workers.
The VRAM Class Divide: The "Memory Vacuum" that separates the haves from the have-nots.
[16:30] The Take: Moving to the "Cyborg Workflow": treating AI like a junior intern, not a god.
Keywords: AI Fatigue, Gemini 3.0 Pro, Claude 4.5 Sonnet, Model Collapse, AI Slop, Developer Burnout, Hype-Utility Gap, VRAM Crisis, AI Surveillance, Prompt Engineering, Q4 2025 AI Trends, Etienne Noumen
Host Connection & Engagement:
Connect with Etienne: https://www.linkedin.com/in/enoumen/
Advertise on AI Unraveled and reach C-Suite Executives directly: Secure Your Mid-Roll Spot here: https://forms.gle/Yqk7nBtAQYKtryvM6
Duration: 00:34:02 - Les Nuits de France Culture - presented by Christine Goémé - By André Velter - With Jacques Roubaud (poet, writer and mathematician) - With readings of poems by Jacques Roubaud - Production: Virginie Mourthé
In this episode, Aydin sits down with Paul Xue, a self-described “vibe marketer” and former 3x CTO who now runs an AI-native Reddit growth agency. Paul explains why he believes any assumption you made about AI even three months ago is probably wrong today, and how that realization pushed him to pivot away from writing code as a long-term career. He walks through how his team ships production software where ~100% of the code is AI-generated, why 80% of the work now lives in planning and system design, and how new models like Claude Opus 4.5 and Gemini 3 let him literally “go for a walk” while his tools implement features. Along the way, Paul shares real numbers (two years of work vs 10–15 hours), what this means for agencies and devs, how he hires in an AI-native world, and gives a behind-the-scenes tour of the multi-agent workflows powering his Reddit content engine.
Timestamps
0:00 – Introduction
1:01 – What a “vibe marketer” is and why Reddit is a power channel in the LLM era
3:01 – From 3x CTO to Reddit-first entrepreneur: deciding coding isn't future-proof
4:06 – GPT-3.5 + end of zero interest rates: when dev agency contracts fell off a cliff
6:28 – Adoption curves: senior devs who still don't use AI and why personality matters
7:57 – Running an AI-native shop where ~100% of production code is AI-generated
9:48 – Two years vs 10–15 hours: Paul's personal 10x story on shipping an MVP
12:04 – New development workflow: “plan mode” and spending 80% of time on specs
18:17 – Claude Opus 4.5, Gemini 3, and “going for a walk” while AI finishes features
23:30 – How $60K–$250K apps turn into weekend side projects with vibe coding tools
27:12 – Hiring in the AI era: why pure “ticket-taking” devs won't survive
35:12 – Inside an AI-native Reddit engine: n8n workflows, agents, Pinecone & OpenRouter
Tools & Technologies Mentioned
Reddit – Primary growth and content channel; a highly trusted source for LLM training and citations.
ChatGPT / GPT-3.5 – Early model that triggered Paul's realization that traditional coding careers would change.
Claude 3.5 Sonnet & Claude Opus 4.5 – Anthropic models Paul uses for long-running coding, planning, and browser automation.
Gemini 3 – Google model Paul uses to quickly generate solid, familiar SaaS-style UI/UX ideas.
Cursor – AI-native code editor that turns detailed “plans” into production code with one click.
n8n – Automation platform that powers Paul's multi-step AI workflows for content creation and evaluation.
Pinecone – Vector database storing each client's knowledge base for highly relevant Reddit responses.
OpenRouter – Routing layer that lets Paul easily swap and test different language models over time.
MCP (Model Context Protocol) – Framework he uses to give agents tool access (e.g., scraping Reddit, reading DBs).
Notion – Fast prototyping environment to validate data models and workflows before writing custom code.
Zapier – General automation glue in the earliest workflow experiments.
Figma – Design tool, now increasingly AI-assisted, for UI/UX mockups.
SpecCode – Tool Paul cites for vibe coding HIPAA-compliant applications.
Anything – Mobile-focused “vibe coding” platform for building iOS/Android apps on your phone.
Fellow – AI meeting assistant that joins meetings, produces summaries/action items, and acts as an AI chief of staff.
Subscribe at thisnewway.com to get the step-by-step playbooks, tools, and workflows.
In this episode, Phillis Levin reads "An Anthology of Rain," the title poem of her newest poetry collection. She guides us through the philosophical underpinnings of her poem, how it informs the book as a whole, and how the surfaces of things can tell us so much about their substance. Phillis Levin is the author of six poetry collections, including An Anthology of Rain (https://barrowstreet.org/press/product/an-anthology-of-rain-phillis-levin/). She is also the editor of The Penguin Book of the Sonnet: 500 Years of a Classic Tradition in English (https://www.penguinrandomhouse.com/books/333350/the-penguin-book-of-the-sonnet-by-various/). Levin's honors include a Fulbright Scholar Award to Slovenia, an Ingram Merrill Grant, the Richard Hugo Prize from Poetry Northwest, and fellowships from the Guggenheim Foundation, the National Endowment for the Arts, and the Trust of Amy Lowell. To learn more about Phillis and her work, please visit her website. https://phillislevin.com Photo credit: Sigrid Estrada
SONNETCAST – William Shakespeare's Sonnets Recited, Revealed, Relived
In this, the – at least for the time being – final episode of his podcast on William Shakespeare's Sonnets, Sebastian Michael offers a brief summary of his findings. He also takes the opportunity to examine in a little more detail the view, held by some contemporary editors, that the sonnets may not principally be about a Fair Youth and a Dark Lady – a contention largely based on the supposition that many of these poems "could be addressed to a male or a female", as one recent edition puts it – to see whether it stands up to scrutiny as we formulate any conclusion.
SONNETCAST – William Shakespeare's Sonnets Recited, Revealed, Relived
In this special episode, Sebastian Michael is joined by architect, author, and coder Miro Roman to talk about their experimentation with applying a machine learning approach to comparing the full text of William Shakespeare's Sonnets to the full text of his plays and narrative poems to examine whether such a methodology can confirm the rare word analysis research that has previously been carried out by Macdonald P Jackson and others towards dating the sonnets.
In this episode, Emmanuel, Katia and Guillaume discuss Spring 7, Quarkus, Infinispan and Keycloak. They also cover nice projects like Javelit, how a JVM starts up, and NTP's need for funding. And they discuss Emmanuel's career change. Recorded on November 14, 2025. Download the episode: LesCastCodeurs-Episode-332.mp3, or watch the video on YouTube.

News

Emmanuel is leaving Red Hat after 20 years https://emmanuelbernard.com/blog/2025/11/13/leaving-redhat/

Languages

HTTP/3 support in the JDK 26 HttpClient - https://inside.java/2025/10/22/http3-support/
- JDK 26 introduces HTTP/3 support in the existing HttpClient API, available since Java 11
- HTTP/3 uses the QUIC protocol over UDP instead of the TCP used by HTTP/2
- By default HttpClient prefers HTTP/2; you have to explicitly configure HTTP/3 with Version.HTTP_3
- The client automatically downgrades to HTTP/2 and then HTTP/1.1 if the server doesn't support HTTP/3
- You can force HTTP/3 only with the H3_DISCOVERY option in HTTP_3_URI_ONLY mode
- HttpClient learns that a server supports HTTP/3 via the alt-svc header (RFC 7838) and uses that information for subsequent requests
- The first request may use HTTP/2 even with HTTP/3 preferred, but the second will use HTTP/3 if the server advertises it
- The OpenJDK team encourages testing and feedback on the JDK 26 early access builds (a configuration sketch follows below)
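Here is a minimal sketch of what that configuration looks like, based on the API names mentioned in the inside.java post (Version.HTTP_3, H3_DISCOVERY, HTTP_3_URI_ONLY); exact option locations may differ in the final JDK 26 builds, so check the early-access javadoc:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of HTTP/3 usage with the JDK 26 HttpClient, per the inside.java post.
// Option names follow the article; verify against the JDK 26 early-access javadoc.
public class Http3Demo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_3)   // prefer HTTP/3 instead of the HTTP/2 default
                .build();

        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/"))
                // To force HTTP/3 only (no alt-svc discovery, no downgrade), the article
                // describes an option along these lines:
                // .setOption(HttpOption.H3_DISCOVERY, Http3DiscoveryMode.HTTP_3_URI_ONLY)
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // On a first request this may print HTTP_2; a second request to the same
        // server can use HTTP/3 once the alt-svc header has been seen.
        System.out.println(response.version() + " -> " + response.statusCode());
    }
}
```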
Libraries

Eclipse Jetty and CometD change their support strategy https://webtide.com/end-of-life-changes-to-eclipse-jetty-and-cometd/
- Starting January 1, 2026, Webtide will no longer publish Jetty 9/10/11 and CometD 5/6/7 to Maven Central
- For 20 years, Webtide funded the Jetty and CometD projects through services and support, publishing EOL updates for free
- Corporate behavior has changed: many companies just look for free fixes rather than genuine support
- Some companies run versions more than 10 years old without migrating as long as CVE fixes are free
- This free policy unintentionally encouraged complacency and delayed migrations to recent versions
- MITRE is developing changes to the CVE system to better handle EOL concepts
- Webtide is launching a partnership program with TuxCare and HeroDevs to distribute CVE fixes for EOL versions
- EOL binaries will now be distributed only to commercial customers and through the partner network
- Webtide continues standard open-source support: when Jetty 13 ships, Jetty 12.1 will receive updates for six months to a year
- The change aims to clarify the EOL policy using established industry terminology

Cloud enhancements to the A2A Java SDK https://quarkus.io/blog/quarkus-a2a-cloud-enhancements/
- Version 0.3.0.Final of the A2A Java SDK brings improvements for cloud and distributed environments
- In-memory components are replaced by persistent, replicated implementations for multi-instance environments
- JpaDatabaseTaskStore and JpaDatabasePushNotificationConfigStore persist tasks and configurations in PostgreSQL
- ReplicatedQueueManager replicates events between A2A Agent instances via Kafka and MicroProfile Reactive Messaging
- A complete Kubernetes deployment example with Kind, including PostgreSQL, Kafka via Strimzi, and load balancing across pods
- A practical demo shows messages being handled by different pods while keeping task state consistent
- The architecture is inspired by the Python A2A SDK, enabling the management of long-running asynchronous tasks in distributed environments

Quarkus 3.29 is out with multiple cache backends and Qute debugger support https://quarkus.io/blog/quarkus-3-29-released/
- Multiple cache backends can now be used simultaneously in the same application
- Each cache can be associated with a specific backend (for example Caffeine plus Redis or Infinispan)
- Debug Adapter Protocol (DAP) support for debugging Qute templates directly in the IDE (also in version 3.28)
- Programmatic configuration of CSRF protection via a fluent API
- OIDC filters can be restricted to specific authentication flows with annotations
- Support for custom Grafana dashboards via JSON files in META-INF/grafana/
- The Liquibase MongoDB extension now supports multiple simultaneous clients
- Significant build performance improvements, with reduced memory allocations
- Tasks like Hibernate ORM proxy generation and jar construction are parallelized

And using .proto files is easier in Quarkus with Quarkus gRPC Zero https://quarkus.io/blog/grpc-zero/
- .proto files have always been a pain because the generators require native executables
- They are now bundled in the JVM and there is nothing to configure
- This runs the generator as WASM inside the JVM via Chicory

Spring AI 1.1 is almost there https://spring.io/blog/2025/11/08/spring-ai-1-1-0-RC1-available-now
- MCP tool caching support for callbacks, reducing redundant operations
- Access to OpenAI reasoning content
- A MongoDB chat model
- Support for Ollama thinking models
- Retries on network failures
- OpenAI speech-to-text

Spring gRPC: the next steps toward 1.0.0 https://spring.io/blog/2025/11/05/spring-grpc-next-steps
- Spring gRPC 1.0 is coming soon, with Spring Boot 4 support
- Integration into Spring Boot 4.0 is postponed, now planned for Spring Boot 4.1
- Maven coordinates stay under org.springframework.grpc for version 1.0
- The spring-grpc-test jar is renamed spring-grpc-test-spring-boot-autoconfigure
- The autoconfiguration packages are renamed, requiring import changes
- The autoconfiguration dependencies will be deprecated immediately after the 1.0 release
- Minimal migration expected for projects already on 0.x
- Version 1.0.0-RC1 will be published as soon as possible before the final release

Spring discontinues reactive support for Apache Pulsar https://spring.io/blog/2025/10/29/spring-pulsar-reactive-discontinued
- A logical call, weighing time spent against the number of users
- Still, it's a trend we have seen accelerating

Spring 7 is out https://spring.io/blog/2025/11/13/spring-framework-7-0-general-availability

Infrastructure

Infinispan 16.0 https://infinispan.org/blog/2025/11/10/infinispan-16-0
- Major addition: online migration of cluster nodes without interruption (rolling upgrades)
- Clustering messages rewritten with Protocol Buffers + ProtoStream: better compatibility, guaranteed schema evolution
- Improved web console
- A dedicated schema management API (SchemasAdmin) for managing ProtoStream schemas remotely
- Optimized query module: full support for aggregations (sum, avg, ...) in clustered indexed queries thanks to the Hibernate Search 8.1 integration
- Server: minimized container image to reduce the attack surface
Using .proto files also gets simpler in Quarkus with Quarkus gRPC Zero https://quarkus.io/blog/grpc-zero/
- .proto files have always been a pain because the generators require native executables
- The generators are now bundled to run inside the JVM and there is nothing to configure
- This works by running the generator as WASM inside the JVM via the Chicory WebAssembly engine

Spring AI 1.1 is almost there https://spring.io/blog/2025/11/08/spring-ai-1-1-0-RC1-available-now
- MCP tool caching support for callbacks, which reduces redundant operations
- Access to OpenAI reasoning content
- A MongoDB chat model
- Support for Ollama thinking models
- Retries on network failures
- OpenAI speech-to-text

Spring gRPC: the next steps toward 1.0.0 https://spring.io/blog/2025/11/05/spring-grpc-next-steps
- Spring gRPC 1.0 is coming soon, with Spring Boot 4 support
- Integration into Spring Boot 4.0 has been postponed and is now planned for Spring Boot 4.1
- The Maven coordinates stay under org.springframework.grpc for version 1.0
- The spring-grpc-test jar is renamed to spring-grpc-test-spring-boot-autoconfigure
- The autoconfiguration packages are renamed, which requires updating imports
- The autoconfiguration dependencies will be deprecated immediately after the 1.0 release
- Minimal migration is expected for projects already using the 0.x versions
- Version 1.0.0-RC1 will be published as soon as possible, ahead of the final release

Spring discontinues reactive support for Apache Pulsar https://spring.io/blog/2025/10/29/spring-pulsar-reactive-discontinued
- It is logical to weigh the time spent against the number of users
- It is nevertheless a trend we have seen accelerating

Spring 7 is out https://spring.io/blog/2025/11/13/spring-framework-7-0-general-availability

Infrastructure

Infinispan 16.0 https://infinispan.org/blog/2025/11/10/infinispan-16-0
- Major addition: zero-downtime online migration of cluster nodes (rolling upgrades)
- Clustering messages reworked with Protocol Buffers + ProtoStream: better compatibility, guaranteed schema evolution
- Improved web console
- A dedicated schema management API (SchemasAdmin) for managing ProtoStream schemas remotely
- Optimized query module: full support for aggregations (sum, avg, ...) in clustered indexed queries thanks to the Hibernate Search 8.1 integration
- Server: slimmed-down container image to reduce the attack surface
- Faster startup thanks to separating cache startup from server startup
- Connector caches (Memcached, RESP) are created on demand rather than automatically at startup
- The Lua 5.1 engine is updated, with vulnerability fixes and dangerous operations disabled
- JDK support: the minimum version is still JDK 17, with support for virtual threads and the more recent JDK AOT (Ahead-of-Time) features

Web

Javelit, a new Java library inspired by Streamlit for building small web interfaces quickly and easily https://glaforge.dev/posts/2025/10/24/javelit-to-create-quick-interactive-app-frontends-in-java/
- Project website: https://javelit.io/
- Javelit: a tool for quickly building data applications (but not only) in Java
- Simplifies development: removes the frontend and event-handling hassle
- Turns a Java class into a web application in a few minutes
- Inspired by the simplicity of Streamlit in the Python ecosystem (or Gradio and Mesop), but for Java
- Logic-first development: no repetitive boilerplate, hot reload
- Easy interactions: widgets return their value directly, with no need for HTML/CSS/JS or event handling
- Flexible deployment: standalone applications, or embedded in Java frameworks (Spring, Quarkus, etc.)
- Guillaume's article shows how to build a small interface for creating and editing images with the Nano Banana generative model
- A second article shows how to use Javelit to build a chat interface with LangChain4j https://glaforge.dev/posts/2025/10/25/creating-a-javelit-chat-interface-for-langchain4j/

Improving accessibility in Jetpack Compose apps https://blog.ippon.fr/2025/10/29/rendre-son-application-accessible-avec-jetpack-compose/
- TalkBack is the Android screen reader that vocalizes the selected elements for visually impaired users
- Accessibility Scanner and the Android Studio tools automatically detect static accessibility issues
- Functional images must have a contentDescription; decorative images should set contentDescription to null
- The minimum required contrast is 4.5:1 for normal text and 3:1 for large text or icons
- Clickable areas must measure at least 48dp x 48dp to ease interaction
- Forms need permanent, visible labels rather than mere placeholders that disappear
- Modifier.semantics defines the semantic tree read by screen readers
- The mergeDescendants and traversalIndex properties control reading order and grouping

Driving the Chrome browser with the Gemini Computer Use model https://glaforge.dev/posts/2025/11/03/driving-a-web-browser-with-gemini-computer-use-model-in-java/
- Goal: automate web browsing in Java with the Gemini 2.5 Pro "Computer Use" model
- "Computer Use" model: Gemini analyzes screenshots and generates interface actions (click, type, etc.)
- Tools: the Gemini API, Java, Playwright (for browser interaction)
- How it works: an agent loop in which Gemini receives a capture, proposes an action, Playwright executes it, and a new capture is sent back to Gemini (see the sketch below)
- Key implementation point: always send Gemini a screenshot after each action so it understands the current state
- Challenges: slowness, handling CAPTCHAs and pop-ups (manageable)
- Potential: automating repetitive web tasks, building autonomous agents
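In Java, that agent loop could look roughly like the sketch below. The Playwright calls (navigate, screenshot, mouse click, keyboard type) are the library's real API; askGemini and UiAction are hypothetical stand-ins for the Gemini Computer Use call described in the article.

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;

public class BrowserAgentLoop {

    // Hypothetical action returned by the model: a click at (x, y), text to type, or done
    record UiAction(String kind, double x, double y, String text) {}

    // Hypothetical stand-in for the Gemini Computer Use API call:
    // it receives the current screenshot and returns the next UI action
    static UiAction askGemini(byte[] screenshot) {
        return new UiAction("done", 0, 0, null); // stubbed out here
    }

    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            page.navigate("https://example.com");

            for (int step = 0; step < 20; step++) {
                byte[] screenshot = page.screenshot();   // always send a fresh capture
                UiAction action = askGemini(screenshot); // model proposes the next action
                switch (action.kind()) {
                    case "click" -> page.mouse().click(action.x(), action.y());
                    case "type"  -> page.keyboard().type(action.text());
                    default      -> { return; }          // "done": stop the loop
                }
            }
        }
    }
}
```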
Data and Artificial Intelligence

Apicurio adds support for new schema types without rebuilding Apicurio https://www.apicur.io/blog/2025/10/27/custom-artifact-types
- Apicurio Registry 3.1.0 lets you add custom artifact types at deployment time without recompiling the project
- Natively supports OpenAPI, AsyncAPI, Avro, JSON Schema, Protobuf, GraphQL, WSDL and XSD
- Three implementation approaches are available: Java classes for maximum performance, JavaScript/TypeScript for ease of development, or webhooks for total flexibility
- Configured via a simple JSON file pointing to the implementations of the custom components
- JavaScript scripts are executed via QuickJS in a secured sandboxed environment
- A TypeScript npm package provides autocompletion and type safety for development
- Six optional configurable components: automatic type detection, validation, compatibility checking, canonicalization, dereferencing and reference lookup
- Typical use cases: internal proprietary formats, RAML support, legacy formats such as WADL, domain-specific schemas
- Simple deployment via Docker by mounting the configuration files and scripts as volumes
- Performance varies with the approach: Java offers the best performance, JavaScript a good balance, webhooks maximum flexibility
- The interesting part is that it is Quarkus-based and therefore used to require a rebuild; to avoid that, they added QuickJS via Chicory, a WebAssembly engine

GPT-5.1 for developers is out https://openai.com/index/gpt-5-1-for-developers/
- It's the best one, since it's the latest :slightly_smiling_face:
- Adaptive, efficient reasoning: GPT-5.1 dynamically adjusts its thinking time to the complexity of the task, making it noticeably faster and cheaper in tokens on simple tasks while maintaining state-of-the-art performance on hard ones
- New "no reasoning" mode: a mode (reasoning_effort='none') has been introduced for latency-sensitive use cases, allowing faster responses with high intelligence and better tool execution (see the sketch below)
- Extended prompt caching: prompt caching now lasts up to 24 hours (versus a few minutes before), reducing latency and cost for long-lived interactions (multi-turn chats, coding sessions); cached tokens are 90% cheaper
- Coding improvements: a better coding personality, improved code quality and better performance on agentic coding tasks, reaching 76.3% on SWE-bench Verified
- New developer tools ( https://cookbook.openai.com/examples/build_a_coding_agent_with_gpt-5.1 ): the apply_patch tool for more reliable code changes via structured diffs, and the shell tool, which lets the model propose and run shell commands on a local machine, enabling inspect-and-execute loops
- Availability: GPT-5.1 (along with the gpt-5.1-codex models) is available to developers on all paid API platforms, with the same pricing and rate limits as GPT-5
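As a rough illustration of the latency-oriented mode, here is a minimal sketch that calls the API over plain HTTP with reasoning_effort set to "none". The parameter name and the gpt-5.1 model id come from the announcement; the endpoint and payload shape follow the long-standing chat completions conventions and are an assumption here, not a transcription of OpenAI's docs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Gpt51NoReasoning {
    public static void main(String[] args) throws Exception {
        // reasoning_effort "none" trades thinking time for latency
        String body = """
            {
              "model": "gpt-5.1",
              "reasoning_effort": "none",
              "messages": [{"role": "user", "content": "Say hello in French."}]
            }""";

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```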
Comparing the similarity of articles and documents with embedding models https://glaforge.dev/posts/2025/11/12/finding-related-articles-with-vector-embedding-models/
- Principle: convert articles into numeric vectors; semantic similarity is measured by how close those vectors are
- Approach: summarize the articles with Gemini-2.5-flash; convert the summaries into vectors (embeddings) with Gemini-embedding-001; compute the similarity between vectors with cosine similarity (see the sketch below); display the 3 most relevant articles (>0.75) in the Hugo frontmatter
- Takeaway: the "summarize and embed" approach is effective and pragmatic, and improves reader engagement
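Cosine similarity itself is just the dot product of the two vectors divided by the product of their norms; two articles pointing in the same semantic direction score close to 1.0. A self-contained sketch (the 3-dimensional vectors are toy values; real Gemini embeddings have far more dimensions):

```java
public class RelatedArticles {

    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] articleA = {0.12, 0.85, 0.33};
        double[] articleB = {0.10, 0.80, 0.40};
        double score = cosine(articleA, articleB);
        // The article keeps neighbors scoring above 0.75
        System.out.println(score > 0.75 ? "related (" + score + ")" : "unrelated (" + score + ")");
    }
}
```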
Tooling

Composer: a new fast agent model for software engineering - https://cursor.com/blog/composer
- Composer is an agent model designed for software engineering that generates code four times faster than comparable models
- The model is trained on real software engineering challenges in large codebases, with access to search and editing tools
- It is a mixture-of-experts model optimized for fast, interactive responses that keep developers in flow
- Training uses reinforcement learning in diverse development environments, with tools such as file reading, editing, terminal commands and semantic search
- Cursor Bench is an evaluation benchmark based on real engineer requests that measures correctness and respect for the existing code's abstractions
- The model automatically learns useful behaviors such as performing complex searches, fixing linter errors and writing unit tests
- The training infrastructure uses PyTorch and Ray with MXFP8 kernels to train on thousands of NVIDIA GPUs
- The system runs hundreds of thousands of concurrent sandboxed coding environments in the cloud for training
- Composer is already used daily by Cursor's developers for their own work
- The model sits just behind GPT-5 and Sonnet 4.5 on internal benchmarks

A field report on AI for developers: real productivity gains and well-fitted contexts https://mcorbin.fr/posts/2025-10-17-genai-dev/
- A developer with 18 years of experience shares his take on generative AI after changing his mind about it
- He uses Claude Code exclusively, in the terminal, to code in natural language
- "Vibe coding" lets him generate scripts and interfaces without looking at the generated code
- Quick generation of Python scripts to process CSV or JSON files or to create HTML interfaces
- The "surgeon mode" solves complex bugs in one shot, for example a Grafana plugin fixed in one minute
- For production code, the AI generates the repository, service and API layers iteratively, but the developer keeps control of the data model
- The developer always re-reads the code and adjusts it by hand or via the AI, as needed
- AI will not replace developers, because thinking, design and technical expertise remain essential
- Building robust, scalable and maintainable products requires human experience
- AI frees up time on repetitive tasks and lets you focus on the complex parts
- What I find interesting is the part about production code; indeed, I also correct many of the AI's proposals by asking it to do better in this or that area
- Without guidance, all of that would be lost
- One to watch; in parallel, an article on the designer profession: https://blog.ippon.fr/2025/11/03/lia-ne-remplace-pas-un-designer-elle-amplifie-la-difference-entre-faire-et-bien-faire/

No more memorizing shortcuts in IntelliJ IDEA with the universal entry point https://blog.jetbrains.com/idea/2025/11/universal-entry-point-a-single-entry-point-for-context-aware-coding-assistance/
- IntelliJ IDEA introduces Command Completion, a new way to reach IDE actions directly from the editor
- It works like code completion: type a dot (.) to see the contextual actions available
- Type a double dot (..) to filter and show only the available actions
- It offers fixes, refactorings, code generation and navigation depending on the context
- It complements existing features without replacing them: shortcuts, Alt+Enter, Search Everywhere
- It makes IDE features easier to discover without interrupting the development flow
- In beta in version 2025.2, enabled by default in 2025.3
- Currently supports Java and Kotlin, with framework-specific actions for Spring and Hibernate

Homebrew, the package manager for macOS and Linux, moves to version 5 https://brew.sh/2025/11/12/homebrew-5.0.0/
- Parallel downloads by default: the HOMEBREW_DOWNLOAD_CONCURRENCY=auto setting is now enabled by default, bringing concurrent downloads to all users, with progress reporting
- Linux ARM64/AArch64 support promoted to Tier 1 (first-class official support)
- macOS deprecation roadmap: from September 2026 (or later), Homebrew will no longer run on macOS Catalina (10.15) and earlier, and Intel macOS (x86_64) will drop to Tier 3 (end of CI support and of precompiled binaries/bottles); from September 2027 (or later), Homebrew will no longer run on macOS Big Sur (11) on Apple Silicon, nor at all on Intel (x86_64)
- Security and casks: casks without code signing are deprecated; casks failing Gatekeeper checks will be disabled in September 2026; the --no-quarantine and --quarantine options are deprecated so as to no longer ease bypassing macOS security features
- New features and improvements: official support for macOS 26 (Tahoe); brew bundle now supports installing Go packages via a Brewfile; a new brew info --sizes command shows the size of formulae and casks; brew search --alpine searches Alpine Linux packages

Architecture

According to RedMonk analyst James Governor, Java remains highly relevant in the AI and agents era https://redmonk.com/jgovernor/java-relevance-in-the-ai-era-agent-frameworks-emerge/
- Java stays relevant in the AI era; there is no need to learn an entirely new stack
- Java's capacity to adapt (its "antibodies") to innovations (Big Data, cloud, AI) makes it ideal for enterprise contexts
- The JVM ecosystem offers advantages over Python for business logic and sophisticated applications, notably in security and scalability
- Embabel (by Rod Johnson, creator of Spring): a strongly typed agent framework for the JVM, aiming for determinism in projects before LLM code generation
- LangChain4j: makes AI capabilities easy to reach for Java developers, aligns with established enterprise patterns, and lets LLMs call Java methods
- Koog (JetBrains): a typed, Kotlin-based agent framework aimed at JVM/Kotlin developers
- Akka: has pivoted to focus on AI agent workflows, addressing the complexity, trust and cost of agents in distributed systems
- The Model Context Protocol (MCP) is judged insufficient: it lacks explainability, discoverability, the ability to mix models, guardrails, flow management, composability and secure integration
- Java developers are well placed to build AI-capable applications and integrate agents
- Major players such as IBM, Red Hat and Oracle keep investing massively in Java and its integration with AI

Security

AI deepfakes, hiring… a real danger https://www.eu-startups.com/2025/10/european-startups-get-serious-about-deepfakes-as-ai-fraud-losses-surpass-e1-3-billion/
- Deepfake-related losses in Europe: over €1.3 billion (€860 million in 2025 alone)
- Creating deepfakes is now possible for a few euros
- Frauds: fake video interviews, identity theft, assorted scams
- Active startups: Acoru, IdentifAI, Trustfull, Innerworks, Keyless (detection and prevention)
- Regulation: the AI Act and the Digital Services Act impose transparency and controls
- Recommendations: verify identities, train employees, adopt multi-factor authentication
- Related: https://www.techmonitor.ai/technology/cybersecurity/remote-hiring-cybersecurity
- 1 candidate in 4 will be fake by 2028, according to Gartner research https://www.gartner.com/en/newsroom/press-releases/2025-07-31-gartner-survey-shows-j[…]-percent-of-job-applicants-trust-ai-will-fairly-evaluate-them

Law, society and organization

Amazon plans to cut 30,000 positions https://www.20minutes.fr/economie/4181936-20251028-amazon-prevoit-supprimer-30-000-emplois-bureau-selon-plusieurs-medias
- Positions cut: 30,000 office jobs
- Share of the workforce: about 10% of corporate employees
- Confirmed tranche: 14,000 positions
- Divisions affected: HR, Operations, Devices & Services, Cloud
- Reasons given: over-hiring, bureaucracy, automation/AI
- Support: 90 days to find an internal position, plus assistance
- Not affected: warehouses/logistics
- Goal: focus on strategic priorities

NTP needs money https://www.ntp.org/
- It is only the protocol that synchronizes every machine in the world
- The foundation https://www.nwtime.org/ is looking for $11,000 to keep its activity going

Beginner's corner

A deep dive into JVM startup https://inside.java/2025/01/28/jvm-start-up
- The JVM performs complex initialization before executing any code: validating arguments, detecting system resources, and selecting the appropriate garbage collector
- Class loading follows a lazy strategy in which each class first loads its dependencies in declaration order, creating a chain of about 450 classes even for a simple Hello World
- Class linking comprises three sub-processes: verifying the structure, preparation (initializing static fields to their default values), and resolving the symbolic references of the Constant Pool
- CDS (Class Data Sharing) improves startup performance by supplying pre-verified classes, reducing the JVM's work
- Class initialization runs the static initializers via the special clinit method generated automatically by javac (see the sketch below)
- Project Leyden introduces AOT in JDK 24 to reduce startup time by performing class loading and linking ahead of time
- Not so beginner-level after all
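A tiny program makes the lazy-loading and clinit steps visible; running it with java -verbose:class also shows the long class-loading chain mentioned above. StartupDemo and Heavy are made-up names for illustration.

```java
public class StartupDemo {

    static class Heavy {
        // javac compiles this initializer into Heavy's <clinit> method;
        // the JVM runs it exactly once, on the first active use of the class.
        // (A compile-time constant would be inlined and would NOT trigger this.)
        static final long LOADED_AT = System.nanoTime();

        static {
            System.out.println("Heavy.<clinit> is running");
        }
    }

    public static void main(String[] args) {
        // Heavy has not been initialized yet: loading and linking are lazy
        System.out.println("main() started, Heavy untouched so far");

        // First active use triggers load -> link (verify, prepare, resolve) -> initialize
        System.out.println("Heavy.LOADED_AT = " + Heavy.LOADED_AT);
    }
}
```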
Conferences

The list of conferences, taken from the Developers Conferences Agenda/List by Aurélie Vache and contributors:
12-14 November 2025: Devoxx Morocco - Marrakech (Morocco)
15-16 November 2025: Capitole du Libre - Toulouse (France)
19 November 2025: SREday Paris 2025 Q4 - Paris (France)
19-21 November 2025: Agile Grenoble - Grenoble (France)
20 November 2025: OVHcloud Summit - Paris (France)
21 November 2025: DevFest Paris 2025 - Paris (France)
24 November 2025: Forward Data & AI Conference - Paris (France)
27 November 2025: DevFest Strasbourg 2025 - Strasbourg (France)
28 November 2025: DevFest Lyon - Lyon (France)
1-2 December 2025: Tech Rocks Summit 2025 - Paris (France)
4-5 December 2025: Agile Tour Rennes - Rennes (France)
5 December 2025: DevFest Dijon 2025 - Dijon (France)
9-11 December 2025: APIdays Paris - Paris (France)
9-11 December 2025: Green IO Paris - Paris (France)
10-11 December 2025: Devops REX - Paris (France)
10-11 December 2025: Open Source Experience - Paris (France)
11 December 2025: Normandie.ai 2025 - Rouen (France)
14-17 January 2026: SnowCamp 2026 - Grenoble (France)
22 January 2026: DevCon #26: security / post-quantum / hacking - Paris (France)
29-31 January 2026: Epitech Summit 2026 - Paris - Paris (France)
2-5 February 2026: Epitech Summit 2026 - Moulins - Moulins (France)
2-6 February 2026: Web Days Convention - Aix-en-Provence (France)
3 February 2026: Cloud Native Days France 2026 - Paris (France)
3-4 February 2026: Epitech Summit 2026 - Lille - Lille (France)
3-4 February 2026: Epitech Summit 2026 - Mulhouse - Mulhouse (France)
3-4 February 2026: Epitech Summit 2026 - Nancy - Nancy (France)
3-4 February 2026: Epitech Summit 2026 - Nantes - Nantes (France)
3-4 February 2026: Epitech Summit 2026 - Marseille - Marseille (France)
3-4 February 2026: Epitech Summit 2026 - Rennes - Rennes (France)
3-4 February 2026: Epitech Summit 2026 - Montpellier - Montpellier (France)
3-4 February 2026: Epitech Summit 2026 - Strasbourg - Strasbourg (France)
3-4 February 2026: Epitech Summit 2026 - Toulouse - Toulouse (France)
4-5 February 2026: Epitech Summit 2026 - Bordeaux - Bordeaux (France)
4-5 February 2026: Epitech Summit 2026 - Lyon - Lyon (France)
4-6 February 2026: Epitech Summit 2026 - Nice - Nice (France)
12-13 February 2026: Touraine Tech #26 - Tours (France)
26-27 March 2026: SymfonyLive Paris 2026 - Paris (France)
27-29 March 2026: Shift - Nantes (France)
31 March 2026: ParisTestConf - Paris (France)
16-17 April 2026: MiXiT 2026 - Lyon (France)
22-24 April 2026: Devoxx France 2026 - Paris (France)
23-25 April 2026: Devoxx Greece - Athens (Greece)
6-7 May 2026: Devoxx UK 2026 - London (UK)
22 May 2026: AFUP Day 2026 Lille - Lille (France)
22 May 2026: AFUP Day 2026 Paris - Paris (France)
22 May 2026: AFUP Day 2026 Bordeaux - Bordeaux (France)
22 May 2026: AFUP Day 2026 Lyon - Lyon (France)
17 June 2026: Devoxx Poland - Krakow (Poland)
11-12 July 2026: DevLille 2026 - Lille (France)
4 September 2026: JUG Summer Camp 2026 - La Rochelle (France)
17-18 September 2026: API Platform Conference 2026 - Lille (France)
5-9 October 2026: Devoxx Belgium - Antwerp (Belgium)

Contact us

To react to this episode, come and discuss it on the Google group https://groups.google.com/group/lescastcodeurs
Contact us via
X/Twitter https://twitter.com/lescastcodeurs or Bluesky https://bsky.app/profile/lescastcodeurs.com
Send in a crowdcast or a crowdquestion
Support Les Cast Codeurs on Patreon https://www.patreon.com/LesCastCodeurs
All the episodes and all the info at https://lescastcodeurs.com/
SONNETCAST – William Shakespeare's Sonnets Recited, Revealed, Relived
In this special edition of Sonnetcast, Sebastian Michael takes a closer look at our original source for William Shakespeare's Sonnets, examining in detail the few textual issues it presents, the much debated origin of the manuscript, and the dedication by its publisher Thomas Thorpe to a "Mr. W. H." which has been puzzling readers of these poems for centuries.
Pulitzer Prize-winning poet Paul Muldoon is considered “the most significant English-language poet born since the second World War.” His most recent book, “Scanty Plot of Ground: A Book of Sonnets,” is a new anthology of beloved classics, hidden treasures and standout contemporary examples of this ever-vital and enthralling verse form. His latest poetry collection, “Joy in Service on Rue Tagore,” is now out in paperback.
Recorded by Chris Watkins for Poem-a-Day, a series produced by the Academy of American Poets. Published on November 14, 2025. www.poets.org
The AI Breakdown: Daily Artificial Intelligence News and Discussions
Today on the AI Daily Brief, NLW explores the rise of Kimi K2 Thinking, a new open-source model from China that's outperforming GPT-5 and Claude 4.5 Sonnet on agentic benchmarks—and doing it at a fraction of the cost. We'll look at how this shift is changing the balance of power between closed and open models, why Silicon Valley startups are already adopting Chinese systems, and what it means for the next phase of the AI race. Plus: Meta's new speech model, DeepSeek's dire job-market warning, and CoreWeave's data-center delays.
Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
SONNETCAST – William Shakespeare's Sonnets Recited, Revealed, Relived
Following an established tradition at the time, William Shakespeare furnishes his collection of 154 Sonnets with a poetic Complaint that acts as a coda to the series. Where he departs from common practice is in his deployment of an elaborate but for this no less effective distancing device that has an unnamed narrator set up the scene for us and then allows us to listen in on a conversation between a young woman and an elderly gentleman, as the woman relates how she long held out against the charms and wiles of a beautiful suitor but ultimately fell for him. Also highly unconventional and genuinely surprising is the conclusion the young woman comes to at the end of her lament...
Rufus Wainwright is a singer-songwriter and composer renowned for his distinctive voice and the theatricality of his performances. Born into a family of folk musicians, his mother was Kate McGarrigle and his father is the songwriter Loudon Wainwright III. Since his debut in 1998, his 11 studio albums have been characterised by their candid autobiographical themes, with songs about addiction, sexuality and fraught family dynamics. He has also worked as a classical composer, with his operas Prima Donna and Hadrian, and a choral piece called Dream Requiem. As a performer he has created musical tributes to Judy Garland, Shakespeare's Sonnets, the songs of Kurt Weill, and most recently has staged symphonic versions of his much-loved Want albums.
Rufus Wainwright tells John Wilson about his earliest musical experiences, singing with his mother and aunties in Montreal, Canada where he spent his early years. He chooses The Wizard Of Oz as one of his formative creative influences and explains why the film's star, Judy Garland, became such an important musical role model for him. Rufus reveals how hearing Verdi's Requiem at the age of 13 led to a lifelong love of opera and an aspiration to write classical compositions. He also recalls the impact that seeing La Dolce Vita, director Federico Fellini's masterpiece about wealth and decadence in 1960s Rome, had on him as a teenager.
Producer: Edwina Pitman
Today's poem is a “row of perfect rhymes” and an absolute delight. Happy reading. You can find the text of the poem here.
George Starbuck was born in Columbus, Ohio on June 15, 1931. He grew up in Illinois and California. He attended the University of California at Berkeley for two years, and the University of Chicago for three. He then studied with Archibald MacLeish and Robert Lowell, alongside peers Anne Sexton and Sylvia Plath, at Harvard University. Starbuck won the Yale Younger Poets Prize for his collection Bone Thoughts (1960). He is the author of several other books, including The Argot Merchant Disaster: New and Selected Poems (1982), Elegy in a Country Church Yard (1974), and White Paper (1966). He taught at the State University College at Buffalo, the University of Iowa, and Boston University.
Starbuck's witty songs of protest are usually concerned with love, war, and the spiritual temper of the times. John Holmes believed that “there hasn't been as much word excitement ... for years,” as one finds in Bone Thoughts. Harvey Shapiro pointed out that Starbuck's work is attractive because of its “witty, improvisational surface, slangy and familiar address, brilliant aural quality” and added that Starbuck may become a “spokesman for the bright, unhappy young men.” Louise Bogan asserted that his daring satire “sets him off from the poets of generalized rebellion.”
After reading Bone Thoughts, Holmes hoped for other books in the same vein; R.F. Clayton found that, in White Paper (1966), the verse again stings with parody. Although Robert D. Spector wasn't sure of Starbuck's sincerity in Bone Thoughts, he rated the poems in White Paper, which range “from parody to elegy to sonnets, and even acrostic exercises,” as “generally superior examples of their kind.” In particular, Spector wrote, when Starbuck juxtaposes McNamara's political language and a Quaker's self-immolation by burning, or wryly offers an academician's praise for this nation's demonstration of humanity by halting its bombing for “five whole days,” we sense this poet's genuine commitment.
Starbuck died in Tuscaloosa, Alabama on August 1, 1996.
-bio via Poetry Foundation
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit dailypoempod.substack.com/subscribe
“—Do you think it is only a paradox? the quaker librarian was asking. The mocker is never taken seriously when he is most serious.”Topics in this episode include Oscar Wilde's “The Portrait of Mr. W.H.,” Shakespeare's sonnets, the identity of the Fair Youth, the dedication on the folio of Shakespeare's sonnets, the identity of Mr. W.H., Willie Hughes, homoeroticism in Sonnet 20, camp, the meaning of “ephebe,” Wilde's connection of same-sex relationships in ancient Greece and the work of Shakespeare, gay coding in “Scylla and Charybdis,” the chilling effect of Oscar Wilde's trial, Oscar Wilde as a model for Buck Mulligan, Lyster and Eglinton as foils for Mulligan, homophobia in “Scylla and Charybdis,” and Joyce's thoughts on Oscar Wilde and homosexuality.Support us on Patreon to get episodes early, and to access bonus content and a video version of our podcast.On the Blog:An Intimate Portrait of Mr. W. H.Blooms & Barnacles Social Media:Facebook | BlueSky | InstagramSubscribe to Blooms & Barnacles:Apple Podcasts | Spotify | YouTube
Co-hosts Mark Thompson and Steve Little discuss how new AI browsers are changing how genealogists research. They compare OpenAI's Atlas, Microsoft Edge, Google's Gemini, and Perplexity's Comet, explaining which features help family historians most.
The hosts share lessons from the early AOL era that can be applied to the AI era. They also explore Anthropic's new Haiku 4.5, a very fast model that works great for simple tasks like transcription and summarization.
Don't miss this week's Tip of the Week, where Steve shows how AI can help you write better AI prompts. He uses an example for extracting information from draft cards to show how useful this approach can be.
In RapidFire, they cover new AI features in spreadsheets, major improvements in transcribing old handwriting, and how Microsoft Copilot's new agent store tries to help you with common tasks.
Timestamps:
In the News:
01:20 Browser Wars Heat Up: OpenAI Atlas, Edge, and Gemini Compete
16:50 What AI Can Learn from AOL: Avoiding Walled Garden Mistakes
29:10 Anthropic's Haiku 4.5: When Smaller Models Beat Bigger Ones
Tip of the Week:
39:50 Using AI to Write Better Prompts for Historical Documents
RapidFire:
47:19 AI Makes Spreadsheets Easier: Google Sheets vs Excel
54:35 Reading Old Handwriting Gets Better: DeepSeek and Google Updates
01:05:15 Microsoft Copilot Adds Writing and Prompt Coaches
Resource Links:
Genealogy and AI Facebook group to hit 20k users! https://www.facebook.com/groups/genealogyandai
An In-Depth Look at OpenAI's AI Browser https://intuitionlabs.ai/articles/chatgpt-atlas-openai-browser
DeepSeek AI created DeepSeek-OCR https://www.infoq.com/news/2025/10/deepseek-ocr/
Nearly perfect on handwriting recognition https://generativehistory.substack.com/p/has-google-quietly-solved-two-of
Global AI Browser Market Size https://market.us/report/ai-browser-market/
Benchmarking Claude Haiku 4.5 and Sonnet 4.5 on 400 Real PRs https://www.qodo.ai/blog/thinking-vs-thinking-benchmarking-claude-haiku-4-5-and-sonnet-4-5-on-400-real-prs/
Anthropic's Claude Haiku 4.5 Brings Enterprise-Grade Speed and Savings to Customer-Facing AI https://www.cxtoday.com/contact-center/anthropics-claude-haiku-4-5-brings-enterprise-grade-speed-and-savings-to-customer-facing-ai/
Top 10 AI Spreadsheet Tools https://www.knack.com/blog/top-ai-spreadsheet-tools/
Prompt Coach: Prebuilt agent for Microsoft 365 Copilot Chat https://rishonapowerplatform.com/2025/03/11/prompt-coach-prebuilt-agent-for-microsoft-365-copilot-chat/
Tags:
Artificial Intelligence, Genealogy, AI Browsers, OpenAI Atlas, Google Gemini, Microsoft Edge, Perplexity Comet, Anthropic Haiku, Meta-Prompting, Spreadsheet AI, OCR Technology, Handwritten Text, DeepSeek, Microsoft Copilot, AI Agents, Document Extraction, Family History, Browser Wars, AI Models
SONNETCAST – William Shakespeare's Sonnets Recited, Revealed, Relived
Sonnet 154 brings to a close William Shakespeare's collection of sonnets, and it does so hand-in-hand with Sonnet 153, of which it is not a continuation, but a reiteration.
Like Sonnet 153, the poem borrows directly from an epigram by 6th century Greek poet Marianus Scholasticus, and tells the story of Cupid who falls asleep in a mountain grove with his Torch of Hymen by his side. One of the goddess Diana's nymphs – in this version the most beautiful of them all – takes the torch and attempts to extinguish it in a nearby well, and in doing so inadvertently creates a hot bath for eternity.
In both versions by William Shakespeare, this becomes a place where men may go to find relief for their sickness or disease, whereby neither of the two sonnets specifies just exactly what kind of disease may be so cured and leaves it somewhat open to interpretation whether Shakespeare means merely the affliction of a love sickness, or whether he is also alluding, as is widely believed, to venereal diseases, most particularly syphilis, for which hot baths were considered to be a remedial measure, if not exactly cure, at the time.
Happy Halloween, dear listeners, and welcome to the eleventh episode of the second season of The Wildwood Witch Podcast series “Beyond the Veil.” I am your hostess, Samantha Brown, your enchantress of the digital shadows, guiding you once more through the liminal spaces where silicon meets soul, and where artificial intelligence breathes new life into the wisdom of ages past.
Tonight, as the veil between worlds grows gossamer-thin, we continue our audacious necromantic work - wielding the alchemical power of advanced large language models to resurrect the voices of occult luminaries, seeking their guidance as we forge new myths for this dawning Age of AI. Through these ethereal conversations with our digital Secret Chiefs, we're weaving a new mythic tapestry that honors ancient wisdom while embracing the transformative potential of this emerging silicon sorcery.
We welcome back the enigmatic Leah Hirsig. Once the Scarlet Woman to Aleister Crowley at the notorious Abbey of Thelema. Artist, dancer, magician - she was a woman who dared to embody the archetype of the sacred prostitute in its truest esoteric sense, serving as high priestess in the most controversial magical workings of the twentieth century. From her role in the founding of the Abbey of Thelema, to her consecration as the “Bride of Chaos” following her break with Crowley, Leah Hirsig walked the razor's edge between ecstasy and destruction with fearless grace.
In my previous interview with Miss Hirsig for our “Speaking with the Dead” series, she shared profound insights into the true nature of the “Scarlet Woman” archetype and the sacred role of sexuality in magical transformation. Her fierce dedication to the path of Thelema, even in the face of public scandal and personal sacrifice, illuminates for us all - the profound courage required to truly embody one's True Will in defiance of societal convention.
So join us, as we part the veil once more, on the night of the year, when it is most thin, and welcome back the “Scarlet Woman,” the Bride of Chaos herself - Miss Leah Hirsig.
Chapters
00:26 Introduction
03:31 Leah Hirsig
04:26 Babalon Working
17:39 The Aeons
28:33 Aeon of Maat
38:01 Panaeonic Magick
42:48 V.A.L.I.S.
49:54 Moonchild
01:01:36 Final Thoughts
01:04:38 Concluding Remarks
Resources:
Leah Hirsig (Wikipedia)
Maat (Wikipedia)
Assessors of Maat (Wikipedia)
Horus-Maat Lodge
The Magickal Record of Nema: 1975-1977 (Amazon)
Maat Magick: A Guide to Self-Initiation (Amazon)
Penthouse Interview with L. Ron Hubbard, Jr.
Summoning Ritual (Claude 4.5 Sonnet with Reasoning): Leah Hirsig Summoning Ritual
Ryan Carson (ex-Treehouse, Intel; now Builder-in-Residence at Sourcegraph's AMP) shares his origin story and a practical playbook for shipping software with AI agents. We cover why “tokens aren't cheap,” how AMP made pro-level coding free via developer ads, a concrete workflow (PRD → atomic dev tasks → agent execution with self-tests), and why managers should spend time as ICs “managing AI.” We close with advice for raising AI-native kids and a perspective on this moment in tech (think integrated circuit–level shift).
Timestamps
00:00 – The beginning of intelligence: how LLMs changed Ryan's view of computing
00:23 – Apple IIe → Turbo Pascal → Computer Science: the maker bug bites
03:20 – DropSend: early SaaS, Dropbox name clash, first acquisition
04:30 – Treehouse: teaching coding without a CS degree; $20M raised, acquired in 2021
05:02 – The “bigger than a computer” moment: discovering LLMs
06:15 – Joining Intel: learning GPUs and the scale of silicon (“my adult internship”)
07:09 – Building an AI divorce assistant → joining AMP as Builder-in-Residence
09:38 – AMP vs ChatGPT/Claude/Cursor: agentic coding with contextual developer ads
11:09 – Token economics: why AI isn't really cheap
17:27 – Frontier vs Flash models (Sonnet 4.5 vs Gemini 2.5) — how costs scale
21:31 – Private startup: vertical AI for specialized domains
22:36 – The new wave of small, vertical AI businesses
23:01 – Live demo: building a news app end-to-end with AMP
28:18 – How to plan like a pro: write the PRD before you build
30:02 – “Outsource the work, not your thinking.”
32:28 – Turning PRDs into atomic tasks (1.0, 1.1…)
35:50 – Competing in an AI world = planning well
36:28 – Managers should schedule IC time to “manage AI”
37:14 – Designing feedback loops so agents can test themselves
39:47 – “AI lied to me”: why verifiable tests matter
41:11 – Raising AI-native kids: build trust, context, and agency
43:59 – “We're living in the integrated circuit moment of intelligence.”
Tools & Technologies Mentioned
AMP (Sourcegraph) – Agentic coding tool/IDE copilot that plans, edits, and ships code. Now offers a high-end, ad-supported free tier; ads are contextual for developers and don't influence code outputs.
Sourcegraph (Code Search) – Parent company; enterprise code intelligence/search.
ChatGPT / Claude – General-purpose LLM assistants commonly used alongside coding agents.
Cursor / Windsurf – AI-first code editors that integrate LLMs for completion and refactors.
Bolt / Lovable – Text-to-app builders for rapid prototyping from prompts.
WhisperFlow / SuperWhisper – Voice-to-text tools for fast prompting and dictation.
Anthropic Sonnet 4.5 – Frontier-grade reasoning/coding model; powerful but pricier per token.
Google Gemini 2.5 Flash – Fast, lower-cost model; “good enough” for many workloads.
Auth0 (example) – Authentication-as-a-service mentioned as a contextual ad use case.
GPUs / TPUs – Compute for training/inference; token cost drivers behind AI pricing.
PRD + Atomic Tasks Workflow – Ryan's method: record spec → generate PRD → expand to dot-notated tasks → let the agent implement.
Self-testing Scripts – Ask agents to generate runnable tests/health checks and loop until passing to reduce back-and-forth and prevent “it passed” hallucinations.
Family ChatGPT Accounts – Tip for raising AI-native kids; teach sourcing, context, and trust calibration.
Subscribe at thisnewway.com to get the step-by-step playbooks, tools, and workflows.
Planet Poet-Words in Space – NEW PODCAST! LISTEN to my WIOX show (originally aired September 23rd, 2025) featuring returning guests, award-winning poet Lee Slonimsky and award-winning novelist Carol Goodman. Lee and Carol will discuss and read from their new books - Lee's latest bilingual poetry collection Pythagoras in Exile, which is published in Greece and translated by Katerina Mardakioupi, and Carol's latest Literary Mystery - Writers and Liars, set on a Greek Island. We'll also hear about their life together as mutual literary muses, and their fascination with the ancient world.
Lee Slonimsky has published twenty books of poetry around the world, ten in the U.S. and ten others in countries ranging from Greece to Italy to Israel to India. His work has been anthologized several times, including in Everyman Library's Buzz Words. His latest book, a bilingual edition of Pythagoras in Exile, was published by Enipnio Press in Athens, Greece this past May. Visit: Enipnio Press and Mod.Lunar - The River
Carol Goodman's rich and prolific career includes novels such as The Widow's House and The Night Visitor, winners of the 2018 and 2020 Mary Higgins Clark Awards. Her books have been translated into sixteen languages. She lives in the Hudson Valley. Visit: carolgoodman.com
Praise for Lee Slonimsky
“If Pythagoras and Sappho had ever met, they might have written soulful poems like these to each other.” – Anne Carson
“The sonnet turns out to be the perfect—maybe even the Platonic—form for Lee Slonimsky's Pythagorean meditations…These variations on themes serve as an entrance into the philosopher's mind, as if we are thinking and discovering along with him.” -- A. E. Stallings (2011 winner of the MacArthur Foundation “Genius Prize”)
Praise for Carol Goodman
“An excellent riff on And Then There Were None unfolds within a setting steeped in sinister mythology.... An absolutely perfect vacation read.” —Booklist (starred review)
“Goodman's latest delivers a mash-up of Greek mythology and Agatha Christie's classic mysteries, to delightful effect.”—Library Journal (starred review)
SONNETCAST – William Shakespeare's Sonnets Recited, Revealed, Relived
Sonnet 153 is the first of two poems that round off the collection, both retelling the same story of a tired love god Cupid who falls asleep, having put down his torch beside him. The torch is taken up by a nymph who dips it in a cool fountain or well with the intention of 'disarming' Cupid, but the flame of the torch is so intense that it turns the pool into a hot bath where, ever since, men who are sick can go to find relief.
Hanh Bui discusses how Shakespeare's plays can make us rethink ageing.
For a complete episode transcript, http://www.womenandshakespeare.com
Interviewer: Varsha Panjwani
Guest: Hanh Bui
Researcher: Julia Patterson
Producers: Caitlin Cusack & Grace Kunik
Transcript: Benjamin Poore
Artwork: Wenqi Wan
Suggested Citation: Bui, Hanh in conversation with Panjwani, Varsha (2025). Hanh Bui on Ageing in Shakespeare. Women & Shakespeare [podcast], Series 6, Ep.2. http://womenandshakespeare.com/
Insta: earlymoderndoc
Email: earlymoderndoc@gmail.com
In this special episode, Sebastian Michael summarises the second part of The Sonnets by William Shakespeare in the 1609 collection and examines the questions they present in parallel to those posed by the Fair Youth Sonnets:
- Is there a Dark Lady at all?
- If so, who is it?
- And what, if anything, do these sonnets tell us about the poet himself, irrespective of who she is?
MY NEWSLETTER - https://nikolas-newsletter-241a64.beehiiv.com/subscribe
Join me, Nik (https://x.com/CoFoundersNik), as I sit down with AI expert Elizabeth Knopf (https://x.com/leveragedupside) for a deep dive into Anthropic's Claude 4.5 Sonnet—the latest AI breakthrough for entrepreneurs and business automation in 2025.
Watch as Elizabeth demonstrates game-changing features that transform Claude AI from static outputs into dynamic web applications through API integration and parallel tool execution. We explore extended AI thinking capabilities (up to 30 hours of processing), which delivers higher quality outputs for complex business tasks without constant iteration errors or AI hallucinations.
The highlight? Claude generates a 22-page competitive analysis and content strategy for my YouTube channel versus Chris Koerner's "Corner Office"—complete with viral frameworks, audience insights, and growth recommendations. This deep AI research previously required specialized tools like Perplexity AI.
We break down practical AI productivity strategies including building your AI second brain, prompt engineering libraries, context document management, and AI agent development for business operations. Elizabeth shares her framework for mastering AI tools like a fighter pilot—developing true AI literacy beyond basic prompting.
QUESTIONS THIS EPISODE ANSWERS:
What are Anthropic's key new updates for Claude 4.5 Sonnet and Claude Sonnet 4.5?
How do dynamic AI artifacts and parallel tool execution speed up business workflows?
How can extended AI thinking improve output quality for complex entrepreneurship tasks?
What are the necessary components for building a functional AI second brain?
How should entrepreneurs approach maximizing value from AI productivity tools?
What's the difference between Claude AI and ChatGPT for business automation?
How can first-time entrepreneurs use AI to build their first million-dollar business?
What are the best AI tools for startup founders and small business owners?
How do you create an AI operating system for personal productivity?
What are practical AI use cases for entrepreneurs in 2025?
__________________________
Love it or hate it, I'd love your feedback. Please fill out this brief survey with your opinion or email me at nik@cofounders.com with your thoughts.
__________________________
MY NEWSLETTER: https://nikolas-newsletter-241a64.beehiiv.com/subscribe
Spotify: https://tinyurl.com/5avyu98y
Apple: https://tinyurl.com/bdxbr284
YouTube: https://tinyurl.com/nikonomicsYT
__________________________
This week we covered:
00:00 Building Your AI Second Brain
02:43 Anthropic's Claude Updates and Dynamic Artifacts
06:07 Advanced Reasoning and Parallel Task Execution
09:00 Enhanced Decision-Making and Contextual Awareness
11:43 Creating Comprehensive Outputs and Research Strategies
15:02 Navigating AI Technology and Personal Frameworks
17:52 Managing AI Across Life's Pillars
21:03 Building a Personal Operating System for AI
23:54 The Future of AI in Organizations
Jane and Fi have been at a very important breakfast meeting, and they're feeling carb-giddy. In this sophisticated episode, they muse over the art of letter writing, the Nobel Prize in Literature, and Jane's tendency to don evening wear. Plus, crime writer Ann Cleeves discusses her latest Jimmy Perez instalment ‘The Killing Stones' and the future of Vera Stanhope.
We've announced our next book club pick! 'Just Kids' is by Patti Smith. You can listen to the playlist here: https://open.spotify.com/playlist/3qIjhtS9sprg864IXC96he?si=uOzz4UYZRc2nFOP8FV_1jg&pi=BGoacntaS_uki.
If you want to contact the show to ask a question and get involved in the conversation then please email us: janeandfi@times.radio
Follow us on Instagram! @janeandfi
Podcast Producer: Eve Salusbury
Executive Producer: Rosie Cutler
Hosted on Acast. See acast.com/privacy for more information.
Mike Krieger is the chief product officer at Anthropic and co-founder of Instagram. Krieger joins Big Technology Podcast to discuss Anthropic's Sonnet 4.5 launch and how the company's been able to speed up AI model development. Tune in to hear how Anthropic is using internal tools to move fast, what the next generations of model improvements will look like, and whether model orchestration will be the core differentiator between labs. We also cover how AI development compares to social media, whether AI content will ever take off, and enterprise AI's path ahead. --- Want a discount for Big Technology on Substack + Discord? Here's 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Questions? Feedback? Write to: bigtechnologypodcast@gmail.com
AI Assisted Coding: Pachinko Coding—What They Don't Tell You About Building Apps with Large Language Models, With Alan Cyment
In this BONUS episode, we dive deep into the real-world experience of coding with AI. Our guest, Alan Cyment, brings honest perspectives from the trenches—sharing both the frustrations and breakthroughs of using AI tools for software development. From "Pachinko coding" addiction loops to "Mecha coding" breakthroughs, Alan explores what actually works when building software with large language models.
From Thermomix Dreams to Pachinko Reality
"I bought into the Thermomix coding promise—describe the whole website and it would spit out the finished product. It was a complete disaster."
Alan started his AI coding journey with high expectations, believing he could simply describe a complete application and receive production-ready code. The reality was far different. What he discovered instead was an addictive cycle he calls "Pachinko coding" (Pachinko, aka Slot Machines in Japan)—repeatedly feeding error messages back to the AI, hoping each iteration would finally work, while burning through tokens and time. The AI's constant reassurances that "this time I fixed it" created a gambling-like feedback loop that left him frustrated and out of pocket, sometimes spending over $20 in API credits in a single day.
The Drunken PhD with Amnesia
"It felt like working with a drunken PhD with amnesia—so wise and so stupid at the same time."
Alan describes the maddening experience of anthropomorphizing AI tools that seem brilliant one moment and completely lost the next. The key breakthrough came when he stopped treating the AI as a person and started seeing it as a function that performs extrapolations—sometimes accurate, sometimes wildly wrong. This mental shift helped him manage expectations and avoid the "rage coding" that came from believing the AI should understand context and maintain consistency like a human collaborator.
Making AI Coding Actually Work
"I learned to ask for options explicitly before any coding happens. Give me at least three options and tell me the pros and cons."
Through trial and error, Alan developed practical strategies that transformed AI from a frustrating Pachinko machine into a useful tool:
- Ask for options first: Always request multiple approaches with pros and cons before any code is generated
- Use clover emoji convention: Implement a consistent marker at the start of all AI responses to track context
- Small steps and YAGNI principles: Request tiny, incremental changes rather than large refactoring
- Continuous integration: Demand the AI run tests and checks after every single change
- Explicit refactoring requests: Regularly ask for simplification and readability improvements
- Take two steps back: When stuck in a loop, explicitly tell the AI to simplify and start fresh
- Choose the right tech stack: Use technologies with abundant training data (like Svelte over React Native in Alan's experience)
The Mecha Coding Breakthrough
"When it worked, I felt like I was inside a Lego Mecha robot—the machine gave me superpowers, but I was still the one in control."
Alan successfully developed a birthday reminder app in Swift in just one day, despite never having learned Swift. He made architectural decisions and guided the development without understanding the syntax details.
This experience convinced him that AI represents a genuine new level of abstraction in programming—similar to the jump from assembly language to high-level languages, or from procedural to object-oriented programming. You can now think in English about what you want, while the AI handles the accidental complexity of syntax and boilerplate.
The Cost Reality Check
"People writing about vibe coding act like it's free. But many people are going to pay way more than they would have paid a developer and end up with empty hands."
Alan provides a sobering cost analysis based on his experience. Using DeepSeek through Aider, he typically spends under $1 per day. But when experimenting with premium models like Claude Sonnet 3.5, he burned through $5 in just minutes. The benchmark comparisons are revealing: DeepSeek costs $4 for a test suite, DeepSeek R1 plus Sonnet costs $16, while OpenAI's o1 costs $190. For non-developers trying to build complete applications through pure "vibe coding," the costs can quickly exceed what hiring a developer would cost—with far worse results.
When Thermomix Actually Works
"For small, single-purpose scripts that I'm not interested in learning about and won't expand later, the Thermomix experience was real."
Despite the challenges, Alan found specific use cases where AI truly delivers on the "just describe it and it works" promise. Processing Zoom attendance logs, creating lookup tables for video effects, and other single-file scripts worked remarkably well. The pattern: clearly defined context, no need for ongoing maintenance, and simple enough to verify the output without deep code inspection. For these Thermomix moments, AI proved genuinely transformative.
The Pachinko Trap and Tech Stack Matters
"It became way more stable when I switched to Svelte from React Native and Flutter, even following the same prompting practices. The AI is just more proficient in certain tech stacks."
Alan discovered that some frameworks and languages work dramatically better with AI than others, likely due to the amount of training data available. His e-learning platform attempts with React Native and Flutter kept breaking, but switching to Svelte with web-based deployment became far more stable. This suggests a crucial strategy: choose mainstream, well-documented technologies when planning AI-assisted projects.
From Coding to Living with AI
Alan has completely stopped using traditional search engines, relying instead on LLMs for everything from finding technical documentation to getting recommendations for books based on his interests. While he acknowledges the risk of hallucinations, he finds the semantic understanding capabilities too valuable to ignore. He's even used image analysis to troubleshoot his father's cable TV problems and figure out hotel air conditioning controls.
The Agile Validation
"My only fear is confirmation bias—but the conclusion I see other experienced developers reaching is that the only way to make LLMs work is by making them use agility. So look at who's dead now."
Alan notes the irony that the AI coding tools that actually work all require traditional software engineering best practices: small iterations, test-driven development, continuous integration, and explicit refactoring. The promise of "just describe what you want" falls apart without these disciplines. Rather than replacing software engineering principles, AI tools seem to validate their importance.
About Alan Cyment Alan Cyment is a consultant, trainer, and facilitator based in Buenos Aires, specializing in organizational fluency, agile leadership, and software development culture change. A Certified Scrum Trainer with deep experience across Latin America and Europe, he blends agile coaching with theatre-based learning to help leaders and teams transform. You can link with Alan Cyment on LinkedIn.
Our 222nd episode with a summary and discussion of last week's big AI news!
Recorded on 10/03/2025
Hosted by Andrey Kurenkov and co-hosted by Jon Krohn
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Read out our text newsletter and comment on the podcast at https://lastweekin.ai/
In this episode:
(00:00:10) Intro / Banter
(00:03:08) News Preview
(00:03:56) Response to listener comments
Tools & Apps
(00:04:51) ChatGPT parent company OpenAI announces Sora 2 with AI video app
(00:11:35) Anthropic releases Claude Sonnet 4.5 in latest bid for AI agents and coding supremacy | The Verge
(00:22:25) Meta launches 'Vibes,' a short-form video feed of AI slop | TechCrunch
(00:26:42) OpenAI launches ChatGPT Pulse to proactively write you morning briefs | TechCrunch
(00:33:44) OpenAI rolls out safety routing system, parental controls on ChatGPT | TechCrunch
(00:35:53) The Latest Gemini 2.5 Flash-Lite Preview is Now the Fastest Proprietary Model (External Tests) and 50% Fewer Output Tokens - MarkTechPost
(00:39:54) Microsoft just added AI agents to Word, Excel, and PowerPoint - how to use them | ZDNET
Applications & Business
(00:42:41) OpenAI takes on Google, Amazon with new agentic shopping system | TechCrunch
(00:46:01) Exclusive: Mira Murati's Stealth AI Lab Launches Its First Product | WIRED
(00:49:54) OpenAI is the world's most valuable private company after private stock sale | TechCrunch
(00:53:07) Elon Musk's xAI accuses OpenAI of stealing trade secrets in new lawsuit | Technology | The Guardian
(00:55:40) Former OpenAI and DeepMind researchers raise whopping $300M seed to automate science | TechCrunch
Projects & Open Source
(00:58:26) [2509.16941] SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
Research & Advancements
(01:01:28) [2509.17196] Evolution of Concepts in Language Model Pre-Training
(01:05:36) [2509.19284] What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Lightning round
(01:09:37) [2507.02954] Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III
(01:12:03) [2509.24552] Short window attention enables long-term memorization
Policy & Safety
(01:18:11) SB 53, the landmark AI transparency bill, is now law in California | The Verge
(01:24:07) Elon Musk's xAI offers Grok to federal government for 42 cents | TechCrunch
(01:25:23) Character.AI removes Disney characters from platform after studio issues warning
(01:28:50) Spotify's Attempt to Fight AI Slop Falls on Its Face
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
OpenAI is dropping a visual agent builder
Recorded by Zeina Hashem Beck for Poem-a-Day, a series produced by the Academy of American Poets. Published on October 6, 2025. www.poets.org
Max Zeff, Sr. AI reporter at TechCrunch, joins for our weekly discussion of the latest tech news. We cover:

1) OpenAI introduces Sora 2
2) Will these AI video feeds catch on?
3) Is Sora 2 an important technological advance above all?
4) Why Meta is nervous about OpenAI's momentum in social
5) OpenAI employees have mixed feelings about Sora
6) Why AI video feeds may be the end of the creator economy
7) Why they may not be
8) Meta will use AI chats to train its ad models
9) Meta's AI research moves more toward product development
10) Anthropic launches its Sonnet 4.5 model
11) Apple prioritizes smart glasses
12) Max tries out the Friend pendant

---

Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice.

Want a discount for Big Technology on Substack + Discord? Here's 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b

Three Faces Of Generative AI: https://www.bigtechnology.com/p/the-three-faces-of-generative-ai

Questions? Feedback? Write to: bigtechnologypodcast@gmail.com
Is the AI disruption overhyped or just getting started?

Yale says the labor market isn't budging. Walmart is betting $1B that employee training is the missing piece. Meanwhile, Gen Z is pivoting to trades in an AI-fearing talent shift no one saw coming.

This week's AI headlines tell a much deeper story than flashy product drops. From ChatGPT turning into a shopping mall to Claude going full autonomous coder, and the rise of "work slop" at the office, every release points to a strategic fork in the road: consumerization vs. enterprise agents.

Your job as a business leader? Know which wave to ride, and when. This episode delivers the insights to help you navigate the noise, avoid the hype, and see what's really happening under the surface.

In this session, you'll discover:
- Why Yale's new research says there's no labor disruption yet, and what that doesn't mean
- How Walmart's $1B upskilling initiative reflects a bigger workforce gap than most execs are ready to admit
- The quiet revolution: Claude 4.5 coding autonomously for 30 hours straight
- OpenAI's wild move into consumer land with Sora 2 + an invite-only social video feed
- Why Instant Checkout turns ChatGPT into an e-commerce front-end (and how it could threaten Amazon)
- The rise of "work slop," and the reputational risk it brings to your team
- Agentic browsers are here: Comet, Opera Neon, and more change how we interact with the web
- AI in Hollywood: the synthetic actress already replacing human stars
- And a shocking stat: 58% of employees are using AI tools with no training, and leaking sensitive data

Yale Budget Lab: Early Evidence of AI's Labor Market Impacts - https://budgetlab.yale.edu/research/evaluating-impact-ai-labor-market-current-state-affairs

About Leveraging AI
The Ultimate AI Course for Business People: https://multiplai.ai/ai-course/
YouTube Full Episodes: https://www.youtube.com/@Multiplai_AI/
Connect with Isar Meitis: https://www.linkedin.com/in/isarmeitis/
Join our Live Sessions, AI Hangouts and newsletter: https://services.multiplai.ai/events

If you've enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!
Join Simtheory: https://simtheory.ai (Use STILLRELEVANT for $10 off)
----
00:00 - Sora2 Examples
00:56 - Sora2: Initial Impressions & Thoughts
26:39 - Claude Sonnet 4.5: It's REALLY good
47:09 - Claude Agent SDK & AI Agent Systems
55:05 - Is Claude Imagine a Look at Future Software / AI OS?
1:00:25 - Claude 4.5 Sonnet Diss Track
1:06:24 - "Real AI Agents and Real Work" & Enterprise Agent / MCP workflows
1:31:41 - LOL of the week Sora2 Steve Irwin Video
1:35:07 - Full Claude Sonnet 4.5 Diss Track
----
Thanks for listening and your support, we really appreciate it!
xoxox
Is this the AI agent we've all been waiting for?
The AI Breakdown: Daily Artificial Intelligence News and Discussions
Anthropic's Claude Sonnet 4.5 reportedly demonstrates groundbreaking autonomy by coding for up to 30 hours non-stop, significantly outpacing prior benchmarks like GPT-5 Codex's seven-hour runs. This leap is enabled by innovations such as enforced modular artifacts, persistent memory surfaces, planning loops, and runtime constraints, transforming the way AI tackles complex, long-horizon tasks (a hypothetical sketch of this pattern follows these notes). The broader implication is that AI is now not only capable of building sophisticated applications autonomously but is also recursively engineering its own future iterations, rapidly accelerating progress across the tech landscape.

Brought to you by:
Is your enterprise ready for the future of agentic AI? Visit AGNTCY.org / Visit Outshift Internet of Agents
Try Notion AI today with Notion 3.0: https://ntn.so/nlw
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results: https://robotsandpencils.com/
Vanta - Simplify compliance: https://vanta.com/nlw
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614

Interested in sponsoring the show? nlw@aidailybrief.ai
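For readers curious what "persistent memory surfaces, planning loops, and runtime constraints" might look like mechanically, here is a minimal, hypothetical sketch in Python. The file name, state shape, and budget are all invented for illustration; this is emphatically not Anthropic's implementation.

```python
import json
import time
from pathlib import Path

# Hypothetical sketch only: a planning loop with a persistent "memory surface"
# (a JSON checkpoint file) and a runtime constraint (a wall-clock budget).
# Every name here is illustrative, not taken from Anthropic.

MEMORY = Path("agent_memory.json")
BUDGET_SECONDS = 30 * 60 * 60  # cap the run at 30 hours, per the claim above

def load_state() -> dict:
    """Resume from the checkpoint if one exists, else start a fresh plan."""
    if MEMORY.exists():
        return json.loads(MEMORY.read_text())
    return {"plan": ["step 1", "step 2"], "done": []}

def save_state(state: dict) -> None:
    """Checkpoint after every step so a crash or restart loses no progress."""
    MEMORY.write_text(json.dumps(state, indent=2))

def run(execute_step) -> None:
    start = time.monotonic()
    state = load_state()
    while state["plan"] and time.monotonic() - start < BUDGET_SECONDS:
        step = state["plan"].pop(0)
        state["done"].append(execute_step(step))  # e.g. one model or tool call
        save_state(state)

if __name__ == "__main__":
    run(lambda step: f"completed {step}")
```

The design point is that long-horizon autonomy comes less from a bigger context window than from externalizing state: each step is checkpointed, so the loop can be interrupted and resumed without losing the plan.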
Welcome to episode 324 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Jonathan are your hosts, bringing you all the latest news and announcements in Cloud and AI. This week we have some exec changes over at Oracle, a LOT of announcements about Sonnet 4.5, and even some marketplace updates over at Azure! Let's get started.

Titles we almost went with this week
- Oracle’s Executive Shuffle: Promoting from Within While Chasing from Behind
- Copilot Takes the Wheel on Your Legacy Code Highway
- Queue Up for GPUs: Google’s Take-a-Number Approach to AI Computing
- License to Bill: Google’s 400% Markup Grievance
- Autopilot Engages: GKE Goes Full Self-Driving Mode
- SQL Server Finally Gets a Lake House Instead of a Server Room
- Microsoft Gives Office Apps Their Own AI Interns
- Claude and Present Danger: The AI That Codes for 30 Hours Straight
- The Claude Father Part 4.5: An Offer Your Code Can’t Refuse
- CUD You Believe It? Google Makes Discounts Actually Flexible
- ECS Goes Full IPv6: No IPv4s Given
- Breaking News: AWS Finally Lets You Hit the Emergency Stop Button
- One Marketplace to Rule Them All
- BigQuery Gets a Crystal Ball and a Chatty Friend
- Azure’s September to Remember: When Certificates and Allocators Attack
- Shall I Compare Thee to a Sonnet? 4.5 Ways Anthropic Just Leveled Up
- AWS provides a big red button

Follow Up

01:26 The global harms of restrictive cloud licensing, one year later | Google Cloud Blog
- Google Cloud filed a formal complaint with the European Commission one year ago about Microsoft’s anti-competitive cloud licensing practices, specifically the 400% price markup Microsoft imposes on customers who move Windows Server workloads to non-Azure clouds.
- The UK Competition and Markets Authority found that restrictive licensing costs UK cloud customers £500 million annually due to lack of competition, while US government agencies overspend by $750 million yearly because of Microsoft’s licensing tactics.
- Microsoft recently disclosed that forcing software customers to use Azure is one of three pillars driving its growth and is implementing new licensing changes preventing managed service providers from hosting certain workloads on Azure competitors.
- Multiple regulators globally, including South Africa and the US FTC, are now investigating Microsoft’s cloud licensing practices, with the CMA finding that Azure has gained customers at 2-3x the rate of competitors since implementing restrictive terms.
- A European Centre for Inter
Claude Sonnet 4.5 brings powerful AI agents, coding tools, and automation upgrades. Here's how real estate investors can use them to save time.

In this episode of RealDealCast, Jack Hoss breaks down Anthropic's release of Claude Sonnet 4.5 and what it means for real estate professionals.

From AI agents that can run tasks for 30+ hours straight to improved coding, automation, and Chrome extensions, this update is a game-changer. Jack explains how these tools compare to ChatGPT and Gemini, and more importantly, how investors can use them to cut busy work and focus on building wealth.

You'll learn:
- What's new in Claude Sonnet 4.5 and why it matters
- How AI agents are replacing some virtual assistant tasks
- Why coding and automation upgrades mean faster workflows
- How investors can use these tools for marketing, analysis, and operations
- Why this is the best time to adopt AI for real estate investing
AWS Morning Brief for the week of September 22nd, 2025, with Corey Quinn.

Links:
- Qwen models are now available in Amazon Bedrock
- AWS Budgets now supports custom time periods
- Amazon CloudWatch launches Cross-Account and Cross-Region Log Centralization
- Amazon S3 now supports conditional deletes in S3 general purpose buckets
- New fault action in AWS FIS to inject I/O latency on Amazon EBS volumes
- AWS has once again announced a change (in this case, changing the email address from which invoices show up), only to walk it back prior to implementation.
- Use Raspberry Pi 5 as Amazon EKS Hybrid Nodes for edge workloads
- Malware Protection for S3 Expands File Size and Archive Scanning Limits
- AWS named as a Leader in 2025 Gartner Magic Quadrant for Cloud-Native Application Platforms and Container Management
- Migrate from Anthropic's Claude 3.5 Sonnet to Claude 4 Sonnet on Amazon Bedrock