POPULARITY
Happy New Year! You may have noticed that in 2025 we moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!

We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They went on to be one of the few AI Grant companies to raise a full seed round from Nat Friedman and Daniel Gross, and have now become the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.

We have chatted with both Clémentine Fourrier of Hugging Face's Open LLM Leaderboard and Anastasios Angelopoulos of the (freshly valued at $1.7B) LMArena about their approaches to LLM evals and trendspotting, but Artificial Analysis has staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.

George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is “open” really?

We discuss:

* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet

* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers

* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints
* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)

* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs

* Omniscience Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest

* GDPVal AA: their version of OpenAI's GDPval (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)

* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)

* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents)

* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omniscience Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future

* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)

* V4 of the Intelligence Index coming soon: adding GDPVal AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (HumanEval-style coding is now trivial for small models)

Links to Artificial Analysis

* Website: https://artificialanalysis.ai

* George Cameron on X: https://x.com/georgecameron

* Micah Hill-Smith on X: https://x.com/micahhsmith

Full Episode on YouTube

Timestamps

* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins
* 01:19 Business Model: Independence and Revenue Streams
* 04:33 Origin Story: From Legal AI to Benchmarking Need
* 16:22 AI Grant and Moving to San Francisco
* 19:21 Intelligence Index Evolution: From V1 to V3
* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology
* 13:52 Mystery Shopper Policy and Maintaining Independence
* 28:01 New Benchmarks: Omniscience Index for Hallucination Detection
* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning
* 23:01 GDPVal AA: Agentic Benchmark for Real Work Tasks
* 50:19 Stirrup Agent Harness: Open Source Agentic Framework
* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses
* 58:25 The Smiling Curve: Cost Falling While Spend Rising
* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits
* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges
* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas
* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions
* 1:16:50 Closing: The Insatiable Demand for Intelligence

Transcript

Micah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time Artificial Analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.

swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me.
Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, this gem of a models and host comparison site was just launched. And then I put in a few screenshots, and I said, it's an independent third party. It clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing Fireworks, and how do you have a model benchmarking thing without Fireworks? But you had Together, you had Perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats on... It's been an amazing year. You guys have really come together to be the presumptive new Gartner of AI, right? Which is something that...

George [00:01:09]: Yeah, but you can't pay us for better results.

swyx [00:01:12]: Yes, exactly.

George [00:01:13]: Very important.

Micah [00:01:14]: Start off with a spicy take.

swyx [00:01:18]: Okay, how do I pay you?

Micah [00:01:20]: Let's get right into that.

swyx [00:01:21]: How do you make money?

Micah [00:01:24]: Well, very happy to talk about that. So it's been a big journey the last couple of years. Artificial Analysis is going to be two years old in January 2026, which is pretty soon now. We run the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, technologies across the AI stack for building stuff. We're very committed to doing that and intend to keep doing that. We have, along the way, built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups. So we want to be... We want to be who enterprises look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. So no one pays to be on the website. We've been very clear about that from the very start because there's no use doing what we do unless it's independent AI benchmarking. Yeah. But it turns out a bunch of our stuff can be pretty useful to companies building AI stuff.

swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?

George [00:02:53]: So we have a benchmarking and insight subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. And so, for instance, one of the reports is a model deployment report: how to think about choosing between serverless inference, managed deployment solutions, or leasing chips and running inference yourself, which is an example of the kind of decision that big enterprises face, and it's hard to reason through, like this AI stuff is really new to everybody. And so with our reports and insight subscription we try and help companies navigate that. We also do custom private benchmarking. And so that's very different from the public benchmarking that we publicize, and there's no commercial model around that. For private benchmarking, we'll at times create benchmarks, run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking.
Yeah. So that's a piece mainly that we've developed through trying to support everybody publicly with our public benchmarks. Yeah.

swyx [00:04:09]: Let's talk about the tech stack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney? Yeah. Well, Sydney, Australia for me.

Micah [00:04:19]: George was in SF, but he's Australian, but he moved here already. Yeah.

swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting Artificial Analysis in the first place? You know, you started with public benchmarks. And so let's start there. We'll go to the private benchmarks. Yeah.

George [00:04:33]: Why don't we even go back a little bit to like why we, you know, thought that it was needed? Yeah.

Micah [00:04:40]: The story kind of begins like in 2022, 2023, like both George and I have been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant. It actually worked pretty well for its era, I would say. Yeah. Yeah. So I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. So I had like this multistage algorithm thing, trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build that out, right? Like you're trying to think about accuracy, a bunch of other metrics and performance and cost. And mostly just no one was doing anything to independently evaluate all the models. And certainly not to look at the trade-offs for speed and cost. So we basically set out just to build a thing that developers could look at to see the trade-offs between all of those things measured independently across all the models and providers. Honestly, it was probably meant to be a side project when we first started doing it.

swyx [00:05:49]: Like we didn't like get together and say like, hey, we're going to stop working on all this stuff, this is going to be our main thing. When I first called you, I think you hadn't decided on starting a company yet.

Micah [00:05:58]: That's actually true. I don't even think we'd paused anything. Like, George had a day job. I didn't quit working on my legal AI thing. Like it was genuinely a side project.

George [00:06:05]: We built it because we needed it as people building in the space and thought, oh, other people might find it useful too. So we bought a domain, linked it to the Vercel deployment that we had, and tweeted about it. But very quickly it started getting attention. Thank you, Swyx, for, I think, doing an initial retweet and spotlighting this project that we released. And then very quickly though, it was useful to others, but very quickly it became more useful as the number of models released accelerated. We had Mixtral 8x7B and it was key. That's a fun one. Yeah. Like an open source model that really changed the landscape and opened up people's eyes to other serverless inference providers and thinking about speed, thinking about cost. And so that was key. And so it became more useful quite quickly. Yeah.

swyx [00:07:02]: What I love about talking to people like you who sit across the ecosystem is, well, I have theories about what people want, but you have data and that's obviously more relevant. But I want to stay on the origin story a little bit more.
When you started out, I would say, I think the status quo at the time was every paper would come out and they would report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone has done some version of this. There was some version of an Excel sheet or a Google sheet where you just like copy and paste the numbers from every paper and just post it up there. And then sometimes they don't line up because they're independently run. And so your numbers are going to look better than... Your reproductions of other people's numbers are going to look worse because you don't hold their models correctly or whatever the excuse is. I think then Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. If I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval harness. Yup.

Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, it's like if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. And I mean, back when we started the website. Yeah. Yeah. Like one of the reasons why we realized that we had to run the evals ourselves and couldn't just take results from the labs was just that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get- You can put the answer into the model. Yeah. That in the extreme. And like you get crazy cases like back when Google had Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4 and like constructed, I think never published, chain of thought examples, 32 of them, in every topic in MMLU to run it, to get the score. Like there are so many things that you- They never shipped Ultra, right? That's the one that never made it out. Not widely. Yeah. Yeah. Yeah. I mean, I'm sure it existed, but yeah. So we were pretty sure that we needed to run them ourselves and just run them in the same way across all the models. Yeah. And we were also certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.

swyx [00:09:24]: Okay. A couple of technical questions. I mean, so obviously I also thought about this and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know. No. Well, we definitely weren't at the start.

Micah [00:09:36]: So like, I mean, we were paying for it personally at the start. There's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of like hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. Yeah. It was like kind of fine. Yeah. Yeah. These days that's gone up an enormous amount for a bunch of reasons that we can talk about. But yeah, it wasn't that bad because you can also remember that like the number of models we were dealing with was hardly any and the complexity of the stuff that we wanted to do to evaluate them was a lot less. Like we were just asking some Q&A type questions. And then one specific thing was for a lot of evals initially, we were just like sampling an answer.
You know, like, what's the answer for this? Like, we'd just have it go to the answer directly without letting the models think. We weren't even doing chain of thought stuff initially. And that was the most useful way to get some results initially. Yeah.

swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right? Like because sometimes the models, the models can answer any way they see fit and sometimes they actually do have the right answer, but they just returned the wrong format and they will get a zero for that unless you work it into your parser. And that involves more work. And so, I mean, but there's an open question whether you should give it points for not following your instructions on the format.

Micah [00:11:00]: It depends what you're looking at, right? Because you can, if you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test it on its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's answered. But these days, it's mostly less of a problem. Like, if you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do, like, a simple regex.

swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, sometimes if you have a multiple choice question, sometimes there's a bias towards the first answer, so you have to randomize the options. All these nuances, like, once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's such dark magic.

Micah [00:11:47]: You've also got, like… You've got, like, the different degrees of variance in different benchmarks, right? Yeah. So, if you run a four-option multi-choice eval on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see is pretty enormous if you only do a single run of it, and it has a small number of questions, especially. So, like, one of the things that we do is run an enormous number of all of our evals when we're developing new ones and doing upgrades to our Intelligence Index to bring in new things. Yeah. So that we can dial in the right number of repeats so that we can get to the 95% confidence intervals that we're comfortable with, so that when we pull that together, we can be confident in Intelligence Index to at least as tight as, like, plus or minus one at 95% confidence. Yeah.

swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.

George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that's assuming one repeat in terms of how we report it, because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there because of the repeats.
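To make the repeats-and-confidence-interval point concrete, here is a minimal sketch of the math, assuming each repeat of an eval produces a score out of 100; the function names and the pilot numbers are illustrative only, not Artificial Analysis code or data.

```python
import math
import statistics

def ci_half_width(run_scores: list[float], z: float = 1.96) -> float:
    """95% confidence half-width for the mean of repeated eval runs."""
    sd = statistics.stdev(run_scores)           # run-to-run standard deviation
    return z * sd / math.sqrt(len(run_scores))  # standard error of the mean, scaled

def repeats_needed(run_scores: list[float], target: float = 1.0, z: float = 1.96) -> int:
    """Roughly how many repeats until the +/- band on the mean shrinks to `target` points."""
    sd = statistics.stdev(run_scores)
    return math.ceil((z * sd / target) ** 2)

# Hypothetical pilot runs of one eval on one model, scores out of 100.
pilot = [71.2, 68.9, 72.5, 69.8, 70.4]
print(f"mean={statistics.mean(pilot):.1f} +/- {ci_half_width(pilot):.1f} (95% CI)")
print(f"repeats needed for +/-1: {repeats_needed(pilot)}")
```

The same logic is why the published cost-to-run figure understates the true spend: every extra repeat multiplies the token bill for that eval.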
swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking, you don't have any special deals with the labs. They don't discount it. You just pay out of pocket or out of your sort of customer funds. Oh, there is a mix. So, the issue is that sometimes they may give you a special endpoint, which is… Ah, 100%.

Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser focus, like, on everything we do on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true, like the one you bring up right here: if we're working with a lab, if they're giving us a private endpoint to evaluate a model, it is totally possible that what's sitting behind that black box is not the same as what they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy. And so, and we're totally transparent with all the labs we work with about this, that we will register accounts not on our own domain and run both intelligence evals and performance benchmarks… Yeah, that's the job. …without them being able to identify it. And no one's ever had a problem with that. Because, like, a thing that turns out to actually be quite a good factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.

swyx [00:14:23]: That's true. I never thought about that. I've been in the database industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.

Micah [00:14:36]: I mean, okay, the biggest one, like, that I'll bring up, like, is more of a conceptual one, actually, than, like, direct shenanigans. It's that the things that get measured become things that get targeted by labs in what they're trying to build, right? Exactly. So that doesn't mean anything that we should really call shenanigans. Like, I'm not talking about training on the test set. But if you know that you're going to be graded on a particular thing, if you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing that preferably are going to be helpful for a wide range of how actual users want to use the thing that you're building. But they will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to, like, how we might use modern coding agents and stuff. But it's clearly not one for one. So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without that being a reflection of the overall generalized intelligence of these models getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.

swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. Like, you used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier. You've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant, which you guys decided to join and move here. What was it like? I think you were in, like, batch two? Batch four. Batch four.
Okay.

Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies and were extremely aligned with the mission of what we were trying to do. Like, we're not quite typical of, like, a lot of the other AI startups that they've invested in.

swyx [00:16:53]: And they were very much here for the mission of what we want to do. Did they say any advice that really affected you in some way or, like, were any of the events very impactful? That's an interesting question.

Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.

swyx [00:17:09]: Which is also, like, a crazy list. Yeah.

George [00:17:11]: Oh, totally. Yeah, yeah, yeah. There was something about, you know, speaking to Nat and Daniel about the challenges of working through a startup and just working through the questions that don't have, like, clear answers, and how to work through those kind of methodically and just, like, work through the hard decisions. And they've been great mentors to us as we've built Artificial Analysis. Another benefit for us was that other companies in the batch and other companies in AI Grant are pushing the capabilities, yeah, of what AI can do at this time. And so being in contact with them, making sure that Artificial Analysis is useful to them, has been fantastic for supporting us in working out how we should build out Artificial Analysis to continue being useful to those, like, you know, building on AI.

swyx [00:17:59]: I think to some extent, I'm of mixed opinion on that one because to some extent, your target audience is not people in AI Grant who are obviously at the frontier. Yeah. Do you disagree?

Micah [00:18:09]: To some extent. To some extent. But then, so a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do across the entire stack for building great applications, which actually makes some of them pretty archetypical power users of Artificial Analysis. Some of the people with the strongest opinions about what we're doing well and what we're not doing well and what they want to see next from us. Yeah. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently for different models and different parts of your application to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So for many of them, no, they're like not commercial customers of ours, like we don't charge for all our data on the website. Yeah. They are absolutely some of our power users.

swyx [00:19:07]: So let's talk about just the evals as well. So you start out from the general like MMLU and GPQA stuff. What's next? How do you sort of build up to the overall index? What was in V1 and how did you evolve it? Okay.

Micah [00:19:22]: So first, just like background, like we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric that we pull together currently from 10 different eval data sets to give what we're pretty confident is the best single number to look at for how smart the models are.
Obviously, it doesn't tell the whole story. That's why we publish the whole website of all the charts to dive into every part of it and look at the trade-offs. But best single number. So right now, it's got a bunch of Q&A type data sets that have been very important to the industry, like a couple that you just mentioned. It's also got a couple of agentic data sets. It's got our own long context reasoning data set and some other use case focused stuff. As time goes on, the things that we're most interested in, that are going to be important to the capabilities that are becoming more important for AI and what developers are caring about, are going to be first around agentic capabilities. So surprise, surprise. We're all loving our coding agents, and how the models perform there, and then doing similar things for different types of work, is really important to us. Linking to economically valuable use cases is extremely important to us. And then we've got some of these things that the models still struggle with, like working really well over long contexts, that are not going to go away as specific capabilities and use cases that we need to keep evaluating.

swyx [00:20:46]: But I guess one thing I was driving at was like the V1 versus the V2 and how bad it got over time.

Micah [00:20:53]: Like how we've changed the index to where we are.

swyx [00:20:55]: And I think that reflects on the change in the industry. Right. So that's a nice way to tell that story.

Micah [00:21:00]: Well, V1 would be completely saturated right now by almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, I think, how much progress has been made in the last two years. Like we obviously play the game constantly of like today's version versus last week's version and the week before, and all of the small changes in the horse race between the current frontier and who has the best like smaller-than-10B model right now this week. Right. And that's very important to a lot of developers and people, especially in this particular city of San Francisco. But when you zoom out, a couple of years ago, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence. We can talk about that more in a bit. So V1, V2, V3, we made things harder. We covered a wider range of use cases. And we tried to get closer to things developers care about as opposed to like just the Q&A type stuff that MMLU and GPQA represented. Yeah.

swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark and like looking around and asking questions about it. Yeah.

Micah [00:22:21]: Let's do it. Okay. This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.

George [00:22:26]: And I think a little bit about the direction that we want to take it. And we want to push benchmarks. Currently, the Intelligence Index and evals focus a lot on kind of raw intelligence. But we kind of want to diversify how we think about intelligence. And we can talk about it. But kind of new evals that we've kind of built and partnered on focus on topics like hallucination. And we've got a lot of topics that I think are not covered by the current eval set that should be.
And so we want to bring that forth. But before we get into that.

swyx [00:23:01]: And so for listeners, just as a timestamp, right now, number one is Gemini 3 Pro High. Then followed by Claude Opus at 70, GPT-5.1 High. You don't have 5.2 yet. And Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. Yeah. I mean, I love it. I love it. No, no. 100%. Look back this time next year and go, how cute. Yep.

George [00:23:25]: Totally. A quick view of that is, okay, there's a lot. I love it. I love this chart. Yeah.

Micah [00:23:30]: This is such a favorite, right? Yeah. And almost every talk that George or I give at conferences and stuff, we always put this one up first to just talk about situating where we are in this moment in history. This, I think, is the visual version of what I was saying before about the zooming out and remembering how much progress there's been. If we go back to just over a year ago, before o1, before Claude Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, OpenAI was untouchable for well over a year. And, I mean, you would remember that time period well, of there being very open questions about whether or not AI was going to be competitive, like full stop, whether or not OpenAI would just run away with it, whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.

George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There's so many dots on it, but I think it reflects a little bit what we felt, like how crazy it's been.

swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got ServiceNow in there, that are less traditional names. Yeah.

George [00:25:01]: It's models that we're kind of highlighting by default in our charts, in our Intelligence Index. Okay.

swyx [00:25:07]: You just have a manually curated list of stuff.

George [00:25:10]: Yeah, that's right. But something that I actually don't think every Artificial Analysis user knows is that you can customize our charts and choose what models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.

swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the o1 jump. Look at that. September 2024. And the DeepSeek jump. Yeah.

George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.

Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, a couple of weeks. It was Boxing Day in New Zealand when DeepSeek V3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less known over the second half of 2024 and had run evals on the earlier ones and stuff. I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas and stuff, running the evals and getting back result by result on DeepSeek V3. So this was the first of their V3 architecture, the 671B MoE.

Micah [00:26:19]: And we were very, very impressed.
That was the moment where we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a thing. The world really noticed when they followed that up with the RL working on top of V3 and R1 succeeding a few weeks later. But the groundwork for that absolutely was laid with just an extremely strong base model, completely open weights, that we had as the best open weights model. So, yeah, that's the thing that you really see in the graph. But I think that really got a lot of attention from us on Boxing Day last year.

George [00:26:48]: Boxing Day is the day after Christmas for those not familiar.

swyx [00:26:54]: I'm from Singapore.

swyx [00:26:55]: A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.

Micah [00:27:11]: I don't know. I'm not used to it. Once upon a time, we did call it Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.

George [00:27:20]: There's been a few naming changes. We added hardware benchmarking to the site, and so benchmarks at a kind of system level. And so then we changed our throughput metric to, we now call it output speed, and then

swyx [00:27:32]: throughput makes sense at a system level, so we took that name. Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.

Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into. Maybe so we can get past all the, like, we have lots and lots of evals and stuff, the interesting ones to talk about today that would be great to bring up are a few of our recent things, I think, that probably not many people will be familiar with yet. So the first one of those is our Omniscience Index. So this one is a little bit different to most of the intelligence evals that we've run. We built it specifically to look at the embedded knowledge in the models and to test hallucination by looking at, when the model doesn't know the answer, so it's not able to get it correct, what's its probability of saying "I don't know" or giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we're simply taking off a point if you give an incorrect answer to the question. We're pretty convinced that this is an example of where it makes most sense to do that, because it's strictly more helpful to say "I don't know" instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models and the labs creating them to get higher scores. And almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything. There's no incentive to say "I don't know." So we did that for this one here.

swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah. Yeah.

George [00:29:31]: On that. And one reason that we didn't do that, or put that into this index, is that we think that the way to do that is not to ask the models how confident they are.

swyx [00:29:43]: I don't know. Maybe it might be though.
You put it like a JSON field, say, say confidence, and maybe it spits out something. Yeah. You know, we have done a few evals podcasts over the, over the years. And when we did one with Clémentine of Hugging Face, who maintains the Open LLM Leaderboard, this was one of her top requests: some kind of hallucination slash lack-of-confidence calibration thing. And so, hey, this is one of them.

Micah [00:30:05]: And I mean, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But yeah, it's pretty useful and has some interesting results. Like one of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not really measured vibes stuff that people like about some of the Claude models. Is the dataset public or what's, is it, is there a held out set? There's a held-out set for this one. So we have published a public test set, but we've only published 10% of it. The reason is that for this one here specifically, it would be very, very easy to, like, have data contamination because it is just factual knowledge questions. We'll update it over time to also prevent that, but yeah, we've kept most of it held out so that we can keep it reliable for a long time. It leads us to a bunch of really cool things, including breaking it down quite granularly by topic. And so we've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.

swyx [00:31:23]: I would be interested. Let's, let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, hallucinates less than Opus. And yeah. Would that be the other way around in a normal capability environment? I don't know. What's, what do you make of that?

George [00:31:37]: One interesting aspect is that we've found that there's not really a, not a strong correlation between intelligence and hallucination, right? That's to say that how smart the models are in a general sense isn't correlated with their ability to, when they don't know something, say that they don't know. It's interesting that Gemini 3 Pro Preview was a big leap over here, over Gemini 2.5 Flash and, and, and 2.5 Pro, but, and if I add Pro quickly here.

swyx [00:32:07]: I bet Pro's really good. Uh, actually no, I meant, I meant, uh, the GPT Pros.

George [00:32:12]: Oh yeah.

swyx [00:32:13]: Cause the GPT Pros are rumored, we don't know for a fact, to be like eight runs and then with an LLM judge on top. Yeah.

George [00:32:20]: So we saw a big jump in, this is accuracy. So this is just the percent that they get, uh, correct, and Gemini 3 Pro knew a lot more than the other models. And so big jump in accuracy. But relatively no change between the Google Gemini models, between releases, in the hallucination rate. Exactly. And so it's likely due to just kind of a different post-training recipe, between the, the Claude models. Yeah.

Micah [00:32:45]: Um, that's, that's driven this. Yeah. You can, uh, you can partially blame us and how we define intelligence, having until now not defined hallucination as a negative in the way that we think about intelligence.
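For readers who want to see the shape of the scoring George and Micah are describing, here is a minimal sketch, assuming +1 for a correct answer, -1 for a wrong answer, and 0 for declining, scaled to a -100 to +100 range; the function names and tallies are illustrative only, not the exact Artificial Analysis methodology.

```python
def omniscience_style_score(correct: int, incorrect: int, declined: int) -> float:
    """Knowledge score on a -100..+100 scale: reward correct answers,
    penalize wrong ones, and treat "I don't know" as neutral."""
    total = correct + incorrect + declined
    return 100.0 * (correct - incorrect) / total

def hallucination_rate(incorrect: int, declined: int) -> float:
    """Of the questions the model failed to answer correctly, the share
    where it guessed wrong instead of saying "I don't know"."""
    missed = incorrect + declined
    return incorrect / missed if missed else 0.0

# Hypothetical tally for one model on a factual-knowledge test set.
print(omniscience_style_score(correct=430, incorrect=310, declined=260))  # 12.0
print(hallucination_rate(incorrect=310, declined=260))                    # ~0.544
```

Under this kind of metric, a model that declines when unsure can outrank a "smarter" model that guesses, which is exactly the incentive shift they describe.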
swyx [00:32:56]: And so that's what we're changing. Uh, I know many smart people who are confidently incorrect.

George [00:33:02]: Uh, look, look at that. That, that, that is very human. Very true. And there's a time and a place for that. I think our view is that hallucination rate makes sense in this context where it's around knowledge, but in many cases, people want the models to hallucinate, to have a go. Often that's the case in coding or when you're trying to generate newer ideas. One eval that we added to Artificial Analysis is, is, is Critical Point, and it's really hard, uh, physics problems. Okay.

swyx [00:33:32]: And is it sort of like a HumanEval type or something different, or like a FrontierMath type?

George [00:33:37]: It's not dissimilar to FrontierMath. So these are kind of research questions that kind of academics in the physics world would be able to answer, but models really struggle to answer. So the top score here is, like, 9%.

swyx [00:33:51]: And the people that, that created this, like Minway and, and, and actually Ofir, who was kind of behind SWE-bench, and what organization is this? Oh, is this, it's Princeton.

George [00:34:01]: Kind of a range of academics from, from, uh, different academic institutions, really smart people. They talked about how they turn the models up in terms of temperature, as high a temperature as they can, when they're trying to explore kind of new ideas in physics with it as a, as a thought partner, just because they, they want the models to hallucinate. Um, yeah, sometimes it's something new. Yeah, exactly.

swyx [00:34:21]: Um, so not right in every situation, but, um, I think it makes sense, you know, to test hallucination in scenarios where it makes sense. Also, the obvious question is, uh, this is one of many; every lab has a system card that shows some kind of hallucination number, and you've chosen to not, uh, endorse that and you've made your own. And I think that's a, that's a choice. Um, totally in some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun. You provide it as a service here. You have to fight the, well, who are we to, to like do this? And your, your answer is that we have a lot of customers and, you know, but like, I guess, how do you converge the individual?

Micah [00:35:08]: I mean, I think, I think for hallucinations specifically, there are a bunch of different things that you might care about reasonably, and that you'd measure quite differently. Like, we've called this an Omniscience hallucination rate, not trying to declare that, like, it's humanity's last hallucination. You could, uh, you could have some interesting naming conventions and all this stuff. Um, the biggest picture answer to that is something that I actually wanted to mention just as George was explaining Critical Point as well: as we go forward, we are building evals internally. We're partnering with academia and partnering with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not necessarily obsessed with the idea that everything we do, we have to do entirely within our own team. Critical Point is a cool example of where we were a launch partner for it, working with academia, and we've got some partnerships coming up with a couple of leading companies.
Those ones, obviously, we have to be careful with on some of the independence stuff, but with the right disclosure, like, we're completely comfortable with that. A lot of the labs have released great data sets in the past that we've used to great success independently. And so between all of those techniques, we're going to be releasing more stuff in the future. Cool.

swyx [00:36:26]: Let's cover the last couple. And then we'll, I want to talk about your trends analysis stuff, you know? Totally.

Micah [00:36:31]: So that actually, I have one like little factoid on Omniscience. If you go back up to accuracy on Omniscience, an interesting thing about this accuracy metric is that it tracks, more closely than anything else that we measure, the total parameter count of models. It makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff that we think is much more about how the models are trained. This is just what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.

swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed by any official source, just rumors. But rumors do fly around. Rumors. I get, I hear all sorts of numbers. I don't know what to trust.

Micah [00:37:17]: So if you, if you draw the line on Omniscience accuracy versus total parameters, we've got all the open weights models, you can squint and see that likely the leading frontier models right now are quite a lot bigger than the ones that we're seeing here, and the one trillion parameters that the open weights models we're looking at here cap out at. There's an interesting extra data point that Elon Musk revealed recently about xAI: three trillion parameters for Grok 3 and 4, six trillion for Grok 5, but that's not out yet. Take those together, have a look. You might reasonably form a view that there's a pretty good chance that Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameter range. To be clear, I have absolutely no idea, but just based on this chart, like, that's where you would, you would land if you have a look at it. Yeah.

swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much because what does it really matter? Like as long as they can serve it at a sustainable cost, that's about it. Like, yeah, totally.

George [00:38:17]: They've also got different incentives in play compared to, like, open weights models, who are thinking about supporting others in self-deployment. For the labs who are doing inference at scale, it's, I think, less about total parameters in many cases when thinking about inference costs, and more around the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.

Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously if you're a developer or company using these things, exactly as you say, it doesn't matter. You should be looking at all the different ways that we measure intelligence. You should be looking at the cost to run the index number and the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.

swyx [00:38:56]: It's not as good for the content creator rumor mill where I can say, oh, GPT-4 is this small circle, look at GPT-5, it is this big circle. And then there used to be a thing for a while.
Yeah.

Micah [00:39:07]: But that is, like, on its own, actually a very interesting one, right? That is, it's just purely that chances are the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in total size of the models, especially with the upcoming hardware generations. Yes.

swyx [00:39:29]: So, you know. Taking off my shitposting hat for a minute. Yes. Yes. At the same time, I do feel like, you know, especially coming back from Europe, people do feel like Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale out. And therefore we need to start exploring at least a different path. GDPval, I think it's like only like a month or so old. I was also very positive when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool. And you have your own version.

George [00:39:59]: It's a fantastic. It's a fantastic data set. Yeah.

swyx [00:40:01]: And maybe we'll recap for people who are still out of it. It's like 44 tasks based on some kind of GDP cutoff that's like meant to represent broad white collar work that is not just coding. Yeah.

Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions, and some input files for a lot of them. Within the 44 it's divided into like 220, 225 maybe, subtasks that are the level that we run through the agentic harness. And yeah, they're really interesting. I will say that it doesn't, it doesn't necessarily capture like all the stuff that people do at work. No eval is perfect, there are always going to be more things to look at, largely because in order to make the tasks well enough defined that you can run them, they need to only have a handful of input files and very specific instructions for that task. And so I think the easiest way to think about them is that they're like quite hard take-home exam tasks that you might do in an interview process.

swyx [00:40:56]: Yeah, for listeners, it is no longer like a long prompt. It is like, well, here's a zip file with like a spreadsheet or a PowerPoint deck or a PDF, and go nuts and answer this question.

George [00:41:06]: OpenAI released a great data set and they released a good paper which looks at performance across the different web chatbots on the data set. It's a great paper, I encourage people to read it. What we've done is taken that data set and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the data set, and then we developed an evaluator approach to compare outputs. That's kind of AI enabled, so it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned to human preferences. One data point there is that even with Gemini 3 Pro as the evaluator, Gemini 3 Pro, interestingly, doesn't actually do that well. So that's kind of a good example of what we've done in GDPVal AA.

swyx [00:42:01]: Yeah, the thing that you have to watch out for with LLM judges is self-preference, that models usually prefer their own output, and in this case, it was not.
Totally.

Micah [00:42:08]: I think the way that we're thinking about the places where it makes sense to use an LLM-as-judge approach now is, like, quite different to some of the early LLM-as-judge stuff a couple of years ago, because some of that (and MT-Bench was a great project that was a good example of some of this a while ago) was about judging conversations and, like, a lot of style-type stuff. Here, the task that the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search, the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading it, we're running it through a pipeline to extract visual and text versions of the files and be able to provide that to Gemini, and we're providing the criteria for the task and getting it to pick which one more effectively meets the criteria of the task out of two potential outputs. Yeah. It turns out, and we proved this, that it's just very, very good at getting that right, and matched with human preference a lot of the time, because I think it's got the raw intelligence, but it's combined with the correct representation of the outputs, the fact that the outputs were created with an agentic task that is quite different to the way the grading model works, and we're comparing it against criteria, not just kind of zero-shot trying to ask the model to pick which one is better.

swyx [00:43:26]: Got it. Why is this an ELO and not a percentage, like GDPval?

George [00:43:31]: So the outputs look like documents, and there's video outputs or audio outputs from some of the tasks. It has to make a video? Yeah, for some of the tasks. Some of the tasks.

swyx [00:43:43]: What task is that?

George [00:43:45]: I mean, it's in the data set. Like be a YouTuber? It's a marketing video.

Micah [00:43:49]: Oh, wow. What? Like the model has to go find clips on the internet and try to put it together. The models are not that good at doing that one, for now, to be clear. It's pretty hard to do that with a code interpreter. I mean, the computer use stuff doesn't work quite well enough and so on and so on, but yeah.

George [00:44:02]: And so there's no kind of ground truth, necessarily, to compare against, to work out percentage correct. It's hard to come up with correct or incorrect there. And so it's on a relative basis. And so we use an ELO approach to compare outputs from each of the models between the tasks.

swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task. And then give it an ELO, and then so you have, you have a human there. It's just, I think what's helpful about GDPval, the OpenAI one, is that 50% is meant to be normal human and maybe Domain Expert is higher than that, but 50% was the bar for like, well, if you've crossed 50, you are superhuman. Yeah.

Micah [00:44:47]: So we, like, haven't grounded this score in that exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models. It's one of the reasons that presenting it as ELO is quite helpful and allows us to add models, and it'll stay relevant for quite a long time. I also think it, it can be tricky looking at these exact tasks compared to the human performance, because the way that you would go about it as a human is quite different to how the models would go about it.
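As a rough illustration of how pairwise judge preferences become a relative rating like this, here is a toy sketch using a standard Elo update; the K-factor, starting rating, and model names are placeholders, not Artificial Analysis's actual methodology.

```python
from collections import defaultdict

def expected(r_a: float, r_b: float) -> float:
    """Probability A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_from_judgments(judgments: list[tuple[str, str, float]], k: float = 16.0) -> dict:
    """judgments: (model_a, model_b, score) where score is 1.0 if the judge
    preferred A's output for a task, 0.0 if it preferred B's, 0.5 for a tie."""
    ratings = defaultdict(lambda: 1000.0)
    for a, b, score in judgments:
        e = expected(ratings[a], ratings[b])
        ratings[a] += k * (score - e)
        ratings[b] -= k * (score - e)
    return dict(ratings)

# Hypothetical judge outcomes on a few shared tasks.
prefs = [("model-x", "model-y", 1.0), ("model-y", "model-z", 1.0), ("model-x", "model-z", 0.5)]
print(elo_from_judgments(prefs))
```

Because the ratings are relative, new models (or a paid human contractor, as suggested above) can be slotted in later just by judging their outputs against ones already on the ladder.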
Yeah.

swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there. Is that like just one last, like...

Micah [00:45:20]: Well, no, no, no, no, no, no, it is the, it is the best model released by Meta. And... So it makes it into the homepage default set, still, for now.

George [00:45:31]: Another inclusion that's quite interesting is we also ran it across the latest versions of the web chatbots. And so we have...

swyx [00:45:39]: Oh, that's right.

George [00:45:40]: Oh, sorry.

swyx [00:45:41]: I, yeah, I completely missed that. Okay.

George [00:45:43]: No, not at all. So that's the one which has a checkered pattern. So that is their harness, not yours, is what you're saying. Exactly. And what's really interesting is that if you compare, for instance, Claude 4.5 Opus using the Claude web chatbot, it performs worse than the model in our agentic harness. And so in every case, the model performs better in our agentic harness than its web chatbot counterpart, the harness that they created.

swyx [00:46:13]: Oh, my backwards explanation for that would be that, well, it's meant for consumer use cases and here you're pushing it for something.

Micah [00:46:19]: The constraints are different and the amount of freedom that you can give the model is different. Also, you, like, have a cost goal. We let the models work as long as they want, basically. Yeah. Do you copy paste manually into the chatbot? Yeah. Yeah. That's, that was how we got the chatbot reference. We're not going to be keeping those updated at, like, quite the same scale as hundreds of models.

swyx [00:46:38]: Well, so I don't know, talk to Browserbase. They'll, they'll automate it for you. You know, like I have thought about, like, well, we should turn these chatbot versions into an API because they are legitimately different agents in themselves. Yes. Right. Yeah.

Micah [00:46:53]: And that's grown a huge amount over the last year, right? Like the tools. The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the amount of data sources that you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.

swyx [00:47:10]: What tools and what data connections come to mind when you say that? What's interesting, what's notable work that people have done?

Micah [00:47:15]: Oh, okay. So my favorite example on this is that until very recently, I would argue that it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's, um, pulling something from, um, any of, like, wherever you store stuff at work. So for me, like Google Drive, OneDrive, um, our Supabase databases if we need to do some analysis on some data or something. Preferably the model can be plugged into all of those things and can go do some useful work based on it. The things that, like, I find most impressive currently, that I am somewhat surprised work really well in late 2025, are that I can have models use the Supabase MCP to query, read-only of course, and run a whole bunch of SQL queries to do pretty significant data analysis. And make charts and stuff, and it can read my Gmail and my Notion. And okay. You actually use that. That's good. That's, that's, that's good. Is that a Claude thing?
To varying degrees, but ChatGPT and Claude right now. I would say this stuff barely works right now, in fairness.

George [00:48:33]: Because people are actually going to try this after they hear it. If you get an email from Micah, odds are it wasn't written by a chatbot.

Micah [00:48:38]: So, yeah, I think it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.

swyx [00:48:46]: And so you can feel it coming, right? This time next year, we'll come back and see where it's going. Totally. Supabase shout-out, another famous Kiwi. I don't know if you've had any conversations with him about anything in particular on AI building and AI infra.

George [00:49:03]: We have had Twitter DMs with him, because we're quite big Supabase users and power users. We probably do some things more manually than we should in Supabase, and their support line is super friendly about it. One extra point regarding GDP Val AA: on the basis of the models' overperformance compared to the chatbots, we realized that the reference harness we built actually works quite well on generalist agentic tasks. This proves it, in a sense. And the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code. All we give it is context management capabilities, a web search and web browsing tool, and a code execution environment. Anything else?

Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. For GDP Val AA we give it a tool to view an image specifically, because the models can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context we had to give them a custom tool.

George [00:50:21]: So it turned out that we had created a good generalist agentic harness, and we released it on GitHub yesterday. It's called Stirrup. So if people want to check it out, it's a great base for building a generalist agent for more specific tasks.

Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code and the coding agents can work with it super well.

swyx [00:50:51]: Well, that's nice for the community to explore and hack on. I think in other similar environments, the Terminal-Bench guys have done Harbor, which is a bundle of: we need our minimal harness, which for them is Terminus, and we also need the RL environments or Docker deployment thing to run independently. I don't know if you've looked at Harbor at all. Is that a standard people want to adopt?

George [00:51:19]: Yeah, we've looked at it from an evals perspective, and we love Terminal-Bench and host Terminal-Bench results on Artificial Analysis. We've looked at it from a coding agent perspective, but could see it being a great basis for any kind of agent. I think where we're getting to is that these models have gotten smart enough.
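Stirrup itself is the thing to read on GitHub; purely as an illustration of the "minimal tools, let the model drive the workflow" idea described here, a stripped-down agent loop might look like the sketch below. The `call_model` function, the action schema, the turn cap, and the tool set are assumptions for this sketch, not Stirrup's actual code or API.

```python
# Illustrative sketch only -- not Stirrup's actual implementation.
import subprocess

def run_shell(command: str) -> str:
    """Code-execution tool: run a shell command and return its combined output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=120)
    return result.stdout + result.stderr

def web_search(query: str) -> str:
    """Web-search tool: placeholder -- plug in whichever search backend you have access to."""
    raise NotImplementedError("wire up a real search backend here")

TOOLS = {"shell": run_shell, "web_search": web_search}
MAX_TURNS = 100  # arbitrary cap on agent turns for this sketch

def run_agent(task: str, call_model) -> str:
    """call_model is a stand-in for your LLM API; it returns either a final answer
    or a tool call the model wants to make, in a schema assumed for this sketch."""
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_TURNS):
        action = call_model(messages, tools=list(TOOLS))  # hypothetical signature
        if action["type"] == "final":
            return action["content"]
        # Execute the requested tool and feed the observation back into context.
        observation = TOOLS[action["tool"]](action["input"])
        messages.append({"role": "tool", "content": observation})
    return "ran out of turns"
```

The point of the sketch is how little there is: no planner, no fixed workflow, just tools plus a loop, which matches the "let the model control the agentic workflow" framing in the conversation.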
They've gotten better tools, and they perform better when just given a minimalist set of tools and left to run, letting the model control the agentic workflow rather than using another, more built-out framework that tries to dictate the flow. Awesome.

swyx [00:51:56]: Let's cover the Openness Index, and then let's go into the report stuff. So that's the last of the proprietary AA numbers, I guess. I don't know how you classify all these.

Micah [00:52:07]: Let's call it the last of the three new things we're talking about from the last few weeks. Because we do a mix of stuff: things where we're using open source, things we open source ourselves, and proprietary stuff that we don't always open source. The long context reasoning dataset last year we did open source. And then for all the work on performance benchmarks across the site, some of them we're looking to open source, but some of them we're constantly iterating on. So there's a huge mix of stuff that is open source and not, across the site. So that's LCR, for people following along.

swyx [00:52:41]: But let's talk about the Openness Index.

Micah [00:52:42]: Let's talk about the Openness Index. This is, call it, a new way to think about how open models are. For a long time we have tracked whether models are open weights and what the licenses on them are. That's pretty useful: it tells you what you're allowed to do with the weights of a model. But there is this whole other dimension to how open models are that is pretty important and that we haven't tracked until now, and that's how much is disclosed about how the model was made. So: transparency about data, both pre-training and post-training data, whether you're allowed to use that data, and transparency about methodology and training code. Those are the components. We bring them together to score an Openness Index for models so that you can get the full picture of how open models are in one place.

swyx [00:53:32]: I feel like I've seen a couple of other people try to do this, but they're not maintained. I do think this matters. I don't know what the numbers mean, though. Is there a max number? Is this out of 20?

George [00:53:44]: It's out of 18 currently, and we've got an Openness Index page. Essentially these are points: you get points for being more open across these different categories, and the maximum you can achieve is 18. So AI2, with their extremely open OLMo 3 32B Think model, is the leader, in a sense.

swyx [00:54:04]: What about Hugging Face?

George [00:54:05]: Oh, with their smaller model. It's coming soon. I think we need to get the intelligence benchmarks run to get it on the site.

swyx [00:54:12]: You can't have an Openness Index and not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, RefinedWeb and all that stuff. It's amazing. Or is it called FineWeb? FineWeb.

Micah [00:54:23]: Yeah, totally. One of the reasons this is cool is that if you're trying to understand the holistic picture of the models and everything the company is contributing, this gives you that picture. And we are going to keep it up to date alongside all the models we run the Intelligence Index on, on the site.
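To make the points-based idea concrete, here is a sketch of a 0-18 openness score. The category names and the per-category point split below are assumptions for illustration only; the conversation specifies that models earn points across openness categories up to a maximum of 18, not how those 18 points are divided.

```python
# Illustrative sketch of a points-based openness score (max 18 here).
# Category names and per-category maximums are assumptions, not the real rubric.
CATEGORY_MAX = {
    "pretraining_data_transparency": 3,
    "pretraining_data_usable": 3,
    "posttraining_data_transparency": 3,
    "posttraining_data_usable": 3,
    "methodology_disclosed": 3,
    "training_code_released": 3,
}

def openness_score(disclosures: dict[str, int]) -> int:
    """Sum per-category points, clamped to each category's maximum (0..18 total)."""
    total = 0
    for category, max_points in CATEGORY_MAX.items():
        total += min(disclosures.get(category, 0), max_points)
    return total

# Hypothetical example: a fully documented open model vs. a weights-only release.
print(openness_score({c: 3 for c in CATEGORY_MAX}))   # 18
print(openness_score({"methodology_disclosed": 1}))    # 1
```

The useful property of a rubric like this is that it separates "open weights" from "open recipe": a permissively licensed model with no data or training disclosure still scores low, which is exactly the dimension the index is meant to surface.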
And it's just an extra view to understand it.

swyx [00:54:43]: Can you scroll down to the trade-offs chart? Yeah, that one. This really matters, right? Obviously, because you can b
Esther Lisk-Carew is engagement coordinator for a programme called AHEAD (Arts and Humanities Engagement and Dialogue) at Manchester Metropolitan University. She is also a freelance consultant, trustee of Manchester's Portico Library and founder of the Black and Global Majority Cultural Creative Network. Sarah and Esther talk about How she combines a love of storytelling and the arts with a talent for managing events and operations Why engagement is often baked into the methods and outputs of arts and humanities Building a career in the cultural sector (despite a detour into French law) Why academia needs more diverse voices, and high-profile cultural icons Find out more Read the show notes and transcript on the podcast website Find out about everything Esther is working on via her Linktree Find AHEAD at MMU on the web, Linktree, LinkedIn and Instagram About Research Adjacent Where are you listening from? Share a pic and tag @ResearchAdjacent on LinkedIn, Instagram or BlueSky Fill out the research-adjacent careers quiz Sign up to the Research Adjacent newsletter Email a comment, question or suggestion Leave Sarah a voice message
*Note: this is the Free Content version of my interview with Dr. Matteo Polato. To access the full interview, please consider becoming a Patreon member; alternately, this episode is available for a one-time purchase under the "Shop" tab. www.patreon.com/RejectedReligionMy guest this month is Dr. Matteo Polato.Matteo is a researcher, sound artist and videogame developer. He works as Senior Research Assistant at the School of Digital Arts of Manchester Metropolitan University, where he has completed his PhD on the roles of sound, vibration and resonance-based processes in contemporary occulture and paranormal practices. His artistic practice spans from electroacoustic composition to free improvisation and psychedelic rock, in solo and with bands such as Mamuthones and Cacotopos. As Yami Kurae – with Jacopo Bortolussi – he develops experimental games inspired by psychogeography and occultural practices. He is co-founder of D∀RK – Dark Arts Research Kollective at MMU, and co- artistic director of the association for experimental music research Centro d'Arte dell'Università di Padova.Matteo's recent article in Revenant Journal dives deep into the sonic and atmospheric dimensions of the paranormal documentary series Hellier. From Reddit threads to academic conferences, Matteo's journey into Fortean soundscapes is as unexpected as it is fascinating.Matteo recounts his initial encounter with Hellier and how its unique approach to paranormal investigation inspired him to analyze it academically. Unlike typical ghost-hunting shows, Hellier emphasizes experiential and atmospheric elements, which resonated with his interests in sound studies and Fortean phenomena.Matteo's article shifts the lens from why paranormal entities emerge to how sensory experiences (especially sound) create a sense of supernatural agency. He uses Hellier as a case study to explore this dynamic, drawing from sound studies and concepts like: the eerie, affective atmospheres, & agency and attunement.Matteo argues that sonic interactions and “listening ecologies” are central to how Hellier portrays paranormal phenomena. He explains how sound is not just a medium but a method of engaging with unseen forces. Examples from the series include:Ritualistic listening sessionsUse of the Estes Method vs. traditional EVPAmbient silence as a communicative spaceHellier stands out by blending folklore, psychology, ritual, and media theory. Matteo emphasizes the importance of this holistic method, which allows the investigators to explore the paranormal not just as spectacle, but as a lived, felt experience.Whether you're a fan of Hellier, curious about the intersection of sound studies and the supernatural, or just love a good mystery, this episode will tune you into a whole new frequency of thought. PROGRAM NOTES Revenant JournalRevenantThe Atmospheric Forteanism of Hellier and the Role of Sound in Recent Practices of Paranormal Investigation : Revenant DVRK:https://www.instagram.com/dvrk_mcr/ DVRK Editions Label:https://dvrkmusic.bandcamp.com/ Videogame stuff:https://yamikurae.itch.io/ Mamuthones Band:https://open.spotify.com/intl-it/artist/0JeuJ0H0Q54p6kTuHJSCIA D∀RK: Dark ∀rts Research Kollective (@dvrk_mcr) • Instagram photos and videos Yami Kurae (@yami_kurae) • Instagram photos and videos Instagram Music and Editing: Daniel P. SheaEnd Production: Stephanie Shea
It's another great mailbag this weeks as we read messages about some Chatabix fans at the Stretford Paddock Man U YouTube channel, an embarrassing quip (and several others too), a query about David's attitude to Joe's MMU massage day, a trio of funny names, Bradley Walsh on The Chase, a Sir Winch-a-lot musical and more littering stories. FOR ALL THINGS CHATABIX'Y FOLLOW/SUBSCRIBE/CONTACT: YouTube: https://www.youtube.com/@chatabixpodcast Insta: https://www.instagram.com/chatabixpodcast/ TikTok: https://www.tiktok.com/@chatabix Twitter: https://twitter.com/chatabix1 Patreon: https://www.patreon.com/chatabix Merch: https://chatabixshop.com/ Contact us: chatabix@yahoo.com Learn more about your ad choices. Visit megaphone.fm/adchoices
In episode 291 of The Just Checking In Podcast we checked in with Leigh Edwards. Leigh Edwards works as a QA Analyst for a housing association and helps lead on the community work that men's mental health charity Mandem Meet Up (MMU) carries out in Wolverhampton. Leigh was raised by a single mum in a difficult neighbourhood of Wolverhampton, but achieved the grades to study at university. He dropped out in his second year but views it as the best decision he's ever made. Leigh's mental health difficulties started when he was 26 years old. He suffered with auditory psychosis for eight months but never told anyone about it. It got to the point where he became suicidal and he planned to take his own life on Mother's Day in 2019. His plan was to go round to see family and ‘say goodbye' for the final time. However, his mum had the most important mother's instinct of her life and wouldn't let him leave the house as she knew something was wrong. After forcing the issue through his defences, Leigh disclosed to her how he was feeling, broke down and agreed for his family to take him to their local hospital. After psychiatric nurses came to his house for a few weeks and checked in on him, he felt like it was no longer helping him and decided to take ownership of his mental health and life. He started doing meditation, discovered the work of a philosopher called Alan Watts and the concept of Taoism, jumped down a rabbit hole and never came out. In this episode we discuss Leigh's upbringing, the psychosis, suicidality and recovery. We then discuss the grief of losing his best friend Ryan who had helped him through that period just a year afterwards in 2020, who tragically took his own life. We also discuss the impact that David Goggins had on Leigh's recovery and how it led him to taking his fitness journey to another level, starting with one half-marathon in memory of his friend, before running two ultra-marathons for himself. Leigh's advocacy work began in December 2022, a month before MMU Wolverhampton was formed. He took part in a monthly hike in Snowden with MMU and in January 2023 – the charity's Founder and previous JCIP guest Jamie Dennis gave the green light to start Mandem Wolves alongside his co-founder Christian. MMU Wolves work continued with an outreach programme for homeless people in Wolverhampton town centre, which takes place once-a-month and has been running for two years. We discuss this community work, running MMU football and his plans to turn it into a fully-fledged team. We finish by discussing his next project to run 10 marathons in 10 days to fundraise for MMU, which at time of recording he was preparing for, but has now successfully completed in June 2025. As always, #itsokaytovent You can donate to Leigh's campaign here: https://www.gofundme.com/f/maintain-our-free-services-to-the-men-of-our-community?attribution_id=sl%3A9af31a4f-3b67-434d-a6b6-6741e9a4d3a3 You can follow Mandem Meet Up Wolves on social media below: Instagram: https://www.instagram.com/mandemmeetup_wolves/ You can follow Mandem Meet Up's main social media channels below: Instagram: https://www.instagram.com/mandemmeetup/ TikTok: https://www.tiktok.com/@mandemmeetup?lang=en-GB You can also follow Leigh on social media below: Instagram: https://www.instagram.com/leigh_edwardss/ Support Us: Patreon: www.patreon.com/venthelpuk PayPal: paypal.me/freddiec1994?country.x=GB&locale.x=en_GB Merchandise: www.redbubble.com/people/VentUK/shop Music: @patawawa - Strange: www.youtube.com/watch?v=d70wfeJSEvk
It was David's 51st Birthday recently, so Joe kicks things off by finding out what he got up to and what presents he was given. It turns out that one gift was a shirt, which was a little too ‘Woodcuttery' for David's liking. So the chat soon turns to buying clothes, for both themselves and their other halves, which all gets a little heated. They also talk about discomfort when rummaging in clothes shops, being bad at small-talk, David looking into moving again and some listener feedback on the MMU's. FOR ALL THINGS CHATABIX'Y FOLLOW/SUBSCRIBE/CONTACT: YouTube: https://www.youtube.com/@chatabixpodcast Twitter: https://twitter.com/chatabix1 Insta: https://www.instagram.com/chatabixpodcast/ Patreon: https://www.patreon.com/chatabix Merch: https://chatabixshop.com/ Contact us: chatabix@yahoo.com Learn more about your ad choices. Visit podcastchoices.com/adchoices
This is the Catchup on 3 Things by The Indian Express and I'm Flora Swain.Today is the 10th of April and here are today's headlines.China Pushes Back Against U.S. Tariffs, Warns of ConsequencesChina hit back sharply at Washington's escalating trade war rhetoric, saying it does not seek conflict but won't tolerate bullying either. Responding to the U.S. decision to raise tariffs on Chinese goods to 125% while pausing tariffs for other nations, Foreign Ministry spokesperson Lin Jian said at a press briefing, “This cause will not win popular support and will end in failure.” Lin emphasized that Beijing will defend its people's rights, signaling that retaliatory action may still be on the table. Meanwhile, Asian markets surged on news of the 90-day tariff pause for other countries, with Japan's Nikkei 225 soaring 8%, South Korea's Kospi rising over 5%, and Australia's ASX 200 up 5% in early trading.India Steers Clear of U.S. Tariff Clash, Eyes Fall Trade PactIndia responded cautiously as U.S. President Donald Trump announced a temporary suspension of his sweeping reciprocal tariffs, which went into effect Wednesday. Just hours before the announcement, External Affairs Minister S. Jaishankar confirmed that India is actively engaging with Washington to finalize a bilateral trade agreement by the fall. Speaking at the News18 Rising Bharat Summit, Jaishankar avoided directly commenting on Trump's controversial statements about trade partners, saying only, “We've been constructive in our engagement, and so have they.” India appears to be walking a fine line—avoiding confrontation while quietly working to secure a stable trade relationship.Tahawwur Rana Extradited from U.S., Special Prosecutor AppointedIndia has taken a key step toward justice in the 26/11 Mumbai terror attacks case. The Ministry of Home Affairs on Wednesday night appointed a special public prosecutor for a three-year term to lead the prosecution of Tahawwur Rana, who is being extradited from the United States. Sources confirmed that a senior team from the National Investigation Agency (NIA) and intelligence services has taken custody of Rana, who is expected to arrive in Delhi by Thursday. Rana is accused of aiding the planning of the deadly 2008 attacks in Mumbai, which left more than 160 people dead.Kashmir Cleric Says Police Blocked Religious Meet Over Waqf ActMirwaiz Umar Farooq, the prominent religious leader and head of the Muttahida Majlis Ulema (MMU), accused Jammu and Kashmir police of halting a planned meeting of clerics at his Srinagar residence. The gathering was meant to discuss concerns over the Waqf Act, which governs religious endowments in the region. Calling the police action unjust, Mirwaiz said religious leaders must be allowed to deliberate peacefully. He added that a joint resolution would be read in mosques across the Valley on Friday. The MMU also pledged support to the All India Muslim Personal Law Board's legal challenge to the Act.Israeli Airstrike Kills 23 in Gaza as Conflict DeepensA deadly Israeli airstrike hit a residential building in northern Gaza's Shijaiyah neighborhood on Wednesday, killing at least 23 people, including eight women and eight children, according to officials at Al-Ahly Hospital. The Gaza Health Ministry confirmed the toll and said rescue teams were still searching through rubble for survivors. Nearby buildings were also damaged, according to Gaza's civil defense, which operates under the Hamas-run government. 
The strike is the latest in a wave of intensifying attacks, as the humanitarian crisis worsens in the besieged Palestinian enclave with no signs of a ceasefire in sight.That's all for today. This was the CatchUp on 3 Things by The Indian Express.
Episode 481 avec Sébastien S. et Xavier.• Quand les robots sont expressifs (00:01:49) : Apple publie un papier de recherche sur les mouvements des robots. (Sources : youtu.be, techcrunch.com et arxiv.org) • S.A.M.: Quand impression 3D rime avec sécurité (00:10:01) : Une société française met au point un token infalsifiable imprimé en 3D. (Sources : francetvinfo.fr et signaturesam.com) • Quand l'IA entre dans nos vies sentimentales (00:16:04) : Les apps de rencontres se réinventent avec l'IA générative. (Sources : techcrunch.com et theconversation.com) • Le Bloc-Notes (00:26:11) : Quelques infos en bref. • Anniversaire: les 40 ans d'une sortie en MMU (00:32:41) : Il y a 40 ans, un astronaute s'éloignait sans cordon de sa navette spatiale. (Sources : sciencepost.fr et nasa.gov) • Quand le réseau préserve la tracabilité des images (00:40:09) : Cloudflare préserve les informations de provenance dans les images qu'ils servent . (Sources : cloudflare.com, contentcredentials.org et c2pa.org) • Le TRIP à la mode pour sécuriser les votes en lignes. (00:47:23) : Des chercheurs testent la génération de fausses identités à l'aide de QR codes. (Source : sciencesetavenir.fr) Retrouvez toutes nos informations, liens, versions du podcast via notre site Abonnez-vous à notre infolettre afin d'être informé de notre veille technologique de la semaine et de la parution de nos épisodes
This week, we're talking to the authors of a new book about spaceflight called "Star Bound: A Beginner's Guide to the American Space Program, from Goddard's Rockets to Goldilocks Planets and Everything in Between," Emily Carney and Bruce McCandless III. Emily started the popular Facebook group Space Hipsters, now 66,000 members strong, and Bruce is a retired lawyer and space enthusiast who also happens to be the son of Bruce McCandless II, the NASA astronaut who flew on the shuttle and pioneered the use of the Manned Maneuvering Unit. We're going to cover a lot of territory in this one, so take your hand off the eject lever and strap in! Get "Star Bound" (Amazon Affiliate): https://amzn.to/4hvHtXo Headlines - Trump's Mars Vision: The administration's push for a crewed Mars mission by 2029 sparks debate. Tariq notes Elon Musk's visible enthusiasm, while Rod highlights the technical and political hurdles. - NASA Leadership Shuffle: Janet Petro named interim NASA administrator, bypassing Jim Free. The move might signal potential shifts in Artemis priorities. - DEI Rollbacks: Executive orders halt NASA's diversity initiatives, sparking workforce concerns. - SpaceX Milestones: 400th Falcon 9 landing celebrated, with 60 Starlink satellites launched in a week. ULA's Vulcan launch remains delayed. - Meteorite Doorbell Footage: A meteorite impact in Canada, captured on camera, stuns scientists and homeowners. - Quirky Moon Naming: IAU dubs a quasi-moon "Cardea" after the Roman goddess of door hinges. Main Topic: Star Bound - Book Overview: A cultural history of the U.S. space program, connecting missions like Mercury, Gemini, Apollo, and Skylab to societal shifts (e.g., civil rights movements). Authors Emily Carney and Bruce McCandless III emphasized accessibility, avoiding "engineer-speak." - Skylab's Legacy: Emily's passion shines as she details Skylab's role as a bridge between Apollo and the Shuttle, citing the groundbreaking science performed and how it may help us send humans to Mars. - MMU & Bruce McCandless II: Bruce shares stories of his father's iconic untethered flight with the Manned Maneuvering Unit (MMU), suggesting that future missions may revive jetpack tech for tourism and repairs. - Shuttle Era Love/Hate: Both guests defend the Shuttle's cultural impact (e.g., Judy Resnik's inspiring legacy) while acknowledging its flaws. - Conspiracy Corner: The duo laughs over wild theories (STS-1 being flown by clones; Neil Armstrong being a robot) and praises Rod's 2016 book "Amazing Stories of the Space Age" for documenting Project Orion's nuclear explosive propulsion tech. - Future of Space: The book ends at today's "precipice"—Artemis delays, Mars hype, and private ventures. Bruce predicts jetpacks and hotels; Emily urges newcomers to embrace space history's messy, human side. Don't Miss: - Emily's Space Hipsters Facebook group for lively space discussions. - Bruce's website (brucemccandless.com) with book sources and WWII project teasers. Hosts: Rod Pyle and Tariq Malik Guests: Emily Carney and Bruce McCandless III Download or subscribe to This Week in Space at https://twit.tv/shows/this-week-in-space. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit
Send Me A Message! What's the biggest challenge you face when it comes to looking, feeling, and performing your best? Let me know and I'll show you how to MMU in an upcoming epsiode.
Send Me A Message! What's the biggest challenge you face when it comes to looking, feeling, and performing your best? Let me know and I'll show you how to MMU in an upcoming epsiode.
Send Me A Message! What's the biggest challenge you face when it comes to looking, feeling, and performing your best? Let me know and I'll show you how to MMU in an upcoming epsiode.
This Episodes Questions: Hello, I'm a dedicated listener of your podcast and I have a question that's been on my mind regarding 3D printing materials. Nowadays, PETG is available at a price point similar to PLA, sparking a debate among enthusiasts. Many argue that PETG is superior in every aspect, rendering the use of PLA obsolete. However, my personal experiments, corroborated by various online sources, indicate that PLA exhibits greater tensile strength and, in some cases, enhanced impact resistance—especially PLA+. This does seem to vary across different brands. While PETG undoubtedly outperforms in terms of temperature resilience and chemical stability, its reputed strength superiority is something I find quite perplexing. My theory is that PETG's significant deformation under gradual stress, as opposed to PLA's sudden failure under higher loads, might be influencing this common perception. I would love to hear your thoughts on this topic. Thank you for the consistently excellent content! Kelly Hey guys love what you do, I'm listening to the “Worn Nozzle” Podcast. You mention the 3D Chameleon multi filament changer. You guys should try to get him on the show, https://www.3dchameleon.com/ . The inventor is Bill Steele (I may have misspelled the last name) and if you will look at the dates of releases, Bill's invention predates all the multi filament changers even Prusa's MMU, Bambu's AMS, etc. He is up to version MK4 and this unit is amazing. Don't dismiss it as a gimmick, it works better than all these others out there including the new “enraged rabbit carrot feeder”. Chris
EP033 - If shame is the darkest demon we will ever have to battle as men, what can we do to keep that motherfucker from pulling us down and holding us back from stepping into the truth of Who We Are?“Pain shared is pain divided. Success shared is success multiplied.” - Coach Sam Falsafi, Wake Up WarriorIn today's PodWod, we be crushing our RELATIONSHIP Health & Fitness.Inside the gym, pain is pain. The pain I feel from doing Burpees is the same pain you feel. So then how come I love them while most people hate them? Am I crazy or is the rest of the world missing something? A little bit of both, I'd say.Anyone consistent in the fitness game knows that the sooner you develop a healthy relationship with pain, the better. Outside the gym, how we perceive pain and whether we invite it or reject it will determine how strong or how weak we become, and is there any greater or more consistent source of pain than shame?
Discover the best place to find the best friends. This strategy not only ensures that you create a supportive friend netwok, but that you operate in your highest purpose of expanding God's kingdom. What You'll Learn In This Series: The 5 reasons we ISOLATE as moms and how to overcome them. How MOM BRAIN stands in the way of friendships, and how to wire your brain for relationships. What to do when you're just TOO TIRED for friendships! How to know when it's time to weed your FRIEND GARDEN. How to know when you have OUTGROWN a friend, and what to do about it. The 4 REAL reasons we make friends and how to put these to work in your life. The 7 NO-FAIL ways to make friends (the ones you actually want to have). Hannah's 5-step method to make friends in a world where people isolate. The 6-step “Friend Litmus Test” to see if a friend is a REAL friend. The exciting opportunity to make friends inside MMU as a leader! And so much more...
Welcome to this week's Unbelievable? Podcast!
Pulling from a historic classic, find out the 6 ways to win friends and influence people. Hannah also includes her number 1, no-fail technique to make friends, but not just any friends... the right friends! What You Will Learn In This Series: • The 5 reasons we ISOLATE as moms and how to overcome them. • How MOM BRAIN stands in the way of friendships, and how to wire your brain for relationships. • What to do when you're just TOO TIRED for friendships! • How to know when it's time to weed your FRIEND GARDEN. • How to know when you have OUTGROWN a friend, and what to do about it. • The 4 REAL reasons we make friends and how to put these to work in your life. • The 7 NO-FAIL ways to make friends (the ones you actually want to have). • Hannah's 5-step method to make friends in a world where people isolate. • The 6-step “Friend Litmus Test” to see if a friend is a REAL friend. • The exciting opportunity to make friends inside MMU as a leader! • And so much more...
How Do YOU MMU? Shoot me a message
How Do YOU MMU? Shoot me a message
How Do YOU MMU? Shoot me a message
How Do YOU MMU? Shoot me a message
Friends are easy, right? We have them as children, through high school, as we start out in life. But something happens when we become moms. All of a sudden, making friends becomes difficult and we have a tendency to isolate. Discover the 6 reasons we isolate so we can face them head-on. What You Will Learn In This Series: The 5 reasons we ISOLATE as moms and how to overcome them. How MOM BRAIN stands in the way of friendships, and how to wire your brain for relationships. What to do when you're just TOO TIRED for friendships! How to know when it's time to weed your FRIEND GARDEN. How to know when you have OUTGROWN a friend, and what to do about it. The 4 REAL reasons we make friends and how to put these to work in your life. The 7 NO-FAIL ways to make friends (the ones you actually want to have). Hannah's 5-step method to make friends in a world where people isolate. The 6-step “Friend Litmus Test” to see if a friend is a REAL friend. The exciting opportunity to make friends inside MMU as a leader! And so much more...
Some friends are for keeps, some are just for a season. How do you know when it's time to do some weeding in your friend garden? Find out the 4 reasons we have friends, and how to know when it's time to move on. What You Will Learn In This Series: The 5 reasons we ISOLATE as moms and how to overcome them. How MOM BRAIN stands in the way of friendships, and how to wire your brain for relationships. What to do when you're just TOO TIRED for friendships! How to know when it's time to weed your FRIEND GARDEN. How to know when you have OUTGROWN a friend, and what to do about it. The 4 REAL reasons we make friends and how to put these to work in your life. The 7 NO-FAIL ways to make friends (the ones you actually want to have). Hannah's 5-step method to make friends in a world where people isolate. The 6-step “Friend Litmus Test” to see if a friend is a REAL friend. The exciting opportunity to make friends inside MMU as a leader! And so much more...
How Do YOU MMU?
How Do YOU MMU?
EP016 - MMU Question of the Day: Would you rather waste time figuring out how to manage your stress or eliminate as much of it as possible? If eliminating stress is the key to experiencing more joy, what is one of the fastest ways to eliminate much of the stress we experience from our jobs, our relationships, and all the everyday shit we have to do so that we no longer have to figure out how to manage our stress?
In Part 3, we're moving from the joy in the family to joy in your heart. As a Master Mom, everything begins and ends with you. Discover the 4 principles that will help you create and nurture joy in your life, as well as the “5 Joy Reminders.” The key 3-part strategy that will help you manage your home better, while you are creating more joy. The free tool inside MMU that will help you organize your life (and have more fun). The key meeting you must have with your family, that could totally change their future. How to make mealtime easier and more joyful, and the secret is right under your nose! The 7 House Rules you must put in place today to create a joyful family tomorrow. The key principle that can help you move past a fear of failure and--finally--create success. The effects of “Totem Pole Syndrome” and how to know if you have it. The real reason you are not experiencing enough joy and how to tap into it in seconds. The 5 Joy Reminders and how to make them work with your subconscious brain. The secret method of tapping into peace while also allowing the release of joy. The “behind the scenes” enemy that every mom has to stop joy (and how to overcome it).
In Part 2, we're moving from joy in the home to joy in the family. Your family is going to thrive during this part as we we will be focusing on unity and the “7 House Rules”. The key 3-part strategy that will help you manage your home better, while you are creating more joy. The free tool inside MMU that will help you organize your life (and have more fun). The key meeting you must have with your family, that could totally change their future. How to make mealtime easier and more joyful, and the secret is right under your nose! The 7 House Rules you must put in place today to create a joyful family tomorrow. The key principle that can help you move past a fear of failure and--finally--create success. The effects of “Totem Pole Syndrome” and how to know if you have it. The real reason you are not experiencing enough joy and how to tap into it in seconds. The 5 Joy Reminders and how to make them work with your subconscious brain. The secret method of tapping into peace while also allowing the release of joy. The “behind the scenes” enemy that every mom has to stop joy (and how to overcome it).
This month, we're joined by Dr. Jillian Weber, National Homeless Patient Aligned Care Team (HPACT) Program Manager, and Dr. Aayshah Muneerah from the VA Oklahoma City Healthcare System to learn more about VA's new mobile medical units (MMU) for Veterans. Our guests talk about what HPACT does, what MMUs are, and how both help bring medical care services directly to homeless Veterans living out on the street.Veterans who are homeless or at imminent risk of homelessness are strongly encouraged to contact the National Call Center for Homeless Veterans at (877) 4AID-VET (877-424-3838) for assistance.Closed Caption Transcript is available at: https://www.sharedfedtraining.org/Podcasts/EVH_S1EP26.pdf===============================Find your nearest VA: https://www.va.gov/find-locationsLearn more about VA resources to help homeless Veterans: https://www.va.gov/homelessListen to our very first episode of Ending Veteran Homelessness featuring Dr. Jillian Weber: https://www.spreaker.com/episode/s1ep1-caring-for-the-health-of-homeless-veterans-during-covid-19--49740594Read our press release announcing the launch of VA's new mobile medical units: https://news.va.gov/press-room/va-launches-mobile-medical-units-to-increase-access-to-health-care-for-homeless-veterans/Read about, and view pictures of, the MMU Oklahoma City, OK: https://www.va.gov/oklahoma-city-health-care/stories/oklahoma-city-vas-big-blue-van-hits-the-road-delivering-care-to-unhoused-veterans/Read about, and view pictures of, the MMU Birmingham, AL: https://news.va.gov/128877/birmingham-va-launches-mobile-medical-unit/
In Part 1, we will be traveling through each zone of your home and discovering techniques to fill it with joy. If you aren't already zoning your home, get ready for life to change. The key 3-part strategy that will help you manage your home better, while you are creating more joy. The free tool inside MMU that will help you organize your life (and have more fun). The key meeting you must have with your family, that could totally change their future. How to make mealtime easier and more joyful, and the secret is right under your nose! The 7 House Rules you must put in place today to create a joyful family tomorrow. The key principle that can help you move past a fear of failure and--finally--create success. The effects of “Totem Pole Syndrome” and how to know if you have it. The real reason you are not experiencing enough joy and how to tap into it in seconds. The 5 Joy Reminders and how to make them work with your subconscious brain. The secret method of tapping into peace while also allowing the release of joy. The “behind the scenes” enemy that every mom has to stop joy (and how to overcome it).
Episode 14 "On the Road to AMPS Nats... “ As you are listening, the guys should all be converging on South Bend! John Charvat gets us even more hyped up for NATS! We discuss his history with AMPS, the past shows and most importantly, this year's show! The guys also discuss their new purchases and projects on their benches. This did get recorded a few weeks back, so you may already have seen some of these builds shared as completed by now; Riv also comes at Frankie D about his choice of apparel, while trying to explain to the rest of us what the hell “Oaktag” is... Frankie D talks about the Moosaroo Cup, and his entry. (of course, by now we know the results of this contest...). The kit was a teenie-tiny little 1/72 scale Russian/Ukranian Ural truck. Group Build updates: The Red Beach One Studios World War One group build is ongoing, make sure to check that one out if you're interested in WWI subjects. Justin completed his German Pioneer figure, Rob Adams is working on his Mk V Tank (which he has since completed-cant wait to see this thing at NATS!) 48 in 48 M4H group build- this was a fun time, and has since been completed. Machinen Krieger Group build is well underway! Officially ends in July, with the release of the 1/35 Falke kit - post your unstarted builds in the ‘Unstarted ‘ Album. New Kits In The News Frankie D –1/20 Grosserhund ‘Maskenball' Ma.k., Heller Panhard Armored Car 178 AMD 35 new tool, Border Models, 251-1 D Halftrack in 1/35 scale Rob Adams-Nothing jumped out at Rob this episode-He “Dont got nuthin'” Tuch-Miniart 1/72 Stug, M3 Stuart 35th scale, 48th scale Scaffolding set, 1/48th scale wooden pallets, 35th scale Bakery Van, Takom, Stug 3 G Blitz Kit, storage and equipment set, Mk 45 Model 4 5in Naval gun 1/35, Don2N Pillbox Antimissile defense system Justin-1/72 Special Hobby Westland Whirlwind, Italeri EF-111 Raven, SBS Model 48th Macchi M33 Project Upcoming Shows AMPS Nats - April 11-13- Southbend IndianaHome Page - Armor Modeling & Preservation Society (amps-armor.org) MFCA - May 3-4 Feasterville-Trevose PaMFCA Annual Show | Miniature Figure Collectors of America (mfcaclub.com) Can-Am Con 2024 - Saturday April 20th Williston National Gard Armory in suburbs of Berlington Vt. CAN/AM Con '24 | IPMS/USA Home Page (ipmsusa.org) ModelMania 24 - Saturday May 4thStafford Centre, 10505 Cash Road, Stafford, TX 77477https://ipms-houstontx.org/Social Media Shout Outs Rob Adams- Craig Flynn (Flynnburger) Bandai 72nd scale Y Wing Frankie D-Robert Czulda and his 1/72 Vietnam Era Mule w/ recoilless rifle Bad Santa - Bob Smith and his Hetzer 38t Late Tuch- Bill Huffman and his WWI trench Diorama Rob Riv - Sheldon Mcleese's JS2 in MMU and on the Insanity page-Tamiya Kit with upgrades Links Mount Mansfield Scale ModelersMount Mansfield Scale Modelers | Facebook Rob Riv's Modeling Insanity Facebook page https://www.facebook.com/robrivsmodelinginsanity?mibextid=AEUHqQ Ryan's Random Modelwerks on Facebook Opening and end music by Supernova by Arthur Vyncke https://soundcloud.com/arthurvostMusic promoted by http://www.free-stock-music.comJoin the Podcast on Facebook on The Modeling Insanity Podcast PageEmail the Insanity Crew at modelinginsanitypodcast@gmail.com for any comments or suggestions.
Fragmentation is a very interesting topic to me, especially when it comes to memory. While virtual memory does solve external fragmentation (you can allocate logically contiguous memory across non-contiguous physical memory), it does introduce performance delays as we jump all over physical memory to read what appears to us as, say, a contiguous array in virtual memory.

You see, DDR RAM consists of banks, rows, and columns. Each row has around 1024 columns and each column holds 64 bits, which makes a row around 8 KiB. The cost of accessing the RAM is the cost of “opening” a row and all its columns (around 50-100 ns); once the row is opened, the entire 8 KiB is cached in the row buffer inside the RAM. The CPU can then request an address and transfer 64 bytes at a time (called a burst), so if the CPU (or, more precisely, the memory controller) asks for the 64 bytes right next to it, the transfer comes at almost no extra cost because the whole row is already sitting in the row buffer. However, if the next request lands in a different row, the old row must be closed and the new one opened, taking an additional ~50 ns hit. Spatially local access is therefore efficient, and fragmentation does hurt performance when the data you are accessing is not contiguous in physical memory (whether it is contiguous in virtual memory doesn't matter). This reminds me of the old days of HDDs, where the disk head physically traveled across the platter to read a single file, which prompted the need for “defragmentation”, although RAM access (and SSD NAND, for that matter) isn't nearly as bad.

Moreover, virtual memory introduces internal fragmentation because it uses fixed-size blocks (called pages, often 4 KiB in size) that are mapped to frames in physical memory. So if you ask the OS for memory to hold a single 32-bit integer (4 bytes), you get a whole 4 KiB page, leaving a whopping 4092 bytes allocated to the process but unused, which nothing else can make use of. These little pockets of wasted memory can add up across many processes. Another reason developers should take care when allocating memory, for efficiency's sake.
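To make the spatial-locality argument concrete, here is a minimal C sketch (assuming Linux/glibc for clock_gettime and an array chosen to be much larger than the last-level cache): it walks the same buffer once sequentially and once in a shuffled order. The shuffled walk touches exactly the same bytes but defeats both the CPU caches and the DRAM row buffer, so it typically runs several times slower.

```c
/*
 * Sketch: sequential vs. shuffled traversal of the same 64 MiB array.
 * Assumptions: Linux/glibc (clock_gettime); compile with e.g. `gcc -O2`.
 * The shuffled walk loses spatial locality, so it pays for far more
 * cache misses and DRAM row-buffer misses per element read.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define N (16u * 1024u * 1024u)   /* 16M 4-byte ints = 64 MiB */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    uint32_t *data  = malloc((size_t)N * sizeof *data);
    uint32_t *order = malloc((size_t)N * sizeof *order);
    if (!data || !order) return 1;

    for (uint32_t i = 0; i < N; i++) { data[i] = i; order[i] = i; }

    /* Fisher-Yates shuffle: build a random visiting order. */
    for (uint32_t i = N - 1; i > 0; i--) {
        uint32_t j = (uint32_t)rand() % (i + 1);
        uint32_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }

    uint64_t sum = 0;
    double t0 = now_sec();
    for (uint32_t i = 0; i < N; i++) sum += data[i];          /* sequential */
    double t1 = now_sec();
    for (uint32_t i = 0; i < N; i++) sum += data[order[i]];   /* shuffled   */
    double t2 = now_sec();

    printf("sequential: %.3f s, shuffled: %.3f s (checksum %llu)\n",
           t1 - t0, t2 - t1, (unsigned long long)sum);
    free(data);
    free(order);
    return 0;
}
```

The exact ratio you see will depend on cache sizes, the memory controller, and DRAM timings, and this micro-benchmark deliberately lumps cache misses together with row-buffer misses, since both stem from the same lack of spatial locality; a similar toy experiment with mmap handing back a full 4 KiB page for a 4-byte value would illustrate the internal-fragmentation point in the same spirit.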