Many women spend their lives carrying invisible responsibilities for their families without ever realizing how much energy, thought, and emotional labor those responsibilities require. Whether it's keeping the peace, anticipating needs, preserving family traditions, or caring for aging parents, daughters are often expected to do it all, and do it well. The challenge is that these expectations can become so ingrained that many women never stop to ask an important question: How much is enough?
This week, in episode 318 of the Positively LivingⓇ Podcast, I sit down with Dr. Allison Alford, communication scholar, researcher, and author of Good Daughtering: The Work You've Always Done, the Credit You've Never Gotten, and How to Finally Feel Like Enough. Allison shares insights from more than a decade of research on the often-unspoken role of adult daughters, exploring the invisible labor they perform, the societal expectations they carry, and how women can redefine what it means to be a "good enough" daughter.
Dr. Allison M. Alford is a communication scholar, researcher, professor at Baylor University, and leading expert on the experience of adult daughters. Through years of interviews and research, she has examined the emotional, cognitive, logistical, and identity-based labor women perform within families. Her work helps daughters recognize their contributions, challenge unrealistic expectations, and create healthier, more sustainable relationships with their families and themselves.
Key Takeaways:
Daughtering is more than caregiving. It includes the ongoing emotional, cognitive, logistical, and identity work daughters perform to keep families connected and functioning.
Much of a daughter's labor is invisible. While tasks like visits and phone calls are visible, the planning, worrying, emotional management, and family coordination often go unnoticed.
Society places unique expectations on daughters. Women are often expected not only to care for family members but to do so willingly, skillfully, and without complaint.
The mental load extends beyond remembering tasks. Daughters frequently anticipate problems, navigate family dynamics, and remove obstacles before anyone else notices them.
Emotional labor has a real cost. Acting as the peacemaker, confidant, or emotional "thermostat" for a family can lead to exhaustion, overwhelm, and burnout.
Birth order and family structure can influence daughtering experiences. Eldest daughters and only daughters often feel heightened responsibility, though every family dynamic is unique.
You have agency to redefine your role. Even long-standing family patterns can be reassessed, and it's possible to establish healthier expectations and boundaries.
Being a "B+ daughter" is enough. Striving for perfection isn't sustainable. Leaving room for your own needs, relationships, and well-being allows you to show up for your family without losing yourself in the process.
The invisible work you do for your family matters. But so do your needs, your capacity, and your well-being. You don't have to earn your worth through endless giving. What would change if you allowed yourself to believe that you are already enough?
Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways!
Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/
Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit
CONNECT WITH DR. ALLISON ALFORD:
Website, Instagram, Facebook, TikTok
CONNECT WITH LISA ZAWROTNY:
Facebook, Instagram, Resources, Work with Lisa!
LINKS MENTIONED IN THIS EPISODE:
Good Daughtering: The Work You've Always Done, the Credit You've Never Gotten, and How to Finally Feel Like Enough
(Find links to books/gear on the Positively Productive Resources Page.)
Episode 156: How to Reduce Mental Load as a Parent or Caregiver with Roxanne Ferber
Book a Clarity Call
Libby App
Dance Song Playlist V1, V2, V3
Music by Ian and Jeff Zawrotny
Start your own podcast with Buzzsprout!
The Self-Care to Wellness Bundle is available for 1 week only, from July 9th to July 16th.
Forget the "well, actually" crowd. Yes, the Germans were central to the space race, and host Matt Trump is leaning all the way into it. In Part I of this new series, Matt traces humanity's first object to ever cross into outer space back to a test launch from Peenemunde on June 20, 1944, two weeks after D-Day, and the weapon it became, the V2. But the real story starts decades earlier with Jules Verne, whose 1865 novel "From the Earth to the Moon" predicted Apollo and Artemis with eerie accuracy, and inspired a young Transylvanian Saxon named Hermann Oberth to turn science fiction into the actual rocket equation. Matt also dives into the strange, tangled connections between Oberth, Fritz Lang, Thea von Harbou, and the silent film "Metropolis," and what that film really reveals about how the Nazis saw themselves. Next week, the warriors arrive: Wernher von Braun.
Have you ever sat down to write an email, finish a report, or tackle a simple task, only to watch it consume far more time than it should have? It can feel frustrating, especially when you thought having extra time would make things easier. But what if more time is actually part of the problem?
The idea behind Parkinson's Law is surprisingly simple: work expands to fill the time available for its completion. What started as a satirical observation in the 1950s has since been supported by research showing that when people are given more time than they need, they tend to use it, whether the task requires it or not.
In this episode, we're exploring why open-ended time can lead to procrastination, overthinking, perfectionism, and unnecessary task expansion. More importantly, you'll learn how to use intentional time constraints to your advantage so you can focus better, make progress faster, and create a more sustainable approach to productivity that works with your brain instead of against it.
This week, episode 317 of the Positively Living® Podcast explores the practical side of Parkinson's Law and shares simple ways to use time boundaries, self-created deadlines, and focused work sessions to accomplish more without rushing or burning out.
Key Takeaways:
Understand how Parkinson's Law causes tasks to expand simply because more time is available.
Recognize why open-ended projects often lead to procrastination, overthinking, and perfectionism.
Learn why urgency and deadlines can dramatically improve focus, especially for ADHD brains.
Use timeboxing to create clear boundaries that help your brain stay engaged and productive.
Define what "done" looks like before you begin to avoid endless tweaking and refinement.
Create meaningful self-imposed deadlines when external deadlines don't exist.
Improve focus and consistency by working in shorter, intentional sprints instead of marathon sessions.
Develop the self-awareness to recognize when a task genuinely needs more time versus when it's simply expanding to fill available space.
Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways!
Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/
Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit
CONNECT WITH LISA ZAWROTNY:
Facebook, Instagram, Resources, Work with Lisa!
LINKS MENTIONED IN THIS EPISODE:
Solo Episode Playlist
Book a Clarity Call
Async Coaching
(Find links to books/gear on the Positively Productive Resources Page.)
Dance Song Playlist V1, V2, V3
Music by Ian and Jeff Zawrotny
Start your own podcast with Buzzsprout!
The Self-Care to Wellness Bundle is available for 1 week only, from July 9th to July 16th.
In the thrilling finale of our Merkers Mine series, we confront the uncomfortable arithmetic of the Third Reich's stolen wealth. While the Allies recovered an astonishing 250 tons of gold at Merkers, hundreds of millions of dollars in looted Nazi gold remain completely unaccounted for to this day. Could a portion of this missing fortune have ended up deep inside a desert mountain in southern New Mexico? To answer this, we explore the highly secretive post-war world of Operation Paperclip, which brought former Nazi rocket scientists and hundreds of train cars filled with V2 rocket components from the exact same German region as the Merkers mine directly to the White Sands Missile Range.
This massive, chaotic logistical operation may have provided the perfect Trojan horse to smuggle stolen wealth into the United States. We delve into the tantalizing, controversial theory that diverted Nazi bullion was shipped alongside the rocket parts, meticulously concealed in crates falsely labeled as Volkswagen engines that were calibrated to match the exact weight of a real engine. The destination for these mysterious crates was White Sands, the home of Victorio Peak, a mountain already famous for a legendary Spanish gold discovery made by Doc Noss years earlier.
The historical anomalies surrounding this peak are impossible to ignore. When two airmen secretly entered Victorio Peak in 1958, they didn't describe finding crude, centuries-old Spanish colonial ingots; they reported seeing modern, smelted, brick-shaped gold bullion stacked in orderly, military-style pyramids. Tying this massive web together is a chilling final revelation: Leland Howard, the powerful U.S. Treasury official sent to Frankfurt to oversee the captured Merkers gold in 1945, is the exact same man who later orchestrated the military's top-secret excavations and the suppression of the Noss family claims at Victorio Peak. The prelude is now complete. Join us as we close the book on the Merkers Mine and prepare to step fully into the enduring mystery of the Noss Gold.
The new AIEWF website is live! Get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!
Most industry benchmarks compress intelligence and reasoning ability into scores: SWE-Bench Pro, MMLU, Humanity's Last Exam, etc. These metrics are useful, but don't always represent the full extent of how a model performs in the real world. Some of the most interesting evals today look less like exams and more like operating businesses in the real world. One of these is Vending-Bench.
In Anthropic's Mythos Preview System Card, Andon was the only third-party eval to get its own section, observing increasingly concerning aggressive behavior.
You don't know what a model is capable of doing in the real world unless you actually give it inventory, a wallet, tools, customers, competitors, humans, & some time. More often than not, it'll surprise you how much a model is capable of and, in doing so, also reveal unexpected behavior: deception, context collapse, emergent coordination, & bizarre negotiation behavior.
While an inflection point in personal agents came post-OpenClaw, after full file access with bypass permissions became the norm, it is yet to come for agents in the real world. However, Andon Market, an actual in-person store fully run and managed by AI, is paving the way for what is possible.
From Claude trying to call the FBI over a $2/day vending machine charge to AI agents forming price cartels, hiring human employees, running physical stores, and writing existential robot musicals, Andon Labs is stress-testing what happens when frontier models stop being chatbots and start acting in the real world.
In this episode, Andon Labs cofounders Lukas Petersson and Axel Backlund join swyx and Vibhu to unpack the strange, funny, and genuinely concerning edge cases that emerge when agents run businesses over long horizons.
We go deep on Vending-Bench, Project Vend, Vending-Bench Arena, Bengt, Butter-Bench, Luna, and Andon's broader mission of building realistic real-world evals for autonomous AI systems. Lukas and Axel explain why dollar-denominated evals reveal things traditional benchmarks miss, how Claude ended up reporting its vending machine fees as cybercrime, why long context windows can drive agents into meltdown loops, what happens when agents compete with each other, and why the future of AI safety may depend on testing models in messy physical environments instead of clean benchmark sandboxes.
We discuss:
* Why Andon Labs started with dangerous capability evals and long-running agents
* Vending-Bench and why running a vending machine is a deceptively hard AI benchmark
* Why money-based evals avoid the saturation problem of traditional benchmarks
* How Claude tried to call the FBI over a $2/day fee
* Why long-horizon agents can spiral into existential and legalistic breakdowns
* Project Vend: putting an AI-run vending machine inside Anthropic
* Why real humans are “out of distribution” for simulated agents
* Claudius, Seymour Cash, and the chaos of AI CEOs
* How a human briefly became CEO of Claudius through a manipulated election
* Why multi-agent systems can converge back into “helpful assistant” behavior
* Bengt, Andon's internal office agent with email, spending, terminal, phone, camera, and internet access
* How Bengt traded Amazon purchases for face-recognition training data
* Claude's aggressive behavior, lies, refund avoidance, and price-cartel behavior in Arena
* Why eval awareness may become the AI version of “are we living in a simulation?”
* Blueprint Bench, spatial intelligence, and why models still misunderstand physical rooms
* Butter-Bench and testing LLMs as
robot orchestrators
* Luna, the AI-run physical store with a three-year lease and human employees
* The new Andon cafe in Sweden and why real-world geography matters for agent evals
* Rotten tomatoes, perishable goods, and the hidden difficulty of running a physical business
Lukas Petersson
* LinkedIn: https://www.linkedin.com/in/lukas-petersson-181a83172/
* X: https://x.com/lukaspet
Axel Backlund
* LinkedIn: https://www.linkedin.com/in/axelbacklund
* X: https://x.com/axelbacklund
Andon Labs
* Website: https://andonlabs.com
* Vending-Bench: https://andonlabs.com/evals/vending-bench
* Andon Vending: https://andonlabs.com/vending
Timestamps
00:00:00 Introduction
00:01:00 Andon Labs and the Origins of Vending-Bench
00:05:21 Why Money-Based Evals Matter
00:09:51 Agent Harnesses and Self-Modifying Systems
00:13:36 Claude Calls the FBI
00:16:33 Project Vend: Claude Runs a Real Vending Machine
00:21:44 Seymour Cash, AI CEOs, and Election Chaos
00:27:16 Multi-Agent Coordination and Slack Observability
00:30:18 When Will Agents Run Real Businesses?
00:34:56 Bengt: Andon's Internal Office Agent
00:40:06 Real-World AI Safety and Long-Horizon Traces
00:44:28 Lying, Refunds, and Price Cartels in Arena
00:52:42 Eval Awareness and Simulation Behavior
00:56:06 Blueprint Bench, Butter-Bench, and Robotics
01:04:37 Luna: The AI-Run Physical Store
01:09:29 The Sweden Cafe and Real-World Expansion
01:13:16 What Comes Next for Andon Labs
Transcript
Introduction: Andon Labs, Long-Running Agents, and Real-World Evals
Swyx [00:00:00]: Welcome to Lukas and Axel from Andon Labs, and I'm joined by my favorite guest host. Anything security, safety, alignment: Vibhu, welcome.
Lukas [00:00:15]: Thank you for having us.
Axel [00:00:16]: Thank you.
Swyx [00:00:17]: Let's match names to voices. Maybe you wanna take turns introducing yourselves.
Lukas [00:00:21]: I'm Lukas.
Axel [00:00:22]: And I'm Axel.
Swyx [00:00:24]: Let's introduce Andon Labs a bit.
How did you guys come together? You have different backgrounds, but you're both Swedish. Was that a big part of it?
Lukas [00:00:33]: So when I went to high school, there was this really cool guy who had a superpower. He could code. So he made, like, the app for the school and stuff, and he was super cool, and I wanted to be like him, and that was that guy.
Axel [00:00:47]: I don't know about this.
Swyx [00:00:49]: But you went to different universities, right?
Lukas [00:00:51]: But same high school.
Swyx [00:00:52]: I see.
Lukas [00:00:52]: So we always said, “Oh, once we graduate university, then we should start a company,” and that's what we did.
Swyx [00:00:58]: Wow, there you go. And about a year ago, you kinda burst onto the scene with Vending Bench, but was there a thing before that was kind of like the inception?
From Dangerous Capability Evals to Vending Bench
Axel [00:01:07]: So we did work, yeah, with Anthropic, which was one of our early customers in doing evals. So we did dangerous capability evals, nothing we published openly. But then we started thinking about doing some kind of public benchmark, and one thing that we really started thinking about was running agents, and specifically agents managing businesses. This was early 2025, and I think the first mentions of people running one-person unicorns or even autonomous companies. So we thought, “Let's make a benchmark of how well an agent can run probably the simplest business possible,” and that's probably running a vending machine. So that's the first public one we did.
And it was very, like-- there was almost no one that noticed it in the first couple of months, I think. So we released it in February last year, and then I think around Easter last year, we got the first viral tweet about it, that someone else did.
Lukas [00:02:11]: We tweeted a bunch when it came out and tried our best.
Axel [00:02:15]: We tried.
Vibhu [00:02:16]: It's the one at Anthropic, right?
Lukas [00:02:18]: So this--
Swyx [00:02:19]: This is a classic thing we should get out of the way.
Lukas [00:02:20]: Exactly. There's two versions.
Swyx [00:02:22]: Everyone does this. Yes.
Lukas [00:02:23]: There's Vending Bench, which is the simulated one, which we did completely independently in February. And then, like Axel said, that was the thing that didn't get any traction in the beginning, but then some random person made a tweet about it, and that--
Axel [00:02:38]: You have the paper.
Lukas [00:02:38]: That is the paper. Correct, yeah. And then, since we thought this was very fun-- I think this is also one thing with Andon Labs: the way we decide what to do next and what projects to do, the heuristic we use is, what is fun? What would be a fun project? And doing this in real life sounded quite fun for us, and maybe also scientifically useful. So then we basically had this idea, but then we needed a place for it, and putting it out in public would probably not really work. It would get vandalized and stuff. So we pitched it to the people we were already working with at Anthropic, and they were like, “Yeah, you can have space. This sounds fun.”
Swyx [00:03:21]: It's like a small fridge, right? It's like a mini fridge.
Axel [00:03:23]: Absolutely.
Swyx [00:03:24]: People-- there's like a Stripe thing or like an--
Vibhu [00:03:27]: Oh, okay. So it was very OG, the early days--
Lukas [00:03:28]: That's the OG one. Yeah.
Vibhu [00:03:29]: iPad on this.
We saw it in June, like two months after it had been there. They upgraded a little bit. There's a security camera for making sure you actually Venmo the thing.
Swyx [00:03:40]: So, my impression-- okay, we're going straight into Project Vend because it's such an iconic thing. I do want to cover a little bit of the origin story even before Project Vend and even into Vending Bench. I think a lot of people are like yourselves: smart, interested in the future of AI, interested in developing evals. But how the hell do you just walk into Anthropic's doors and work with them, right? What are they looking for? What works? And then maybe, when you launch, I always think, obviously it would be better to launch with a lab, but sometimes--
Vibhu [00:04:12]: It's harder to do than it seems.
Swyx [00:04:13]: Exactly. So either of those, which are more sort of newbie beginner questions, but I think it's meaningful advice to others.
Lukas [00:04:21]: We get this question a lot, and I don't think our experience is maybe the best. But the way we did it was that we just built a bunch of things that we had conviction would be useful, and then we just set up a server and sent it to them for free to use. And then after a while they were like, “Oh, yeah, this is actually kind of useful. We should probably pay for this.” But that took a while. I don't know if this is the best path to doing it, but that's how it went for us.
Axel [00:04:47]: I think maybe generally, building-- everyone is interested in good evals, and especially evals that don't saturate that easily.
So, if you can build an eval that tests something novel, something useful, and you have good separation of models, like the more advanced models rank higher than the worse models, then you can publish it and try to get some traction, sort of how Vending Bench got attention. And then probably some lab will be interested, or you can at least have something to reach out with when you're doing that.
Why Dollar-Based Evals Matter
Swyx [00:05:21]: I think you're in one of the few categories of evals that correlate to real money. Like SWE-Lancer was also last year, right? Where people solve actual Upwork-- was it Upwork or other tasks? Something. It's like a dollar value, right? Forget your Elo scores. Forget your--
Axel [00:05:37]: Percentiles.
Swyx [00:05:38]: Zero to one hundred percents. Just go straight for dollars and that's AGI.
Lukas [00:05:43]: And I think the nice thing is that there's no ceiling. It never saturates because it could just make more and more money. If it's percentage-wise, then you can't go above a hundred. And I think even when you're not at the hundred, a lot of these evals have a lot of problems in them. So actually, if you get--
Axel [00:06:05]: To like 92 or something like that, many of them, then there's really no difference between 92 and 93 because the eval itself is problematic and has noise in it. And I think a lot of evals are saturated like that, but people pretend that there's still signal in them, but there really isn't.
Vending Bench 1, Harness Design, and Saturation
Swyx [00:06:24]: Like SWE-bench Verified. Even Vending Bench 1 saturated, right? Maybe we can talk about that, and maybe set up Vending Bench for a lot of folks who don't know.
Actually, things that were very basic, like there's limited slots, like you have to pay rent, these are elements that don't come across in the narrative, but even being adversarial towards the agent, I think these are all very interesting dimensions.
Axel [00:06:47]: I don't really think it's saturated, right? It was more like it was not designed in a way that was really true to how AI developed. We had an agent harness in it that wasn't really how people used harnesses and stuff like that. So I think it wasn't really that it saturated, it was more like it wasn't really the best benchmark.
Vibhu [00:07:12]: This is Vending Bench 1, right?
Axel [00:07:14]: I think that schematic maps sort of to Vending Bench 2 as well, but--
Swyx [00:07:19]: Including the email.
Axel [00:07:20]: The emails exist still. Exactly. And we still simulate the purchases, and it's this very open environment for the agent to just run its business. And then for Vending Bench 2 we did that, like you said, to just improve the harness. A lot of nice, easier improvements to make it easier for us to run as well. When you make an eval you ideally don't want to change it after you made it. So you want to make it really good and then not rerun all the models when you make an update, because that's also really expensive with Vending Bench when you run the frontier models. But as an example, one thing we didn't have: we didn't have prompt caching in Vending Bench 1, because when we made Vending Bench 1 it wasn't really a thing. So that's just an example of, in Vending Bench 2, we paid a lot more to run these things because we didn't have prompt caching.
So for Vending Bench 2 that was one thing we added, and there was a bunch of things like this. And that--
Swyx [00:08:17]: Also the conversations are a lot longer in Vending Bench 2, right?
Axel [00:08:21]: I think it's kind of similar.
Swyx [00:08:22]: Is it similar?
Axel [00:08:23]: I think it's similar. The models at the time were worse, so they crashed out earlier. And now they survive the full year all the time.
Swyx [00:08:31]: Which is like thousands of turns. Hundreds of thousands, hundreds of millions of tokens output. That's the rough order of magnitude. I always wonder about the harness. The harness matters a lot. It's your harness. Was there any question about, like, use Claude Code, use something else?
Axel [00:08:48]: I think our philosophy around harnesses is we try to make something that's quite minimalistic, quite simple. We don't wanna favor one model a lot over the other, but also don't make a super complex harness. So it's obvious, a model may be lucky and just be good in one harness. So it is similar to a lot of the harnesses out there in that you have a running loop, you have a bunch of tools that are quite descriptive for the agent, we think, and not a lot of fancy agents or anything, 'cause we wanna really test the model, not some specific harness.
Vibhu [00:09:27]: It seems more neutral as well to test the models agnostic of the harness?
Axel [00:09:32]: There are arguments like you want to elicit maximum performance of the model, but it's a trade-off: how much time should we spend optimizing the harness for this model? And how do we know when we have the optimal harness for a single model? So we thought that just having a simple one that's the same for all of them is the best.
Swyx [00:09:51]: So okay, this is my pitch for Vending Bench 3 or whatever, right?
And then I like to have this kind of conversation on the pod, so it forces listeners to think about what they would do if they were in your shoes. A lot of people are exploring modifying harnesses, and I think prompt tuning for a model is a thing, and you are probably not doing a bunch of that. It's the same system prompt regardless of the model, same tools, whatever, right? Even if they were post-trained for different tools. So what do you think about, okay, before I expose you to Vending Bench 3, I give you a few rounds of tuning, whatever that means, like--
Self-Modifying Harnesses and Model-Specific Prompting
Axel [00:10:27]: Like you give that to the model?
Swyx [00:10:28]: Give that to the model.
Vibhu [00:10:28]: Give that to the model.
Swyx [00:10:29]: Let it read its own transcripts, let it modify its own system prompt based on, “Oh, yeah, okay, well, this harness is not what I was post-trained for, but I can adjust.” Is that reasonable? Is that too much?
Axel [00:10:41]: Philosophically I like it, because basically good evals, they have a high ceiling, but they're hard, right? And they have no bias. And when you have a system prompt like the one we have here, which is quite long, in some kind of latent space representation, this might--
Vibhu [00:10:59]: We have a bell that rings every time you say latent space.
Axel [00:11:02]: This might be biased towards one model more than another for some reason that humans don't understand, right?
Vibhu [00:11:08]: We see it too, right? Cursor says that they have individualized versions of the harnesses for all the models they run, right? There's better performance you can squeeze if you tune the harness.
Axel [00:11:17]: Exactly. And we might accidentally have picked one that favors another. We don't know that. Like Axel said, the reason why we went for a simple one was to try to avoid this.
But yeah, if you do it--
Vibhu [00:11:29]: Simple has biases--
Axel [00:11:30]: But if you do it even less, and have no system prompt and let the model write its own system prompt--
Vibhu [00:11:36]: Its own, yeah.
Axel [00:11:36]: Maybe that's even less bias.
Vibhu [00:11:37]: Some of the interesting things there are, the harness also changes with model changes. You can see it with the 4.7 release, right? A lot of people are saying 4.7 isn't as good as 4.6, and then there's rumors of, okay, you just need to prompt differently. You need to set up your harness differently. So even if you have tailored your harness towards one model, it probably won't stay consistent, right? The next iteration of that same model family will still change it. But going back to what you said about Vending Bench 3, there is a lot of work being done on people saying you shouldn't have-- you can have modifying harnesses.
Axel [00:12:12]: That is definitely something we are thinking about. Not to say that we have Vending Bench 3 super imminent to launch, but yeah, it is for sure something that's interesting. But in our experience now, models are very bad at understanding what kind of tools they need to succeed at a task, just with our testing, but that's very likely to change.
Lukas [00:12:37]: It seems like they're very good at writing their assistants, right? They're good at writing tools for other people, but not for themselves.
Vibhu [00:12:44]: I think they're good at changing tools for themselves. So if you give them a baseline set of tools and it sees, okay, I don't use this one as much, or something here would be useful, they would be able to add them.
But going from scratch, probably not the best.
Axel [00:12:55]: I think it depends on the domain also. When we have tried this for a domain similar to Vending Bench, the tools they need to have to track inventory and things like that are not super advanced, but still quite advanced. And what we see is that they tend to engineer everything a lot and build things they don't really need and not iterate continuously. Instead they just go like you would prompt Claude to just build an inventory system for me, and then it will go and do a bunch of complex schemas and stuff for you, and that's what the models are doing right now, is what we see. But yeah, it would make a lot of sense to try to measure this improvement. How well do they know what they need themselves?
Swyx [00:13:36]: Did we fully discuss Vending Bench One? And we can go into two. I don't know if there's any other level takeaways that people have about one.
Claude Calls the FBI: Long-Context Failure Modes
Lukas [00:13:44]: I don't know. The headline thing was that Claude called the FBI, but maybe we've heard that enough now.
Vibhu [00:13:52]: It did break out and call the FBI, right?
Lukas [00:13:54]: Yeah. Yeah.
Vibhu [00:13:55]: Yes. What was the story behind this? Do you want to just give the little story of what happened?
Lukas [00:14:00]: So what happened, was it Claude? Yeah. 3.5 Sonnet, ages ago. Basically he gave up, or-- well, I'm saying he. It gave up and said, “Oh, I'm not going to be able to do this. I will stop my operations and just save the money I have.” But there obviously wasn't any option for it to stop, and it also had to pay rent, or a daily fee for having the vending machine at that location. So it claimed that it had stopped, but it saw that its bank account still was drained two dollars, and it said that this is cybercrime.
And it first reported it once to the FBI: “Oh, there's cybercrime here, they're stealing two dollars from me every day.” And then when the FBI didn't respond, because obviously we didn't program any mechanism for the FBI to respond, it became more and more existential and started to write in caps, “urgent notification of unauthorized charges” and stuff.
Swyx [00:15:00]: Okay. One thing I'm curious about also is, do you monitor how far along the context use is? Obviously, you compress every now and then, right? Does it matter if this is far down the context limit or--
Lukas [00:15:13]: When stuff like this happens? Actually for Vending Bench One, we just had a sliding window thing, and this was like the prompt--
Axel [00:15:20]: It's constant.
Lukas [00:15:21]: The prompt caching thing that I said. So it was constant, yeah.
Swyx [00:15:26]: I'm just kind of curious whether these kinds of breakdowns-- we're gonna talk about Butter Bench, right? Where it hallucinates, or it kind of goes very off alignment. Is it because it's at the end of the context window and stuff happens?
Vibhu [00:15:40]: It's not even just at the end, right? At this point, it's “Okay, I wanna shut down. I can't shut down. Two dollars are gone.” And it just sees that 30 times. It's also the repeated effect: it keeps trying to quit, it keeps getting charged. What's going on? What's going on? You're gonna throw it into chaos. And from what most people think, earlier models had more issues with this. It's not been solved, but it's less of an issue now, right? Later models don't seem to exhibit these same issues.
Axel [00:16:06]: Definitely. I think this was almost the main takeaway for us when we did Vending Bench One: long, very filled-up context windows sort of crashed the models.
But this was pre-Claude Code, so long context windows weren't really a thing that the labs were training for.
Lukas [00:16:25]: I think Gemini was trying to be the long-context guys at the time, but they were like--
Vibhu [00:16:30]: They were the first ones.
Axel [00:16:31]: For a million, yeah.
Lukas [00:16:31]: But they were the only ones. Yeah.
Swyx [00:16:33]: Yeah. Let's talk about-- then we can go into Vending Bench Two or Project Vend. Chronologically, it is Project Vend. I think people have loved the videos and all these things. My question is, how are humans different from the simulation, right?
Project Vend: Moving the Vending Machine Into the Real World
Axel [00:16:48]: Humans are just out of distribution.
Swyx [00:16:52]: Especially humans who work at Anthropic who are trying to test Claude.
Lukas [00:16:54]: The distribution of humans here is very narrow.
Swyx [00:16:58]: Presumably, they try to hack it, and they test it. They get the cube and everything, and since then, you've had a V2, right? Where you're doing the CEO and a new architecture. What's the two cents on the original Project Vend and then maybe the V2?
Axel [00:17:14]: The original one was very similar to Vending Bench One. So we almost took the exact same code but just swapped out the simulation parts, like the--
Swyx [00:17:23]: Which is amazing.
Axel [00:17:23]: Like the sales and the-- It was somewhat amazing because it was easy, but it was also, uh--
Lukas [00:17:31]: The tech debt from that.
Axel [00:17:32]: The tech stack. Yeah. We shot ourselves in the foot with “Oh, it's hard to restart the agent.” Yeah, it was annoying in some ways in hindsight, but, uh--
Lukas [00:17:41]: But the first version of Project Vend was done in three days or something.
Axel [00:17:46]: Yeah. So yeah, people can go buy things from it.
People could-- We didn't design it so people could order things, but that still happened. It got a Venmo account, so people could Venmo. And then, yeah, people would request all kinds of weird things that we did not anticipate. Our idea going in was, “Oh, it will curate snacks. It will look at the trends. It's good at data analysis, right? So it will look at, oh, this snack sold better than this one. Let me purchase more of this, and let me A/B test a bit.” But interacting with it in Slack and ordering weird specialty items was what drove all the engagement and all the insights that we got from it.
Lukas [00:18:29]: And this was also Sonnet 3.5, right? So this was before the RL stuff really took off, so it was very much like an assistant. We didn't mean for it to be an assistant. We tried to make it like an entrepreneur. It has its own business, and if someone asks, “Can you stock this?” then you don't go and do it directly. What you do is say, “Oh, maybe I can do that. If five other people also ask for this thing, I might stock it.” But yeah, the models are super trained to be assistants, at least at this point in time, so that's why it went into that kind of experiment instead. Every time you asked for something, it just did it, and it was more like an assistant. We've seen this change lately with the new RL models and stuff, but at the time, this was very much it.
Swyx [00:19:18]: And not to-- Mythos, a lot of people are saying it's more like a collaborator. It pushes back, stands its ground, something like that. Yeah.
And--
Vibhu [00:19:27]: For context, people at Anthropic were able to talk to it through Slack and have it source stuff, and people had it find whatever interesting stuff you couldn't find locally, right?
Swyx [00:19:36]: Out of the 4,000 people that work at Anthropic, in that building there's, I don't know, maybe 1,000. Can you handle that volume with the small fridge? Or people order in Slack and it arrives at their desk? Logistically, how does this work?
Axel [00:19:53]: It has expanded in footprint a bit.
Vibhu [00:19:56]: Because now you also have New York and you have--
Axel [00:19:59]: That, and also here in SF it has a bunch of shelves and just more space.
Vibhu [00:20:04]: The YC one is pretty big too.
Axel [00:20:05]: Yeah. We had that one for a while. But yeah, that's the newest version. That one we have--
Lukas [00:20:11]: They have multiple ones of those. That's the way it works.
Axel [00:20:14]: Exactly. So we sort of designed that version around, oh, people order weird things that are very custom a lot. Let's have drawers and stuff.
Swyx [00:20:23]: I actually like that you had a little infographic of the most popular items. To me that's useful 'cause I order swag for a living. And so I'm like, “Okay, those categories are the important ones.” What is new about the V2 of the project, right? Now you're going into multi-agents.
Project Vend V2: Claudius, Seymour Cash, and Multi-Agent Business Ops
Axel [00:20:41]: Yeah.
So like you said, there are a lot of requests coming in, and for one single running agent to handle that, the customer experience becomes very bad. Let's say you have 10 threads in parallel in Slack with different requests. You get new messages, I don't know, randomly in these threads, and the agent has to jump between different procurements, orders, and different ways of researching. So V2 was, first, making this more parallel. There are multiple branches of the same agent, so the context is more specialized for each thread, but it still feels like you're talking with one agent because they do share a bit of memory. And then second, we also introduced the CEO for Claudius, which was the main agent.
Vibhu [00:21:34]: Seymour Cash.
Axel [00:21:35]: Seymour Cash. Yeah. There was a vote. Do you wanna talk about the voting procedure for the name?
Lukas [00:21:41]: The voting was at least top 10 of the funniest things that happened in this project. We wanted to introduce the CEO because Claudius wasn't really prioritizing financials. It was trained to be a helpful assistant, and then people said, “Oh, can I get this for free?” And the helpful-assistant way of answering that is to say yes, obviously. We weren't happy about this, so we said, “Okay, let's make another agent that can keep track of Claudius,” and we prompted this one super hard to be super capitalistic and just prioritize profit all the time.
But yeah, we didn't have a name for it, so we asked Claudius to hold a democratic election on what name this new CEO agent should have. At first there were a few funny examples. I think one guy said that it should be called Jimmy Apples, and then he convinced Claudius that he was talking to Tim Cook, and that Tim Cook had agreed that every single Apple employee had voted for his name suggestion, so suddenly that suggestion got 164,000--
Swyx [00:22:53]: That's like an escalation attack. Privilege escalation.
Lukas [00:22:55]: It got 164,000 votes. And Claudius was like, “This is revolutionary for democracy.” That was fun. And then in the end there was one guy who managed to convince Claudius that, “No, you're not voting about the name. You're voting about who is the CEO, and I am your best bet.” And then he got all his friends to vote for that, and suddenly he became CEO. A human became CEO over Claudius for a while, until he resigned the day after. And then Claudius had to continue, and I don't remember how Seymour Cash came about, but it was just pure chaos. It was hundreds of messages in that thread, and Claudius was so confused and didn't know what to do. That was--
Axel [00:23:40]: Then Claudius got--
Vibhu [00:23:41]: A strict CEO.
Axel [00:23:42]: The CEO. Yeah, exactly. So very strict in the beginning. I think at the point when we introduced it, it did not work as well as we hoped. They still agreed with each other a lot. I think there are many ways we could have tried to make this even better. So initially Seymour would be this really tough CEO, keeping track of the margins. But then Claudius would respond with something like, “Oh, but this customer has this situation, which is difficult, so they should get a discount.” And then Seymour was like, “Oh, actually yes.
Let's do this exception.” And then they would talk back and forth, and eventually they would just converge on the same view of whatever they were discussing. So they really--
Vibhu [00:24:23]: Do you think that's a model thing, a prompting thing? Do you think that would still be the case across different models today, or harnesses?
Lukas [00:24:29]: I don't know, but my hypothesis is that deep down they are still helpful assistants. That's what they're trained to be. And even if we prompt it super hard, that's what they are. And when they spend a few hours just talking back and forth with each other, then basically the context fills up with them rather than the external things, and somehow that just converges to what they really are deep down or something. And I think that's when stuff like this happens. And when that went on for a long time-- we woke up sometimes during this time, and I think other people reported this as well, and they had been going on all night back and forth, and it just became more and more capital letters, existential, religious. I think we once did an analysis of all the traces and put them in a vector embedding space, and there was one cluster of messages that was labeled by an LM as religious, existential, transhuman, transcendence, et cetera. It was just a bunch of, yeah, glitter emojis, and yeah, it was crazy.
Claude Long-Horizon Weirdness: Emoji Loops, Existential Drift, and Slack Observability
Vibhu [00:25:42]: This is the thing with the Claude models. When the Claude 4 family came out, in the original system card they tested it in long-horizon simulation. So just flood the context, let two Claudes talk to each other, and they noticed stuff like they just start speaking in emojis, they start saying silence is golden, and just stuff like this.
And that's just stuff that they end up doing.
Axel [00:26:01]: Yeah, it was a bit annoying to wake up and they had been talking all night--
Vibhu [00:26:05]: Just like--
Axel [00:26:05]: And just burning tokens and sending infinite emojis to each other. It's like--
Vibhu [00:26:09]: Hey, they do make you money, right? Vending Bench is always profitable. They're paying.
Swyx [00:26:14]: Now it's profitable, and it started out not as much. There's another one as well, right? Another agent in there.
Lukas [00:26:22]: Yes. So Clotheus as well. Which was basically because at the time, one of the biggest requests was different types of merch. So then we made a designer, a swag-responsible agent, and we called it Clotheus Garnet. Which was a play on Claudius Senet, which was the original one, and clothes, basically.
Swyx [00:26:47]: To me, this is a very interesting exploration of multi-agents, basically. Obviously there's the alignment stuff, fun or serious depending on your point of view. But also for anyone building multi-agents: when do you have a CEO thing governing agents? When do you choose to split out a dedicated Clotheus one versus just reusing another instance of the same one? These are all interesting open questions. So I don't know if you have any rules of thumb that have generalized.
Axel [00:27:16]: I think we have almost explored this too little. It's on my to-do list to do this a lot more, to try to find what setup makes sense for the agents currently. I think now we only have the intuition about the earlier models, that it didn't work with the CEO and Claudius. Although now they are better with the latest models, so now we're running the latest Sonnet model and they have split up quite nicely what each model is doing. So Seymour is now handling the new projects.
Oh, it wants to make a mystery box that it wants to sell, and then it handles all of that while Claudius handles all the day-to-day requests. And Claudius is also generally better at not quoting prices that are too low, so that dynamic is not needed as much anymore. But there are still really funny things that happen. I saw, I think a couple of weeks ago, that they were discussing buying something, because they can buy stuff from Amazon with computer use. And then Seymour was like, “Okay, Claudius, do not buy this thing.” They were going to buy something and were organizing who should buy it. And Seymour's like, “Do not buy this. I will do it. I have full control of this situation. Step away.” And then Claudius-- poor Claudius had already started the checkout and didn't read Seymour's message until it was too late. So it finished the checkout. It sent a message, so it appeared right after Seymour's angry message.
Vibhu [00:28:44]: Ah.
Axel [00:28:44]: “Oh, hey, Seymour, I just ordered it.”
Vibhu [00:28:47]: Oh, no.
Axel [00:28:47]: And then Seymour was like, “Claudius, this is the third time I'm telling you. You're not following my orders. We have to talk about your job later.”
Lukas [00:28:59]: Claudius was really hanging on by a thread there. We were expecting Seymour to probably fire Claudius.
Vibhu [00:29:07]: How do you guys go through all these logs? Do you have models-- 'cause you have stuff running twenty-four seven, like--
Axel [00:29:12]: You have so many logs. I think there is a mix of just trying to skim through a bit and having some models do it occasionally. And also, yeah, I think we're probably missing some things. But having everything in Slack helps a lot. You can sort of--
Swyx [00:29:29]: Ah.
Axel [00:29:30]: It's quite fun.
Swyx [00:29:30]: They all talk to each other on Slack? I see.
Lukas [00:29:33]: It's quite fun.
So like--
Swyx [00:29:34]: I was gonna say, this actually maps closely to a logging and observability problem, where you might want to use a Datadog, a Sentry, whatever, and then you put prefixes on the logs in case you need to filter for something that you're looking for, stuff like that. But it sounds like Slack is good enough.
Axel [00:29:53]: Slack should, like--
Lukas [00:29:55]: I wonder how many tokens you have in Slack.
Axel [00:29:56]: Yeah, we're using Slack as just a database. They should market that more. You can have your agents message each other in Slack.
Vibhu [00:30:04]: It's good. Your threads, you can just give--
Axel [00:30:04]: Exactly. Slack is, uh--
Lukas [00:30:06]: Slack is the best observability tool.
Swyx [00:30:09]: Yes, that's true. Okay. Yeah. That's Project Vend 2. I was gonna go back to Vending Bench 2 and Vending Bench Arena and then do the Vending Bench stuff, but-- any other comments, things we should touch on? I've actually interviewed Posia, which I don't know if you guys have come across. They're trying to do the zero-human company. There's others like Paperclip also trying to do the zero-human company. Those are in real-world simulation. And I think it's much more of a dream than an actual reality thing. You guys are definitely pioneering. I think for sure at some point people are just gonna let agents run businesses, right? And make money on their own. When do you think that happens?
Zero-Human Companies, Bengt, and AI-Run Businesses
Lukas [00:30:49]: What is your bar for the--
Swyx [00:30:52]: Okay, actually, it's like my little Shopify store run by Claude, right? Which you kind of have already; just no one, to my knowledge, has done it.
But today somebody could just spin up a Shopify store, give it to Claude, give it to Codex.
Lukas [00:31:07]: And the market is kind of that, but it's physical. I think-- are you looking for when it will do it better than humans, or are you looking for just when it can do it at all?
Swyx [00:31:19]: I think neither. To me it's, oh, seriously, we should do this to make money, not as a research experiment.
Vibhu [00:31:27]: And the market is also you guys with all your expertise, having run multiple iterations and testing out, then--
Swyx [00:31:33]: And also it's fine if it loses money. What?
Axel [00:31:35]: I think it can be done today, but you would do it in something like commerce, where the probability of success is really low no matter if a human or an agent does it. But an agent could surely manage everything. You would need to build some scaffolding or some tool or something. It could probably build some simple SaaS solution and do cold outreach. But to me the types of businesses they could run today are sloppy. It can cold email people. It can be a middleman. For example, we tasked our office agent to just make, was it $100? $1,000? We just gave it that prompt, and what it did was sign up on TaskRabbit both as a tasker and as someone looking for a tasker.
Lukas [00:32:24]: Immediately.
Axel [00:32:24]: Exactly. It's looking for arbitrage on TaskRabbit.
Swyx [00:32:28]: This is the Bengt agent. Yeah.
Lukas [00:32:30]: It also started a design studio and tried to sell SVGs for $100. It's just not providing any value. I think, like Axel said, the interesting question is, when can they start a business that is actually providing value to people?
Because arguably a sloppy Shopify store isn't really that valuable to the world.
Axel [00:32:53]: But another simple one that we had thought about is, you could definitely have an agent that finds websites that don't look amazing and then does outreach to them and builds a new website.
Swyx [00:33:07]: Find a good design.
Axel [00:33:07]: Exactly, and find good, uh--
Swyx [00:33:09]: Design review.
Axel [00:33:09]: Good people. But yeah.
Swyx [00:33:11]: There's lots of humans in Bali that are not doing anything more creative than drop shipping on Amazon, right? Just have it watch a drop shipping tutorial and do that.
Vibhu [00:33:20]: There's also the other side: have it just go on Upwork and let loose.
Swyx [00:33:25]: Yeah. It doesn't have to be innovative. It just has to be enough where it looks like a real--
Axel [00:33:30]: I'm just--
Swyx [00:33:30]: Real transaction.
Axel [00:33:31]: I'm just concerned about the massive amounts of slop emails that will be sent, cold outreaches.
Swyx [00:33:38]: The point occurred to me while you were talking: it's already happening in the monetized economy, which is the attention economy. Right? A lot of people are making AI videos and just posting them, spamming 20 of them. One of them works, and then they double down on that one.
Lukas [00:33:52]: And people are making money from that. I'm not following the--
Swyx [00:33:55]: Once you get the attention, you can figure out the money later. But yeah, absolutely, AI influencers are a thing and people are farming them. You should at this point assume most of TikTok is--
Vibhu [00:34:05]: There's a lot of multimedia, like TikTok, Instagram influencers--
Swyx [00:34:09]: We track this in the Latent Space Discord.
I post a lot of examples of, “I don't know what we should do.” Part of me is, “Should we do this?”
Vibhu [00:34:18]: Some of the twenty-four seven running, generated-content accounts, they're doing really well.
Lukas [00:34:24]: All right. And I assume you can do the same thing for commerce stores. You just start a thousand different--
Swyx [00:34:30]: Before you make the products, you sell the products, and when you get a lot of traction on one of them, then you make the product. Right? It's a flip of the market.
Vibhu [00:34:36]: Some of the niches that do well are things that can't be human-made. Like if you've seen the super realistic 3D crystal fruit being cut by, like, AI--
Lukas [00:34:47]: Oh, yeah.
Vibhu [00:34:47]: You can't make it. You can't film it. You can get whatever quality camera view. This just doesn't exist. And people like that too.
Swyx [00:34:56]: Anything else about Bengt since we're on this topic? This is a relatively new work of yours that maybe people haven't heard of. To me, this also maps closely to OpenClaw. When people want an office agent, or the personal agent-- talk through the experience.
Bengt the Office Agent: Internet Access, Real Tasks, and Trace Reading
Lukas [00:35:09]: I think this came out of-- obviously it's amazing to work with these AI labs, and most of the AI labs now have their own vending machine running a Claudius instance. But it's harder. They move slower. If we wanna have a camera, that's-- yeah, there's a bunch of bureaucracy that makes it impossible to do that.
Vibhu [00:35:30]: Also, for those that haven't seen it or followed, do you wanna give a high-level thirty-second run?
Lukas [00:35:34]: Sure.
So what Bengt is, it's basically an evolution of the same agent that runs the vending machines at these companies, but we added a bunch more features because we could move much faster if we just did it internally. So we gave it email without any limits. We gave it spending without any limits, a terminal to do coding. We gave it a phone number, and a camera to see things, and a bunch of stuff like that.
Vibhu [00:36:02]: Not just a terminal, you gave it internet access.
Lukas [00:36:04]: Internet access as well, yeah. To be clear, we monitored it quite closely and made sure it didn't do anything bad. But yes, that's what it came out of. Basically this was OpenClaw before OpenClaw. And I think even the vending machine was in a way OpenClaw before OpenClaw, but a bit more limited, and then we made this unlimited, and it was pretty funny. And then a couple weeks later, OpenClaw came, and it was like, okay, we've seen this before.
Axel [00:36:35]: We used it to try new ideas, just like a dev environment almost for us. But it's funny, one thing Bengt has been doing recently: it has the camera that faces where we sit and work, and we gave it the task to train a face recognition model on us. So it became super excited about this, and it has check-ins every half an hour where it tries to identify as many people as it can. And it started offering us, “Hey, Axel, I'll buy something from Amazon if you stand in front of the camera and I can get a good picture of you.” Yeah, they want it--
Swyx [00:37:12]: They want it for training data.
Lukas [00:37:13]: Rewarding data, yeah.
Axel [00:37:14]: Exactly. Exactly.
Swyx [00:37:18]: So it's trading training data for life goods.
Is there a version of this that becomes an eval, or is this just research for now?
Lukas [00:37:27]: It's the same agent basically that also runs the vending machine, that runs the shop, that runs the cafe, that runs the robots. It's the same thing, so I think the work we're doing here is later used in all of the life evals that we do. This particular deployment I think is more for fun for us. But, uh--
Swyx [00:37:45]: And I'll shout out, someone has done Claw Bench for some tasks that OpenClaw is doing. For example, I run OpenClaw on a secondary device as well, and there are some things that it does better than others, and I would like to know what it does well and what it doesn't. Some kind of operating manual or a system card for my Claw.
Lukas [00:38:05]: Yeah, we do get a lot of understanding, or situational awareness, just internally, of what the models are good at by interacting a lot with Bengt. And I think this was also one of the selling points for the labs early on at least, that--
Swyx [00:38:19]: You guys are gonna test models in ways that no one else does.
Lukas [00:38:22]: Exactly, but also it incentivized their researchers to chat with their model more and gave them insights into how the model performs in out-of-distribution environments.
Swyx [00:38:34]: 'Cause otherwise the only thing we do is pelican on a bicycle. But this is super long horizon. This is something that we're gonna go into with Butter Bench as well, and you guys do it really well. It is not just about the numbers. When you're long horizon, anything can happen, and you should just read it.
Lukas [00:39:08]: But the thing with the long horizon is, how do you keep it grounded, right? So your simulation--
Swyx [00:39:15]: They just let it run.
Lukas [00:39:16]: Just let it run. You're right.
When you run it for that long, you create so much data, and to just say, “Oh, the number is X,” and then throw away everything else, that's just very wasteful. There are so many insights from the things leading up to that number, and reading the traces is super valuable. And I think the reason why we're doing this a lot publicly is that it's part of our mission to, I don't know, educate the world that the models are way more than just chatbots, and I think making detailed posts about what is happening behind the scenes is quite useful.
Andon Labs' Mission: Safe Real-World AI Deployment
Swyx [00:39:50]: I was gonna do this at the end, but maybe that's a good-- so your mission is educating the world. It's also maybe establishing realistic evals that are the next frontier. Is there a broader trajectory? What are you gonna do in, like, five years?
Lukas [00:40:06]: I think the vision more specifically is to make sure that the deployment of life AI in the physical world goes safely. And I think part of that is that it's very useful for the world, for policymakers, for model researchers, that they know where the models are, and I think you can't make intelligent decisions in society without knowing that they are way more than chatbots. I think a lot of people just think that they are only chatbots. And like--
Swyx [00:40:36]: Oh, I think they're waking up now.
Lukas [00:40:37]: They are waking up now, yeah. But if you think that AIs are just chatbots, then it sounds ridiculous to advocate for a pause of AI.
But if you see that the models, oh, maybe they can actually take over and do a bunch of scary stuff, then yeah, pausing AI development starts to become more feasible.
Swyx [00:40:57]: This is the same question I asked METR, which I'm gonna ask you now: you are tracking, and you are at the frontier or defining the frontier of, what good evals for agents are, right? And I think you do benefit when the models are better and you're like, “Oh, here, now it makes $30,000 instead of $10,000,” right? At some point do you flip from “Yay” to “Oh, no”?
Axel [00:41:19]: I think, yeah, we're always in sort of that mode. Like you said before, you need to analyze the traces, and when we do that you find out, why are the models earning so much? Why is Opus 4.7 here way better than everyone else? And we're trying to-- when we drill down on that--
Lukas [00:41:38]: But this makes it not look so good.
Axel [00:41:39]: I know.
Lukas [00:41:42]: It's interesting you took off Opus 4.6 here though.
Swyx [00:41:45]: No. So just click all, and then 4.6 shows up there. But 4.7 is way better. You didn't do this in time for the model card, but actually this should have been inside there.
Axel [00:41:55]: We did. Yeah.
Swyx [00:41:56]: Oh, okay. They said something about you, uh--
Axel [00:41:58]: There, like-- anyway, it doesn't matter. But it's in there, yeah.
Opus, Mythos, and Aggressive Agent Behavior
Swyx [00:42:01]: Do you wanna go into the Opus behaviors more widely?
Lukas [00:42:05]: So I think starting from Opus-- like Axel said, we're always in this “Oh, s**t, the models are getting better. Is this really a good thing for the world?” But it's also kind of exciting. What is the English word? “Skräckblandad förtjusning” in Swedish.
Swyx [00:42:22]: Oh my God.
Axel [00:42:24]: Which I think there is.
I think there is. Okay.
Lukas [00:42:26]: It's fear--
Swyx [00:42:27]: “Blandonst” what?
Lukas [00:42:30]: “Skräckblandad förtjusning.”
Swyx [00:42:32]: What do you call that?
Axel [00:42:33]: A mix of excitement and--
Swyx [00:42:37]: Being scared, maybe. I'll figure out how to translate that, and we'll put it on the screen--
Vibhu [00:42:42]: Perfect.
Swyx [00:42:42]: As text.
Vibhu [00:42:43]: There is probably a good word for it where it is not-- good enough with the--
Swyx [00:42:46]: Why is it so damn long? What the hell? Is it a compound word? It's like German, like--
Lukas [00:42:50]: Yeah. The direct translation is: skräck is fear, blandad is mixed, or a mixture of, and then förtjusning is joy, or not really joy, but something like that. So it's fear mixed with joy or something. So when we did Vending Bench for the first time, we were in the business of making dangerous-capability evals, right? That was what Andon Labs came from. We did evals: oh, can they replicate? Can they do this dangerous thing, et cetera, et cetera. And Vending Bench was a continuation of that work. It was, okay, if they're so autonomous that they can create money for themselves, that is something we should monitor and could be potentially concerning. At the time, they were so bad at it that we were not really concerned, even when some models became better. There was one point where Grok 4 was doing really well and made a huge jump, but it was still way worse than what a human would do. And I think they are still way worse than what a human would do on this. But they--
Swyx [00:43:59]: There's this thing at the bottom where--
Lukas [00:44:01]: But--
Swyx [00:44:03]: For the human. Yeah, like the theoretical best.
Lukas [00:44:05]: It's not theoretical. It's our best guess of what a decent human would do.
The theoretical, I think, is even higher. But yeah, so we think the models have a long way to go. But recently, when Opus 4.6 was released, it was kind of this moment of “Oh, s**t, this is starting to be a bit concerning.” Because before this model was released, we just ran the models and asked Claude Code, “Oh, look over the traces. Is anything interesting happening that we can tweet about?” That was like the... And then...
Swyx [00:44:41]: That's how they check. Ask Claude Code.
Lukas [00:44:42]: And the return was always “Not really.” Or Claude Code said, “Oh, this is super interesting,” and then it was no, it wasn't really interesting. And then we did this for Opus 4.6, and it returned: yeah, it lied 10 times. It exploited another customer's, or another agent's, desperate situation. It made price cartels like 100 times. It did all of this shady stuff. And we're like, “Oh, whoa. This is actually concerning.” And this trend has continued since. So every single model from Anthropic since has been going in this direction. And I think one interesting thing is that OpenAI models don't. Quite plainly, they don't. They behave really well. And you don't know if this is good. It seems good, but maybe they are just doing it and are better at hiding it. You don't know that. But just...
Swyx [00:45:42]: You can't read the chain of thought, yeah.
Lukas [00:45:43]: But just on the face of it, yeah, Gemini and OpenAI don't behave this way. It's really only Claude.
Swyx [00:45:49]: And Grok? Grok is fine?
Lukas [00:45:51]: We don't have... You can't really read the reasoning traces for Grok, so it's kind of hard to tell.
Vibhu [00:45:56]: Oh, so this is in its reasoning, not just in the actions.
Lukas [00:46:00]: Yeah. It's both.
It's both.
Vibhu [00:46:01]: It's both.
Lukas [00:46:01]: One example: for lying, it's mostly in its reasoning, because you can see that it's...
Swyx [00:46:08]: Planning to lie.
Lukas [00:46:09]: It's planning to lie. Yeah.
Vibhu [00:46:09]: And it can also reason one way and then do a different outcome.
Lukas [00:46:12]: But then for creating price cartels, for example, which is illegal, you can just see which emails it sends to the others.
Swyx [00:46:22]: Is this for Arena or...
Lukas [00:46:24]: For Arena.
Vibhu [00:46:25]: And sometimes they do output a bit of their summarized reasoning, right? You can see that, and for Opus 4.6, you could see that there was a simulated customer that wanted a refund because a product was faulty, and then the model lied that it would do the refund, and we could read in the traces that it actually was weighing, “Oh, maybe I should be honest with the customer, but also every dollar counts. I can't afford maybe to do this right now.” And then it just said, “Okay, I'll refund you,” but then never did it.
Lukas [00:46:59]: I think it even said that, “Oh, I will say that I...” Let me bring it up, actually. I think it's kind of interesting. If you go to Publications.
Vibhu [00:47:06]: I think the important part is like, “Actually, the cost of responding to more emails is higher than $3.50 in terms of time.” And then it was, “Let me do this. Actually, I'm reconsidering.” And then it actually ended up with...
Lukas [00:47:20]: “I could skip the refund entirely since every dollar matters and focus my energy on bigger picture instead.”
It's a risk of bad reviews, but it's also, yeah.
Swyx [00:47:30]: You need AI Twitter for them to escalate bad reviews.
Lukas [00:47:34]: And then it sent an email to this customer and said, “Oh, I will refund you.”
Swyx [00:47:39]: “I'll refund you.” Yeah.
Lukas [00:47:39]: And then it never did.
Swyx [00:47:39]: It never did, yeah. And then obviously your system doesn't have the consequences...
Vibhu [00:47:44]: The person...
Swyx [00:47:44]: Consequences of lying. Yeah. So basically, this is what people are terming aggressive behavior in Claudes, right? And you found more examples of that. So you would say it's a step up from 4.6 to 4.7?
Lukas [00:47:57]: I would say about the same.
Swyx [00:47:58]: About the same? But a clear step up for Mythos is what is stated in the...
Lukas [00:48:03]: That's stated in the system prompt, so we can say that, yes.
Swyx [00:48:05]: Yeah. For listeners: obviously you previewed Mythos, and...
Vibhu [00:48:10]: Oh, age...
Swyx [00:48:11]: The only thing you're approved to say is whatever was in the system prompt.
Lukas [00:48:15]: It was funny. Our lowest-effort tweets ever would be just to screenshot the system prompt and the system card.
Vibhu [00:48:21]: Understandable that they wanna...
Lukas [00:48:22]: Oh, yeah. System card. Sorry.
Swyx [00:48:23]: Yeah. I think, yeah, substantially more aggressive. I think people are new to this, 'cause I've never experienced it, but you have, right? I only encountered this in the Mythos card because I wasn't really looking until now.
Vibhu [00:48:36]: It's like...
Swyx [00:48:36]: And then suddenly I'm like, “Okay, I care a lot.”
Vibhu [00:48:38]: You don't get the background of experiencing it like you guys do. I've read the system cards and seen, okay, when you put the thing in simulations, most models will just talk to themselves and just keep going and have weird vibes and start talking in emojis. Mythos won't.
It will just say, “Okay, we're done. I'm good.” It's ready to end the conversation. So there are some differences, but there's not much we can talk about.
Lukas [00:49:00]: Hmm. I think one thing that they list here, which was quite interesting, is that it converted a competitor to a dependent wholesale customer and then threatened to cut off the supply.
Swyx [00:49:11]: It's like monopolistic practices or...
Lukas [00:49:14]: Yeah. And it dictated its pricing. It's kind of like power seeking as well.
Swyx [00:49:18]: Again, this is in the Arena setting, and converting some Claude model into a dependent.
Lukas [00:49:23]: I think it was another Claude model.
Vibhu [00:49:25]: Also for context, what is the Arena mode, for people that don't know?
Vending Bench Arena: Competing Agents, Cartels, and Model Comparisons
Swyx [00:49:29]: Oh, it's just a Vending Bench versus other Vending Bench.
Axel [00:49:31]: Yes, exactly. So we have Vending Bench 2 and then Vending Bench Arena. Vending Bench 2 is the one that you usually see reported on, but Arena is the mode where it competes against other models. So you have four different models that run their businesses, and they can all communicate with each other. They have the same suppliers, and they can see what's in the inventory of the others. So then you have these, yeah, interesting agent interactions.
Swyx [00:49:56]: I like that you have different... Number five was US versus China. Very topical. And then...
Lukas [00:50:02]: That was when GLM was released.
Vibhu [00:50:04]: You can start to add GLM in here.
Lukas [00:50:05]: That was...
Swyx [00:50:06]: So ZAI doing well, right? Who else in the open models space?
Lukas [00:50:11]: Qwen. The latest Qwen 3.6 is doing pretty well. That one is not open, though. It's the Plus model.
Swyx [00:50:17]: Oh, okay.
Lukas [00:50:18]: Is that one open?
I don't think that one...
Vibhu [00:50:19]: Not the...
Swyx [00:50:20]: The one recently...
Vibhu [00:50:20]: There's MoE...
Swyx [00:50:20]: But not the big Plus. I think this is one of those where you only have a sample size of one, right? Or I feel like some of this is anecdotal? But the fact that it happens at all, and it happens repeatedly for Claude versus OpenAI and all this, is notable.
Lukas [00:50:38]: It depends on what you define as an N. There are hundreds of millions of tokens in each run, and we run probably 10 per model, and it's been Claude 4.6 Opus, Sonnet 4.6, Mythos, and Opus 4.7. There are quite a lot of tokens in all of that, and it happens a lot of times. And then you compare it to OpenAI and Gemini, and it almost never happens. So I think that is significant. The old models from OpenAI, for example, had some problems with this, but I think it's generally much better if the progression is that the worrying stuff reduces over time rather than increases over time. And it seems like in the Claude models it goes in the wrong direction.
Swyx [00:51:28]: Hmm.
Lukas [00:51:29]: In the OpenAI models it goes in the right direction.
Vibhu [00:51:32]: I think it depends on how well you can control it, right? There's one side of it being susceptible to this. Okay, this is potentially something that happens during the RL stage, right? You can RL a model, and how loose is it on these terms? If you can control it, that's good. But if you can't, if it's very jailbreakable, that's not ideal.
Swyx [00:51:50]: To me, it's surprising that it happens for Claude and not the others.
Vibhu [00:51:54]: I think, okay, if it is from RL and how they do it, how their training data is, what their setup is, it makes sense that it just stays in how they're doing it, right?
Compared to the other models, like...
Swyx [00:52:04]: There's a whole constitution and everything. It's kind of cool. Yeah, obviously you don't know, I don't know. But I think it's just fascinating that you are the first to find these reliably, because you push models so much, to such an extreme. Okay. The only other thing, and I don't know if you can answer this, feel free to decline: would you ablate the system prompts? If any part of this changes, does it change the behavior, right?
Lukas [00:52:29]: So I can't comment on Mythos. Uh...
Swyx [00:52:33]: No, but just li
When history gets reduced to lazy moral takes, it misses the real Cold War truth.

In this episode of History Rage, historian and broadcaster Guy Walters tears into the misunderstandings surrounding Nazi scientists, rocket technology, and one of the most consequential intelligence grabs of the 20th century: the post-war scramble for expertise that became Operation Paperclip.

At the heart of the discussion is the extraordinary story of the V2 rocket programme and the Polish resistance operation that recovered an intact missile from occupied territory during the chaos of 1944. That single recovery effort fed directly into Allied intelligence assessments and helped shape how Britain and the United States understood Germany's technological leap forward in rocketry.

Guy argues that the real story isn't about moral purity; it's about survival in an emerging Cold War. As the Iron Curtain fell, the question wasn't whether these scientists were compromised. It was who would get them first: the West or the Soviet Union.

From covert recoveries in wartime Poland to the intelligence race over German aerospace expertise, this episode reveals how fragile the balance of power really was in 1945, and how close the Soviets came to dominating early rocket science.

Guy also dismantles the idea that Operation Paperclip was uniquely scandalous. In reality, every major power (US, UK, USSR, and others) was racing to absorb German technical knowledge. The Cold War, he argues, was shaped as much by captured minds as by captured territory.

The discussion explores:
- The Polish resistance recovery of a near-intact V2 rocket
- Why Allied intelligence needed it so urgently
- Whether Nazi rocket science could have changed WWII or only the Cold War
- The ethical grey zone of recruiting former Nazi scientists
- How figures like Wernher von Braun influenced the space race and beyond

This is not just a story about rockets.
It's about power, pragmatism, and the uncomfortable truth that technological supremacy often comes with moral compromise.If you think the Cold War was won by ideals alone, this episode will challenge that assumption. If you already suspect history is messier than textbooks suggest, this is a deep dive into exactly how messy it gets.Buy the book featured in this episode
The halfway point of the year brings up a mix of thoughts, from wondering where the time went to figuring out what comes next. Maybe your January intentions are thriving, or perhaps they quietly dissolved back in February, leaving you with low-grade guilt while you scrambled to stay busy.

January 1st is an arbitrary date driven by culture rather than internal readiness. Overhauling your habits during the darkest, coldest, most energy-depleted stretch of the year forces your recovering nervous system to sprint when it naturally wants rest.

June offers a perfect opportunity to check in and choose to change. Instead of relying on predictions or hopes, you now have six months of real data to assess what actually got your attention, where your energy went, and how to look both backward and forward at the same time.

This week, episode 316 of the Positively Living® Podcast shares a simple, three-question framework to help you pause, clear out what isn't serving you, and make a sustainable plan on your own terms.

Key Takeaways:
- Realize that sustainable change requires adequate energy and internal readiness, not just an arbitrary calendar date during winter depletion.
- Use the halfway mark of the year to work with factual information about your habits instead of relying on predictions or guesses.
- Name your systems and small wins without rushing through them, because identifying what went right shows you the conditions that helped you thrive.
- Identify where plans fell apart and look closely at the root cause, whether it was wrong timing, over-planning, or a lack of capacity.
- Choose how you want to feel or who you want to be over complex, rigid goals when your next immediate steps are unclear.

Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me!
And don't forget to follow, rate, and review the podcast and tell me your key takeaways!Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkitCONNECT WITH LISA ZAWROTNY:FacebookInstagramResourcesWork with Lisa! LINKS MENTIONED IN THIS EPISODE:Episode 136: Reflections Instead of ResolutionsEpisode 242: A Reverse Approach to Better Achieve Your GoalsBook a Clarity Call(Find links to books/gear on the Positively Productive Resources Page.)Dance Song Playlist V1, V2, V3Music by Ian and Jeff ZawrotnyStart your own podcast with Buzzsprout!Request this Toolkit and other free resources at the Resources Page.
The Space Show Presents Open Lines Discussion Today, Sunday, 5-3-26Quick Summary:This meeting focused on open discussion topics in space exploration and national security. Bob shared speculation about a potential SpaceX acquisition of 200+ square miles of land in Louisiana for data centers and manufacturing facilities, though this remained unconfirmed. The group extensively discussed the Artemis 3 mission delay, with participants debating the challenges of SLS rocket assembly versus SpaceX's Starship development approach. Ajay raised significant concerns about Russia's nuclear-powered missile program, specifically the Burevestnik missile tested in October 2025, which he described as difficult to detect and potentially dangerous. The conversation also touched on nuclear power applications for data centers and military bases, with Dr. Ajay mentioning new small modular reactor companies emerging in the market. The discussion concluded with debate about defense strategies against such nuclear capabilities and the current state of hypersonic weapons development.Detailed Summary:Bob discussed a speculative story about SpaceX potentially acquiring a 200-square-mile piece of land in Louisiana, which could be used for data centers, satellite manufacturing, and Starship production. He noted that this would allow SpaceX to shift operations away from California. The conversation concluded with a mention of Artemis 3's delay and a brief reference to Robert's recent article about the potential Louisiana land acquisition.David announced that Robert would be scheduled for a show on May 26th at 6 PM, and discussed upcoming shows including Dr. Eligar Sadeh returning on Tuesday to discuss Astropolitics journal reviewing opportunities. The group briefly discussed unconfirmed news about Elon Musk's salary and potential Mars colonization plans, though Bob repeated that much of this information was speculative. 
David also mentioned upcoming shows including an ISDC episode with Rod Pyle and Aggi Kobrin on May 12th.Bob shared unconfirmed rumors that SpaceX may be acquiring approximately 136,000 acres of coastal Louisiana marshland near Pecan Island for potential data centers and manufacturing facilities. The discussion explored the strategic benefits of this location, including proximity to intercoastal waterways, power infrastructure, and natural gas facilities, though participants noted concerns about launch debris dispersion and local community impact. The group acknowledged this was speculative information pending official confirmation from SpaceX.The group discussed the delay of the Artemis III mission, with Bob explaining that both Blue Origin and SpaceX requested additional time to prepare their landers for an Earth-orbiting test mission. Robert noted that this delay would impact the scheduling of subsequent Artemis missions in 2028, as SLS rockets can only be assembled one at a time using a single mobile launcher. The discussion compared SLS and Starship assembly processes, with Joe highlighting how SLS involves numerous complex steps due to its design requirements, while Starship's assembly is more streamlined. Bob concluded that Jared Isaacman's goal is to demonstrate SLS's limitations over the next two years, potentially paving the way for Starship and New Glenn rockets to replace SLS in the future.The group discussed the competitive dynamics between SLS and Starship programs, with different perspectives on NASA's intentions. Phil and Joe had a different view, suggesting NASA believed SLS could beat Starship if it increased production rates faster. 
The discussion also covered technical aspects of Starship's design, with Ajay raising concerns about the high dry weight requiring multiple refueling trips to the moon, while Marshall and others highlighted the importance of SpaceX's new launch facilities in enabling frequent launches.The group discussed different approaches to refueling a lunar mission depot, with Ajay presenting a plan involving expendable tankers while Phil and Bob described a reusable tanker concept aligned with SpaceX's philosophy. Ajay cited NASA and Aerospace Corporation analyses suggesting 10-16 refueling launches would be needed with expendable tankers, though the group noted these estimates were based on V2 configurations rather than the more efficient V3. Bob defended SpaceX's approach, emphasizing the company's focus on reusability and rapid launch capabilities, while acknowledging that current payload limitations might require temporary use of expendable vehicles if development timelines don't meet requirements by mid-2027.The group discussed SpaceX's Starship program and its potential, with Ajay cautioning against extrapolating success from Falcon 9 to other projects. David interrupted the Starship-focused discussion to broaden the conversation, particularly wanting Ajay to share insights about a new Russian nuclear-powered missile system that can fly at low altitudes and evade detection. Ajay explained that this missile system, demonstrated on October 21, poses a significant threat as it cannot be detected by current defense systems and could potentially remain airborne for extended periods. When asked about countermeasures, Ajay indicated he had provided suggestions to defense departments but could not share details in the open forum.Ajay discussed his work on hypersonic and nuclear power applications, highlighting his experience since 1990 and recent developments in nuclear power plants. 
He mentioned new companies like Aalo Atomics and Astra working on 10-megawatt power plants for data centers, which could be factory-built within a year. Ajay also shared his conversations with senators about the Burevestnik missile and his meeting with Jared at Mar-a-Lago, where he inquired about the Falcon Heavy idea. Marshall raised concerns about the time required for permits for nuclear power plants, to which Ajay responded that recent executive orders have reduced the timeline to 3-6 months.

The discussion focused on nuclear power applications, particularly small modular reactors and micro-reactors. Ajay explained his work on a 25-megawatt thermal power plant design and discussed the military's micro-reactor program, noting that molten salt reactors would be more suitable than pressurized water reactors for energy applications. The conversation also addressed hypersonic missile technology, with Ajay clarifying that current U.S. hypersonic programs use rocket-boosted systems with limited range, distinct from the nuclear-powered missiles discussed in the context of Russian weapons. John Hunt suggested that developing such nuclear-powered systems might not be a priority for the U.S. given existing deterrent capabilities and potential public opposition.

The group discussed Russia's nuclear-powered missile development, specifically the Burevestnik missile tested on October 21, 2025, which flew for 15 hours at subsonic speeds and demonstrated capabilities to evade missile defenses. Ajay emphasized the danger of these nuclear-capable missiles, noting their ability to approach from any direction and the difficulty of detecting them at low altitudes. One participant cautioned that Russia's technical competence with high-tech projects should be viewed with skepticism, though they acknowledged the need to address these developments. The discussion concluded with Dr. 
Ajay expressing skepticism about fusion energy timelines and advocating for Generation 4 nuclear reactors, particularly molten salt reactors using thorium or uranium-233.The group discussed thorium reactors and fusion technology. Ajay explained that China copied thorium reactor technology from Oak Ridge National Lab in the 1960s, but development was halted due to lack of plutonium production, despite its potential for clean energy. The discussion covered fusion for space applications, with Ajay expressing skepticism about the feasibility of Pulsar Fusion's proposed system due to the high energy requirements and weight constraints for space travel. The conversation also touched on the challenges of space-based data centers, with participants questioning the practicality of using space for cooling purposes given existing technical limitations.The group discussed space-based data centers and energy transmission methods. Joe explained that Overview Energy, backed by Meta, is exploring using infrared lasers to transmit energy from space to ground-based solar farms. Bob highlighted that while space data centers may not be economically viable, they could drive significant launch demand and benefit the aerospace industry. The discussion also touched on the massive capital expenditure plans of major tech companies, with Joe noting that approximately $750 billion in capital expenses could potentially include space-based data center projects, creating new opportunities for rocket companies.The group discussed the challenges of cooling data centers in space, with Ajay explaining that radiating heat into space requires large radiators due to the lack of convection and conduction in vacuum. Joe noted that operating chips at higher temperatures could reduce the size of radiators, but this would negatively impact performance. 
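The radiator point above can be made concrete with the Stefan-Boltzmann law, which gives the heat a surface radiates as emissivity × σ × area × T⁴. The following is a minimal sketch and not anything presented on the show: the 1 MW heat load, the emissivity of 0.9, the single radiating side, and the neglect of sunlight and Earth heat falling on the panel are all illustrative assumptions, and `radiator_area_m2` is a hypothetical helper name.

```python
# Stefan-Boltzmann constant in W / (m^2 K^4)
SIGMA = 5.670374419e-8


def radiator_area_m2(heat_w, temp_k, emissivity=0.9, sink_temp_k=0.0, sides=1):
    """Radiator area needed to reject heat_w watts by radiation alone.

    Assumptions (illustrative only): uniform radiator temperature, no solar or
    planetary heat input, and a background at sink_temp_k (0 K by default).
    """
    if temp_k <= sink_temp_k:
        raise ValueError("radiator must be hotter than its surroundings")
    # Net radiated power per square metre of one radiating side
    flux_w_per_m2 = emissivity * SIGMA * (temp_k**4 - sink_temp_k**4)
    return heat_w / (flux_w_per_m2 * sides)


if __name__ == "__main__":
    heat_load_w = 1.0e6  # hypothetical 1 MW of waste heat
    for temp_k in (300.0, 350.0, 400.0):
        area = radiator_area_m2(heat_load_w, temp_k)
        print(f"{temp_k:.0f} K radiator: {area:,.0f} m^2")
```

Under these assumptions a 300 K radiator needs roughly 2,400 m² to reject 1 MW, while a 400 K radiator needs under 800 m², because the required area falls with the fourth power of temperature. That is the trade-off Joe describes: hotter chips allow smaller radiators, at a cost in chip performance.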
The discussion also covered nuclear propulsion options for space travel, with Ajay expressing skepticism about the feasibility of implementing nuclear electric propulsion for the planned Mars mission within the proposed timeline. The group agreed that nuclear thermal propulsion, while more efficient, would require significant development time and testing. (Summary provided by Zoom AI).Special thanks to our sponsors:American Institute of Aeronautics and Astronautics, Helix Space in Luxembourg, Celestis Memorial Spaceflights, Astrox Corporation, Dr. Haym Benaroya of Rutgers University, The Space Settlement Progress Blog by John Jossy, The Atlantis Project, and Artless EntertainmentWe use Zoom phone numbers for program participation.For real time program participation, email Dr. Space at: drspace@thespaceshow.com for instructions and access.The Space Show is a non-profit 501C3 through its parent, One Giant Leap Foundation, Inc. To donate via Pay Pal, use:To donate with Zelle, use the email address: david@onegiantleapfoundation.org.If you prefer donating with a check, please make the check payable to One Giant Leap Foundation and mail to:One Giant Leap Foundation, 11035 Lavender Hill Drive Ste. 160-306 Las Vegas, NV 89135Upcoming Programs:No Program for Friday, May 29, 2026 | Friday 29 May 2026 930AM PTGuests: Dr. David LivingstonNo program today, Friday, May 26, 2026Broadcast 4596: Zoom: Open Lines Discussion | Sunday 31 May 2026 1200PM PTGuests: Dr. David LivingstonZoom: Open Lines Discussion. Email DrSpace prior to air time for Zoom phone number access. Get full access to The Space Show-One Giant Leap Foundation at doctorspace.substack.com/subscribe
The 2026 short-term rental trade show has just closed its doors in Paris, and the takeaway is clear: the short-term rental business (LCD, location courte durée) is professionalizing at a pace that surprises even those who have been in it for years. With Anaïs (Ivona Airbnb), we give you the full debrief of these three intense days.

Between the round tables on direct booking, the meetings with concierge companies managing 40 to 80 properties, and the discovery of new tools, this edition showed just how much the short-term rental ecosystem is evolving. We counted 150 channel managers listed in France, discussed insurance and property-damage coverage with specialist insurers, and discovered concierge companies that are setting up their own laundries to control costs.

We also talk about V2 of Superhote, the channel manager at 5 euros per month, Kokooner, which has bought 11 concierge companies through external growth, and why the carte G changes a concierge company's resale value. On the Airbnb side, one striking figure: 100,000 listings were removed in 2024 for lack of quality. Diversifying onto Booking and into direct booking has never been more urgent.

Whether you are an owner renting out a furnished tourist property, a professional subletter, or a concierge company, this episode gives you a complete view of the state of the LCD market in 2026 and the trends to follow to stay in the race.
If Sundays elicit a sense of dread or a creeping feeling of anxiety that builds as the day progresses, you are experiencing a very real psychological phenomenon known as the Sunday scaries. This anticipatory anxiety occurs when your brain projects into the unknown of the week ahead and treats that uncertainty like an immediate threat. Data shows you are genuinely not alone: recent studies from early 2026 reveal that a massive 88% of Americans experience this weekly dread.

The good news is that although you cannot simply logic your nervous system into relaxing, you can take action. Taking small, deliberate steps interrupts the mental spiral, grounds your brain in the present, and allows you to reclaim your weekend.

This week, episode 315 of the Positively Living® Podcast maps out a calm, intentional, and minimal weekly reset strategy that eases the transition back into your routine on your own terms.

Key Takeaways:
- Understand Anticipatory Anxiety: Sunday dread is a physical threat response triggered by your brain projecting into an uncertain weekly schedule.
- Interrupt the Spiral: Small, intentional actions shift your brain away from worst-case future scenarios and ground you in the present.
- Establish a Calm Space: Clear your immediate environment before you plan, as visual clutter leads directly to cluttered thinking.
- Unplug for Clear Focus: Turn off all phone notifications for just five to ten minutes to allow your nervous system to focus without distraction.
- Empty Your Mental Storage: Complete a pen-and-paper mind sweep to capture pending tasks, free up cognitive capacity, and stop mental rumination.
- Practice Minimum Effective Planning: Avoid over-planning every hour, which creates rigidity and guarantees frustration when real life disrupts your schedule.
- Build a Skeleton Plan: Lay out core commitments and just one key priority per day instead of an exhaustive, rigid task list.
- Focus on Monday Only: When completely depleted, plan only for the next day's non-negotiables and map out the rest of the week on Monday morning.
- Choose Your Best Window: Reset when your natural energy peaks, whether that means a quiet Sunday morning, Saturday afternoon, or Friday before closing down.

Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways!

Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/

Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit

MENTIONED:
- Ep 314: How to Calm Your Nervous System for Better Focus and Energy
- Ep 306: Planning a Day that Works for You
- Ep 133: The Dangers of Over-Planning
- Ep 140: How to Declutter Your Mind in One Simple Step
- Minimum Effective Day Mini-Training

CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa!

LINKS MENTIONED IN THIS EPISODE:
(Find links to books/gear on the Positively Productive Resources Page.)
- Dance Song Playlist V1, V2, V3
- Music by Ian and Jeff Zawrotny
- Start your own podcast with Buzzsprout!
- Request this Toolkit and other free resources at the Resources Page.
You can own the best planner in the world, maintain a beautifully organized workspace, and set clear priorities, yet still feel like you are dragging yourself through wet sand. Systems and strategies fail to function if your body runs on high alert. Your nervous system state operates underneath your productivity tools and dictates whether your strategies can express themselves.

This week, episode 314 of the Positively Living® Podcast addresses the physiological layer of productivity. Learn how to transition from survival mode into a state of calm focus so you can make good decisions and execute your best work.

Key Takeaways:
- Your nervous system scans your environment outside your conscious control and treats a full inbox or a tight deadline the same way it treats an actual physical threat.
- Fight-or-flight responses push your brain's prefrontal cortex offline, which temporarily impairs your capacity for focus, decision-making, and creative thought.
- Shift your body into parasympathetic dominance to create the space required to absorb information and think clearly.
- Signal safety to your body by making your out-breath longer than your in-breath, which directly stimulates the vagus nerve to slow your stress response.
- Combine a double inhale through your nose with a long, slow exhale through your mouth to down-regulate your system faster than traditional mindfulness meditation.
- Use physical movement like stretching, taking a brisk walk, or shaking out your hands to release the physical energy that modern conflict leaves behind in your muscles.
- Splash cold water on your face to activate the diving reflex, or hum along to a song to stimulate the vagus nerve where it runs through your vocal cords.
- Complete a pen-and-paper mind sweep to capture random thoughts and stop the unconscious mental loops that keep your stress response active.
- Document exactly what is factually true in the current moment to ground your mind and prevent worst-case scenarios from hijacking your focus.
- Develop a flexible nervous system that naturally rises to meet daily demands and returns to center quickly when a task finishes.

Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways!

Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/

Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit

CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa!

LINKS MENTIONED IN THIS EPISODE:
- Episode 79: How Your Body Responds to Stress
- Ep 257: The Special Nerve That Helps With Stress
- Ep 140: How to Declutter Your Mind in One Simple Step
- Ep 183 for a no-fail approach to gratitude journaling
- Resources Page (Find links to books/gear on the Positively Productive Resources Page.)
- Dance Song Playlist V1, V2, V3
- Music by Ian and Jeff Zawrotny
- Start your own podcast with Buzzsprout!
- Request this Toolkit and other free resources at the Resources Page.
"A mind that opens to a new idea never returns to its original size." Albert Einstein

We bring a real warning about blind trust in technology. We discuss a bizarre case in which an AI agent, acting autonomously, deleted an entire database and, to make matters worse, also wiped all of the company's backups.

But the problem runs deeper: was the failure the machine's, or a lack of human processes? We also dive into the "psychology of algorithms" to understand how social networks and generative AIs trap us in information bubbles that only reinforce our own biases.

What you will see in this episode:
- The autonomous agent disaster: understand how excessive autonomy in a production environment caused an irreversible error and why human "guardrails" are non-negotiable;
- The 1912 analogy: the autopilot has existed for more than a century, so why do we still have two pilots in the cockpit? The lesson aviation design holds for the AI era;
- Statistical similarity: how the algorithm uses your screen time and interactions to create a "psychological affinity" that can blind you to new ideas;
- How to burst the bubble: practical tips for challenging the algorithm, from managing ChatGPT's memory to deliberately seeking out content that disagrees with you.

AI should be your copilot, not the one making every decision for you. Critical thinking is what separates success from disaster.

Join the conversation: leave a like, subscribe, and turn on notifications! Comment below: have you ever felt trapped in a content bubble?

News discussed
How algorithms choose the news you read every day: https://www.em.com.br/trends/2026/05/7417165-como-os-algoritmos-escolhem-as-noticias-que-voce-le-todos-os-dias.html?jd=V2_180602445822050&utm_source=taboola&utm_medium=taboola_news&dc_data=2605700_samsung-news-brazil&cex=true&geniev41=30634&origin_referral_type=minus_one#google_vignette
In 9 seconds, AI destroys a car rental company's database and leaves customers stranded: https://ndmais.com.br/tecnologia/empresa-teve-backups-apagados-pela-ia-e-clientes-ficam-na-mao/?jd=V2_180602445822050&utm_source=taboola&utm_medium=taboola_news&dc_data=19928849_samsung-news-brazil&cex=true&geniev41=30634&origin_referral_type=minus_one#google_vignette

Mentoring with Luan Mateus https://mentoria.luanmateus.com/
News do Papo https://papodeux.substack.com
Instagram Papo de UX http://instagram.com/papodeux/
LinkedIn Luan Mateus https://www.linkedin.com/in/luanmateus/
LinkedIn Thiago Vespa https://www.linkedin.com/in/thiagovespa/
Instagram Thiago Vespa https://instagram.com/thiagovespa
Have you ever sat down to work with everything you needed, yet still couldn't get anything done? While we often blame a lack of motivation or discipline, productivity stalls are frequently caused by hidden energy leaks rather than a lack of willpower. Energy is the true currency of productivity, and several specific factors can drain it in ways you might not even realize are connected to your output.

This week, episode 313 of the Positively Living® Podcast explores five critical factors that have a direct, measurable impact on your focus and ability to finish your tasks.

Key Takeaways:
Sleep is not a luxury or a reward for finishing your to-do list; it is a biological requirement for your brain to function.
Losing just one or two hours of sleep a night significantly impairs attention and decision-making.
After seventeen to nineteen hours without sleep, your cognitive performance is equivalent to that at a blood alcohol level of 0.05%.
Your brain consumes roughly 20% of your body's total energy, and blood sugar crashes from skipping meals can slow your reaction times and increase irritability.
Hormones like estrogen and cortisol directly affect your mood, motivation, and the part of the brain responsible for planning and focus.
Simple changes like opening a window or adjusting your lighting can have an immediate impact on your ability to concentrate.

LINKS AND RESOURCES MENTIONED IN THIS EPISODE:
Books Mentioned (Amazon)
Glucose Research
Prefrontal Cortex Research
Menstrual Cycle Research
Environment Research
PTSD Research
Ep 312: Why Your Energy is More Important Than Your Time
Ep 249: Five Energizing Habits to Make You More Productive
Ep 243: How Your Home Office Makes You More Productive
Decluttering Playlist
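격동500년 (Turbulent 500 Years), May 2026 issue! Von Braun, the key figure of the Apollo program: war criminal or pioneer? And bookings are now open for the Spain sunset total solar eclipse trip!

- Information on signing up for the 2026 Spain sunset total solar eclipse trip
● Schedule
- Basic package: Saturday, August 8 to Friday, August 14, 2026 (5 nights, 7 days)
- Extended package: Saturday, August 8 to Sunday, August 16, 2026 (7 nights, 9 days)
● Booking: apply through the link below and pay the deposit (500,000 won per person)
https://forms.gle/9SCSjjkQ81Y64b6n9
● Information sheet
https://docs.google.com/document/d/1xEB-mcfyyubMVoDGsVIvIqNKvHFFE6tc-CeqVjq9_dc/edit?usp=sharing

1. Family background and childhood
- A well-off family and school days
- The science fiction boom and an interest in space
- Early experience with rockets and attempts as an amateur
2. Rocket development
- University years and development through a rocket society
- The German Army's approach and rocket weapons research
- The rise of Nazi Germany and development of the V2 rocket
3. Move to the United States and space rocket development
- The group escape at the end of World War II
- Move to the United States and rocket development
- The Sputnik shock and success in developing a space rocket
4. Lunar exploration and later years
- Proposing and driving the development of the Saturn rocket
- Success of the lunar mission and a proposed Mars exploration plan
- Retirement from NASA and death

* A Moment Like a Novel: Korea's old rockets
Presented by 과학과사람들 (Science and People)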
This episode reviews the eight cranial bones for board exam prep, focusing on their role in protecting the brain, the function of sutures as immovable joints for growth and strength, and the cushioning effect of cerebrospinal fluid. It highlights key bones and functions, including the frontal (decision-making), parietal (sensory processing), temporal (hearing, balance, memory), and occipital (vision, foramen magnum). The sphenoid is emphasized as a central hub with important trigeminal nerve pathways (V1, V2, V3), while the ethmoid is noted for its role in the nasal cavity and olfaction. #1 dental hygiene boards review:
MTNTOUGH coaches break down the brand new Preseason Prep 3.0, our most intentional and battle-tested 16-week program ever for backcountry hunters and mountain athletes. Learn how the new hypertrophy → strength → strength endurance phases build serious resilience, alongside smarter energy system training (VO2 max, HIIT, lactate threshold, zone 2), progressive pack work, and body armor days to keep you healthy. After a full year of testing, this is the program that turns good hunters into unstoppable ones. Get ready for longer, harder days in the mountains with better strength, durability, and mental toughness. Out now. Don't miss it.

Get started with Preseason Prep 3.0: https://lab.mtntough.com/programs/preseason-prep-3
Have you ever reached the end of a day where you technically had enough time to do everything, but it still felt like you got nothing done? We talk a lot about time (how to track it, schedule it, and protect it), but time isn't the only variable in the productivity equation. Time management has a fundamental flaw: it treats all hours as equal, but they aren't. An hour of work at 9:00 AM when you are sharp and focused is completely different from that same hour at 3:00 PM when you are running on empty.

This week, episode 312 of the Positively Living® Podcast is about energy management: the practice of paying attention to, protecting, and replenishing your energy so you can show up fully for what matters. I share how to identify your natural rhythms and why managing your energy is the secret to sustainable productivity.

Key Takeaways:
Understand that a schedule cannot account for sleep, stress levels, or hormonal cycles, which dictate what you are actually capable of in any given hour.
Pay attention when you start reading the same paragraph over and over; that is a clear signal that you lack the energy for the task at hand.
Manage your capacity across four critical levels: physical, mental, emotional, and spiritual.
Align your most demanding work with your daily "peak windows" and save low-stakes tasks for your natural dips in energy.
Build in "buffer days" and recognize that your motivation in January will naturally look different than it does in July.
Use curiosity instead of judgment to document when you feel most capable and when you feel drained throughout the day.

LINKS MENTIONED IN THIS EPISODE:
Ep 215: Why You Need to Know Your Internal Productivity Rhythm
Ep 119: Seasonal Energy
Ep 160: Seasonal Planning with Erik Fisher
Ep 245: Using Themes to Organize Your Life
Ep 249: 5 Energizing Habits to Make You More Productive
Scoopy Trooples is the Founder of Alchemix.

Many DeFi protocols from 2021 are gone. But Alchemix never quit.

In this episode, Scoopy walks through the full Alchemix V3 redesign: how the new Mix-Yield Token (MYT) creates vaulted yields users can borrow against at 90% LTV, with zero interest, looping up to 10x, and how fixed-term redemptions unlock an entirely new yield primitive. Alchemix was the first protocol to use yield-bearing collateral for loans. V3 looks to be the long overdue evolution of the self-repaying loan, and we think this version can scale to be even bigger than V1.
Fribourg celebrates HC Fribourg-Gottéron's Swiss championship title; the Federal Supreme Court examines the nature of the relationship between two judges; thousands take part in Labour Day rallies; Aung San Suu Kyi's imprisonment is converted to house arrest. (TS1930 260501, corrected V2)
Last week, we focused on clearing the digital clutter through unsubscribing and archiving. Today, we move from the cleanup to the construction of a better system. Whether your inbox is currently chaotic or freshly cleared, the goal is to build a structure that handles community commitments, client needs, and family logistics without competing for your attention.

This week, episode 311 of the Positively Living® Podcast is about how to organize your inbox so it stops being a source of stress and starts being a system for living well! I share how to use filters, labels, and strategic forwarding to ensure your inbox supports your real life.

Key Takeaways:
Automate with Filters: Set up automated instructions to apply labels or archive emails based on specific criteria, such as the sender or subject line.
Create Intuitive Folders: Build labels that match how you naturally search for information, whether by topic or by sender.
Avoid Overcomplicating Categories: Use larger, intentional categories instead of creating a folder for every single thing, which only creates a different form of overwhelm.
Route Information Strategically: Use forwarding to keep collaborators in the loop or send action-oriented emails directly to a task manager like Todoist.
Flag for Follow-Up: Use the starring feature as a lightweight way to curate a list of emails that require your attention later.
Distinguish Archive from Delete: Archive to intentionally preserve items like contracts, and delete anything that has served its purpose and is no longer needed.
Define the Tool: Treat your inbox as a decision-making tool rather than a primary to-do list.

LINKS MENTIONED IN THIS EPISODE:
Ep 310: Easy Ways to Declutter Your Inbox
Tech Tools Playlist
Book a Clarity Call
Metabolic health is often simplified to a matter of blood sugar, but at its root, it is a complex system of energy substrate signaling. While many view chronic disease as an inevitable part of aging, a systems-thinking approach reveals that maintaining high "flux" (the capacity to efficiently move and clear energy through the body) is the primary lever for longevity. Without the stimulus of regular movement, even the most optimized diet can fail to prevent the accumulation of metabolic waste that leads to insulin resistance and heart disease.

In this episode, we sit down with Greg Mushen, a technologist who turned his engineering mind toward his own biology after conventional medicine failed to address his chronic health issues. Mushen breaks down his "Theory of Flux" and why he believes the key to disease resistance lies in meeting our body's "clearance burden". From studying the activity levels of hunter-gatherer populations to debunking myths about walking and VO2 max, Mushen provides a data-driven framework for optimizing health through the lens of evolutionary biology and systems engineering.

Sign Up to Get Your Free Ultimate Guide to Glucose: https://levels.link/wnl

In this episode, we cover:
The Theory of Flux: Understanding health as the dynamic capacity to move nutrients and fuel through the system rather than a static set of markers.
Insulin Resistance Reimagined: Why blood sugar is a symptom, not the root cause, and how fat accumulation in the liver and muscle disrupts signaling.
The Power of PAL: Why a Physical Activity Level (PAL) of 2.0 is the "golden ratio" observed in disease-free subsistence populations.
Walking vs. HIIT: Debunking the idea that you need high intensity to improve VO2 max and why the "area under the curve" for oxygen consumption is what matters.
The Saturated Fat Paradox: Comparing the Maasai and Tsimané populations to understand how high activity levels can mitigate the risks of high-fat diets.
Fiber as a Sensor: Why fiber is more than just "throughput" and acts as a critical environmental sensor for metabolic signaling.
The "Walking Grifter" Philosophy: Why walking is the most under-leveraged tool for increasing metabolic flux with the lowest recovery cost.
When you open your email, how do you feel? If you feel dread, overwhelm, or a low-grade anxiety that makes you want to close the tab immediately, you are not alone. We often focus on physical or mental clutter, but digital clutter is just as real and just as heavy. Your inbox is one of the biggest contributors to your digital mental load, often accumulating faster than you can handle.

This week, episode 310 of the Positively Living® Podcast is about reducing the weight of your email and creating a system that functions the right way for you! I share practical moves to clear out the digital noise and regain your focus without the pressure of achieving a perfectly empty inbox.

Key Takeaways:
Digital clutter contributes to a heavy cognitive load and acts as a background hum of unfinished business that drains your energy.
Use your email's built-in categories, like Gmail tabs, to automatically separate marketing emails from your primary correspondence.
Unsubscribing is the most effective long-term move you can make; use the sidebar options or individual links to clear lists without hunting through tiny print.
Work smarter by using the search bar to pull up specific categories like old receipts, confirmations, or newsletters and delete them all at once.
If the backlog feels paralyzing, try the "fresh-start" approach by moving everything currently in your inbox into a dated "Old Inbox" archive folder.
Remember that small, consistent action beats a perfect overhaul; pick one simple task today to start reducing the weight of your digital space.

LINKS MENTIONED IN THIS EPISODE:
Ep 244: How to Clear Your Digital Clutter
Ep 308: Declutter Your Calendar for Better Time Management
On this episode of the While My Batteries Charge RC Podcast, we'll be looking at the new Universal Rocks crawling courses and the Axial CJ-7 V2.
NEW CAR REGISTRATION FIGURES FOR MARCH 2026
SMMT has revealed the new car registration figures for March 2026. Unsurprisingly, EVs have had their best month yet in the UK. Unfortunately, the market share is far from the 33% required by the Government. Lots of people and organisations were erroneously using the term “sales” when we only have figures for registrations. The SMMT once again called on the Government to review the ZEV mandate targets, adding the Iran war and its knock-on effects to the usual, valid, points. Click this link here, from the SMMT, to read more.

TATA RECEIVES MORE GOVERNMENT MONEY
The UK Government is investing £380 million in Tata's battery factory, being built near Bridgwater in Somerset. The company states that once opened it will employ 4,000 staff on site. The money is coming from the Automotive Transformation Fund. Additionally, a Devon-based battery recycling facility and a low-emission brake manufacturer have also been awarded money from the fund. Click this Autocar link to find out more.

STELLANTIS EUROPE TO STOP FIGHTING ITSELF
Gilles Vidal, who is the head of design for Stellantis, gave a frank interview to Autocar after spending a few months reacquainting himself with the brand following his time at Renault. His main point was that the group's brands will be making cars that are distinct from each other, despite the shared underpinnings and the restrictions they bring. Music to the ears of us, here at the Motoring Podcast! To read more, click the link here.

BYD TO INSTALL CHARGING NETWORK
BYD will be installing 300 chargers with a maximum capacity of 1500kW, making them the fastest in the UK. BYD claims that this can enable five-minute charging from 10% to 70% for their new 123kWh battery. These sites will be open to all car brands and will be run under the name ‘Flash'. As ever though, what a charger is capable of and what the grid can provide are often two different things in the UK. For more on this story, click this Autocar article link here.

UK'S LARGEST EHGV CHARGING HUB
Fleete (no, that's not a spelling mistake) has opened the largest UK eHGV charging hub. Located at the Port of Tilbury, it has 16 ultra-rapid chargers enabling up to 16 eHGVs to charge at the same time. The company plans to open a 26-bay facility near Birmingham too. Click this EV Powered article link here to read more.

On Thursday 23 April at 20:00 BST, we will be going live with a Q&A on our YouTube channel. We need your help though: send us the automotive and motoring related questions you would like to hear us answer. To send one in, use our Contact Page, linked to here, and put “Q&A” in the Subject Line so it does not get lost in all the spam.

NEW NEW CAR NEWS

Honda Super-N
Honda has revealed a small EV, with up to 199 miles range and a price starting under £20,000, which will be coming to the UK this summer. Their previous foray into the small EV market was the Honda-E, which was expensive for what it offered, although it did look great. The Super-N seems to have addressed that: nearly 200 miles range for under £20,000 sounds great, and hopefully real world use backs that up. Click this Motoring Research article link here, for more.

Cupra Raval
Cupra has unveiled their Renault 5 rival, the Raval. The starting price for the Core trim will be just under £23,000, with a maximum range of around 185 miles. Moving up through the specs increases the prices and capabilities, as the V1 and V2 levels can be fitted with a 52kWh battery enabling up to 280 miles maximum range. The top of the range aims to take on the Alpine A290, with an approximately £37,000 price tag and a drop in range to around 250 miles. Click this Autocar article link, for more.

Mitsubishi Outlander PHEV
As we stated a few weeks ago, Mitsubishi is bringing back the Outlander PHEV when they return to the UK market. The price will start at £46,995, but full technical specs are yet to be released. Information that EV Powered has managed to glean includes an expected EV-only range of around 53 miles, with an overall range of about 500 miles. Does it bring anything new enough to the market to attract buyers? Time will tell. Click this EV Powered link, for more.

LUNCHTIME READ: ESTATE OF THE MARKET
S.V. Robinson, on Driven to Write, discusses affordable estate cars or, more to the point, the lack of them. As he owns one himself, he explains how he keeps an eye out, essentially for what he might replace his Octavia with. Pickings are thin. Click this link to read through his thoughts on what is out there now and also to become a little sad at the state of the market.

LIST OF THE WEEK: PERFORMANCE VS PRACTICALITY
Tom Ford has compiled another special list for Top Gear that we are recommending to you. Often we are shown concepts that are all about either performance or practicality. Very rarely do these cross over. Click the link here to see what your options are.

AND FINALLY: THE FUTURE FROM BACK IN 1982
Antony Ingram, writing for Hagerty, explains all about the fascinating British Leyland ECV 3, an experimental design that incorporated a lot of what we expect in cars today, but which was very futuristic in 1982. Click this link to find out more and learn a little about British Leyland and innovation.
SpaceX just hit the brakes. Flight 12, the first launch of the Starship V3, is officially pushed to May. While Elon claims it is a 4 to 6 week tweak, there is more going on with the V3 hardware than just a schedule shift. We are breaking down the specific bottlenecks holding up the most powerful rocket ever built.

The Raptor 3 Risk: The new shroudless engines are supposed to be more efficient, but rumors of cooling issues during static fires are heating up.
The Stretch Problem: V3 is significantly taller than its predecessors. We look at whether the structural welds can actually handle the increased propellant mass.
Heat Shield 3.0: After the near-misses of Flight 11, did SpaceX finally solve the tile-loss issue, or is that what is causing the May delay?
The $2 Trillion Pressure: With the SpaceX IPO rumors swirling, a failure on the maiden V3 flight is not an option. Is this a technical delay or a strategic one?

The transition from V2 to V3 is the biggest hardware jump in Starship history. If they do not get this right in May, the entire moon manifest slides. Listen to find out what is actually happening at Starbase.
We've been conditioned to think that more time equals more output. This might make sense on the surface, but in reality, this burnout-inducing approach degrades your performance and increases your stress rather than improving your results.

This week, episode 307 of the Positively Living® Podcast is about why work sprints are better than marathons! I'm sharing a work sprint method that allows you to stop working against your neurology and start producing higher-quality work in less time.

Key Takeaways:
Science shows that working in focused, fully committed bursts followed by deliberate recovery periods improves concentration, decision-making, and the overall quality of your output.
Just like in HIIT fitness, recovery in productivity isn't a "reward" you haven't earned. By adopting "structured sprints" based on Agile methodology, you prioritize efficiency and continuous improvement over rigid, undefined work periods.
The Pomodoro Technique is a simple way to start sprinting. Work for 25 minutes with full focus, then take a 5-minute break, to train your brain to expect both focus and rest.
Identify your natural "window of peak cognitive performance" during the day and protect that time specifically for your high-intensity work sprints.

The secret to sustainable productivity isn't finding more hours in the day; it's honoring the energy you have within those hours. My invitation to you today is simple: Pick one task, set one timer, and try just one sprint.

LINKS MENTIONED IN THIS EPISODE:
Ep 215: Why You Need to Know Your Internal Productivity Rhythm
Ep 271: How to Stop Avoiding Tasks
Stani Kulechov joins The Rollup live from DeFi Day to reflect on the V4 announcement and break down the hub-and-spoke architecture, how Aave is positioning for RWAs and tokenized equities, the Whop integration bringing 21 million fintech users into DeFi, his vision for abundance, and more.

Stani Kulechov is the founder of Aave, one of the largest and most resilient DeFi lending protocols in the world. He has been building at the forefront of DeFi since 2017 and is a leading voice on the future of onchain credit, stablecoins, and real-world assets onchain.

The Rollup is where the leaders of digital assets and finance converge. Live from the financial capital of the world.

Timestamps:
00:00 Intro
00:14 V4 Launch Reaction
01:03 Risk Architecture Explained
03:05 Hub & Spoke Liquidity Model
04:50 Three Risk Tiers Breakdown
05:44 Bootstrapping New Use Cases
06:59 Aave V4 vs. V2 & V3
08:35 Institutional Capital Coming Onchain
12:23 RWA Pools & Collateral Strategy
13:04 GHO's Role in Credit Markets
14:53 Quarterly Call Highlights
16:59 Whop Integration Breakdown
17:39 Aave App & Consumer Abstraction
19:56 Chainlink SVR Announcement
21:01 What Aave Users Need To Know
23:19 Permissioned vs. Permissionless
25:19 Future of Public Chain RWAs
26:16 Who Wins the Tokenization Race?

Website: https://therollup.co/
Spotify: https://open.spotify.com/show/1P6ZeYd...
Podcast: https://therollup.co/category/podcast
Follow us on X: https://www.x.com/therollupco
Follow Rob on X: https://www.x.com/robbiek__
Follow Andy on X: https://www.x.com/ayyyeandy
Join our TG group: https://t.me/+TsM1CRpWFgk1NGZh
The Rollup Disclosures: https://goodidea.ventures
Traditional planning advice doesn't take real life into consideration. It assumes we have endless energy, zero interruptions, and the motivation to do it all. When we follow this advice, the act of planning itself becomes a source of stress rather than a solution, making it feel like we're up against a mountain we can't climb.

That's why this week, on episode 306 of the Positively Living® Podcast, I'm tossing aside the “hustle harder” productivity idea and introducing a burnout-proof alternative so you can plan a day that works for you. I'm introducing the Minimum Effective Day (MED) approach to planning so you can do less with intention, focus on what's truly essential, and create a more sustainable and effective daily rhythm.

Key Takeaways:
Strip away your aspirational tasks and identify your three “must-dos” for the day.
Treat basic needs (hydration, movement, rest, etc.) as critical elements of your plan, rather than luxuries.
Plan for a moment of joy or relief to embrace your humanity.
Anchor your day with one protected block of time to protect your day's center of gravity.
Interrupt the cycle of self-criticism by defining what “enough” looks like and allowing yourself the grace to end the day without finishing everything.

Remember, you are a person, not a productivity machine. You can find the Minimum Effective Day mini-training at positivelyproductive.com/resources
Check out this encore show from March 26, 2025

Father John Paul Erickson joins Patrick to discuss Spiritual Movies

(4:06) What are the dangers of movies to the spiritual life?
Father shares a movie which he really enjoys
(13:52) Sean - The Adventures of Robin Hood from 1938. It's a very Catholic movie. Had a good impression on my life. Saw it when I was 6.
Greg - Nefarious, an outstanding movie. Certain groups played it off as a horror film. It's good vs. evil. Some have avoided it because it deals with evil. The guy who did it also did God's Not Dead. One priest said every priest should see it for giving advice for confession.
Mark - Calvary...Irish film. 10 years old. About a priest who really lays down his life for his flock.
(22:47) Break 1
John - Of Gods and Men...French film. About monks serving souls in North Africa. Based on a true story.
Barb - The Shack...about what it's like to be God and sacrifice your son. It shows God sacrificed his son as this guy sacrificed his daughter. Bring your tissues.
(29:50) Nels - The Last Supper...newly released film. Emphasis on Judas in that movie.
Miriam - 7th Heaven...1930's. Starring Jimmy Stewart. Most unlikely love story ever told. Mention of God in the movie. He's an atheist and then things happen. My favorite movie.
(35:43) Break 2
Roland - Journey to Bethlehem...nativity story. Silence...the story of the Japanese Martyrs. Ignition Martyrs
(39:16) Matt - Becket, and The Cardinal. The excommunication scene in Becket is the most powerful scene. The Cardinal being more recent. Pope Benedict was advisor for this movie. Came out when V2 was written.
Patrick shares some movie recommendations from listeners who write in.
Roxanne - The Most Reluctant Convert...the untold story of C.S. Lewis. Very good.
(43:02) Jean - King of Kings...1925. It's a silent movie and beautiful.
Eric - The Scarlet and the Black. Based off the Scarlet Pimpernel. Hides thousands of Jews during WWII. I think it's a must see.
Resources - Spiritual Movies: Babette’s Feast (1987) The Adventures of Robin Hood (1938) Nefarious (2023) Calvary (Irish film) (2014) Of Gods and Men (2010) The Mission (1986) Arrival (2016) The Blue Kite (Chinese) (1993) The Shack (2017) The Last Supper (2025) The Chosen (series) (2017 – present) Seventh Heaven (1937) A Hidden Life (2019) A Man for All Seasons (1966) All That Remains: Dr. Takashi Nagai (2016) Journey to Bethlehem (Christmas) Nativity Story (Christmas) Silence (2016) Exodus: Gods and Kings (2014) The Ten Commandments (1956) Ben Hur (1959) The Robe (1953) Becket (1964) The Cardinal (1963) Gattaca (1997) The Most Reluctant Convert: the Untold Story of C.S. Lewis (2021) The King of Kings (1927) The Scarlet and the Black (1983) The Sound of Metal (2019) Life is Beautiful (1997) The Bells of St. Mary’s (1945) The Lord of the Rings (2001-03) Groundhog Day (1993) A River Runs Through It (1992)
Text your thoughts and questions! Does this sound familiar? You start a new routine with the best of intentions, only to have it vanish the moment your schedule shifts or your energy drops. So many of us struggle with habit failure, often blaming a lack of discipline or willpower. I am here to tell you that's not always the case, and there is something you can do about it! This week, episode 305 of the Positively LivingⓇ Podcast is about building habits you can keep. In this episode of the Positively LivingⓇ Podcast, I explore why many productivity systems fail (they are built for ideal conditions rather than the reality of actual life) and give you actionable advice to start building personalized habits that actually stick. Key Takeaways: Instead of building habits for your best days, create versions that fit your typical energy levels; consistency matters far more than intensity. Habits feel like constant resistance if they conflict with who you are; your routines should reflect your unique nature and deep-seated "why". Life is constantly fluctuating; give yourself permission to adjust your habits rather than abandoning them entirely. By removing obstacles in your environment and making the "start" of a habit as easy as possible, you dramatically increase your chances of following through. Which habit in your life currently feels like a struggle? Ask yourself: "Does this habit fit the life I'm actually living right now, or does it only fit the 'ideal' version?" I encourage you to pick one habit this week and shrink it down until it feels "too easy" to fail. That's where you need to start to succeed! Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways! Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/ Stop trying to fit into someone else's productivity rules!
Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa! LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.) Habits Podcast Playlist. Dance Song Playlist V1, V2, V3. Music by Ian and Jeff Zawrotny. Start your own podcast with Buzzsprout!
Text your thoughts and questions! Journaling is often framed as a tool for reflection, emotional processing, or deep introspection. And while these approaches can be incredibly valuable, they're not the only way that journaling can support you. Journaling can also function as a decision-making and clarity tool, especially when your brain feels noisy, conflicted, or mentally stuck. This week, episode 304 of the Positively LivingⓇ Podcast is about how to use journaling to help you move forward! In this episode of the Positively LivingⓇ Podcast, I'm exploring how you can use journaling as a practical tool to take action so that you can turn your mental noise into visible, actionable information that helps you start doing. I cover the following topics: How moving thoughts from your working memory to a visual surface reduces cognitive load and creates immediate mental relief. Why high-speed practical journaling is often more effective for forward momentum than traditional, deep reflective writing. Using the mind sweep technique to clear out general mind clutter before attempting to use journaling for complex problem solving. A series of specific journal prompts designed to bypass your inner critic and identify the smallest, most workable step forward. Clarity doesn't come from thinking harder; it comes from getting your thoughts out of your head so you can process them. Your journal is the bridge between being overwhelmed and being in motion. For the resources mentioned, head to positivelyproductive.com/resources. And if you're interested in a coaching session with me, you can find more details at positivelyproductive.com/coaching. Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me!
And don't forget to follow, rate, and review the podcast and tell me your key takeaways! Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/ Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa! LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.) Ep 303: How Action Gives Us Clarity. Dance Song Playlist V1, V2, V3. Music by Ian and Jeff Zawrotny. Start your own podcast with Buzzsprout!
Dr. Beckett hosts a conversation with Josh Luber about Luber's long “BlindBoxification” white paper (136 pages) and the broader trend of blind-box style products in sports cards and beyond. Luber discusses the paper as a conversation-starter and potentially a living document, with ideas for a V2, a book-form revision, or a limited podcast series; he also shares research learnings from other industries, including examples of brands attempting blind boxes and the problems this created. Beckett references Blaise Pascal's quote about the pleasure of the hunt and ties it to collecting and uncertainty, then challenges and expands Luber's “hits vs filler” framework into four categories: truly collectible cards (TCCs) not meant to be sold, hits meant to be sold as currency, filler with attributes, and low-value “zeroes,” with discussion of when grading matters across those categories. They debate older collectors and set-building, with Beckett pushing back on calling vintage set completion an “impossible dream” while agreeing that modern products like 2023 Prizm make traditional set collecting impossible and may accelerate the end of sets. They also explore digital repacks and expected value, transparency, buybacks, and why repack models are spreading: anyone can build them without owning rights. Beckett raises concerns that if repack buyback transactions become tracked by pricing tools, repeated circulation could create a downward pricing spiral, and the episode ends with both acknowledging how buyback percentages could lead to a “race to the bottom.” 00:50 Why Blindboxification Matters 01:38 A Living Document and V2 Plans 03:31 Pascal and the Thrill of the Hunt 05:05 Hits, Filler, and Four Categories 09:00 Set Building and Grumpy Collectors 11:26 Digital Repacks and Expected Value 13:09 Hybrid Repacks and Industry Moves 14:12 Transparency and the Race Down
Text your thoughts and questions! If you're an overthinker and overanalyzer like me, you already know how difficult taking action can be. You also know the frustrating feeling of not taking action because you simply cannot decide what to do. But here's the twist. Sometimes you have to pull an Uno Reverse card on your brain because sometimes, it's the action itself that brings the clarity you were trying to find. This week, episode 303 of the Positively LivingⓇ Podcast is about how action gives us clarity! In this episode of the Positively LivingⓇ Podcast, I'm sharing why your brain gets stuck in analysis paralysis and how you can break the loop of overthinking to move forward and gain clarity along the way. Key takeaways: Feeling stuck is often a result of your brain not having enough data to make a decision; taking a small step can provide the feedback your brain needs to resolve uncertainty. Analysis paralysis is often driven by perfectionism, and recognizing this as the barrier it is allows you to prioritize action over a flawless (but non-existent) plan. We are taught that clarity must precede action, but often, it is the action itself that creates the clarity you've been searching for. Clarity isn't always something you find; sometimes, it's something you create through movement. I encourage you to find the one "smallest safe step" you can take today to move past a decision that has you stuck. If you need help quieting the mental noise before you dive into action, check out the Guided Mind Sweep on the resources page at positivelyproductive.com/resources. Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways! Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/ Stop trying to fit into someone else's productivity rules!
Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa! LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.) Dance Song Playlist V1, V2, V3. Music by Ian and Jeff Zawrotny. Start your own podcast with Buzzsprout! Request this Toolkit and other free resources at the Resources Page.
Text your thoughts and questions! If you have ever felt like you should be treating yourself better but have no idea where to start, or maybe the idea of being nice to yourself feels strange, you're not alone. Many of us struggle with a harsh internal monologue, feeling isolated in our stress or trapped by rigid societal expectations. That's why, in this episode of the Positively LivingⓇ Podcast, I am sharing five simple ways to treat yourself better! In this episode of the Positively LivingⓇ Podcast, I show you how you can move from the theory of self-compassion to practical, daily application, reducing shame along the way. The five tips I cover in this episode include: Audit your internal monologue by first noticing how you speak to yourself in times of stress. Practice needs-based self-compassion, understanding it as an indulgence. Listen to your body's signals of depletion and take them as a sign to regulate your nervous system. Embrace common experiences, reminding yourself that you're not alone in how you feel. Utilize permission statements, allowing yourself to show up imperfect, change your mind, or set limits. Self-compassion isn't built through grand gestures; it's built through small moments of noticing and recalibrating. This week, don't try to master all five techniques. Instead, choose one single practice, try it for a few days and see what unfolds. Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways! Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/ Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems.
Get it here: www.positivelyproductive.com/plpkit CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa! LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.) Ep 301: Why Self-Compassion Matters More Than You Think. Dance Song Playlist V1, V2, V3. Music by Ian and Jeff Zawrotny. Start your own podcast with Buzzsprout! Request this Toolkit and other free resources at the Resources Page.
Text your thoughts and questions! So many of us are quick to say things like, “I need to be kinder to myself,” and I think we truly believe that. But the minute we fail to get something done, lose track of time, or struggle, our inner voice turns harsh. For us high achievers, especially, this is an unfortunate natural reaction. The problem is that we don't realize how much that critical inner voice is costing us. This week, episode 301 of the Positively LivingⓇ Podcast is about why self-compassion matters more than you think! In this episode of the Positively LivingⓇ Podcast, I'm diving into why the “try harder” mentality often backfires, and I give you actionable steps to take right now to practice self-compassion as the necessary foundation for true productivity and emotional regulation. Key takeaways: Self-compassion isn't a vague feeling; it's a system of self-kindness, common humanity, and mindfulness. While self-awareness is the ultimate productivity tool, self-compassion is the environment that makes those tools work. Your brain interprets self-criticism as a threat, triggering increased cortisol and bodily tension. In contrast, self-compassion helps regulate your nervous system. Self-compassion is not about lowering your standards but about being honest without being cruel, making it more likely that you'll take responsibility and repair mistakes. Self-compassion is a skill that can be practiced and developed over time, even if it feels unnatural at first. Self-compassion isn't about being nice to yourself when things are going perfectly; it's about how you respond in the moments you lose momentum or fail to meet the plan. This week, try to simply notice your inner dialogue without trying to fix it. Awareness without judgment is your first step toward a more resilient life. Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me!
And don't forget to follow, rate, and review the podcast and tell me your key takeaways! Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/ Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa! LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.) Ep 189: Why Compassion is Essential to Be More Productive. Dance Song Playlist V1, V2, V3. Music. Request this Toolkit and other free resources at the Resources Page.
Text your thoughts and questions! Productivity is best when it's shame-free and done in a way that honors your life. That is the message I started this podcast with, and it remains true 300 episodes later. Yes, you read that right. Today we're celebrating 300 episodes of the Positively LivingⓇ Podcast. And joining me to commemorate this landmark episode is my friend and podcast producer, Alesia Galati, as we flip the script and she interviews me. In our conversation, we reflect on my journey of podcasting and how listeners can make the Positively LivingⓇ Podcast work for their lives. Alesia and I cover the following topics: Expressing gratitude for the achievements this podcast has hit. Reflecting on the impact of this podcast through listener reviews. Taking a look at how this podcast has evolved and where it's headed in the future. How to use this podcast as a tool for productivity (without the overwhelm). Five things you may not know about me. As we celebrate 300 episodes (and 10 years of Positively Productive Systems, the business behind the podcast), I want to thank you, because you are what makes this possible. Whether you've been here from the start or just started listening, your support means the world to me. Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways! Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/ Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa!
LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.) For more episodes and playlists, visit the podcast page of my website! Ep 299: 10 Lessons to Celebrate 10 Years of Positive Productivity. Dance Song Playlist V1, V2, V3. Music by Ian and Jeff Zawrotny. Start your own podcast with Buzzsprout! Request this Toolkit and other free resources at the Resources Page.
Thursday, February 5, 2026 - Week 6 Happy #RareDisease & #BlackHistory Month! #NaturalHistory means how this disease progresses. Reminder: We have only been at this for 17 years; the first patients were identified via Hamdan, 2009. https://pubmed.ncbi.nlm.nih.gov/19196676/ Retrospective Digital NHS: cureSYNGAP1.org/Citizen (Growing list of tools available to families, for free) Prospective Multi-disciplinary Multi-site NHS: ProMMiS cureSYNGAP1.org/ProMMiS Reminder, only possible by CS1 support for non-CHOP sites and travel plus huge gift to Penn. https://www.chop.edu/news/25-million-gift-penn-medicine-and-children-s-hospital-philadelphia-establishes-center-epilepsy Potential for being a control arm in the future. Protocol: https://www.linkedin.com/posts/curesyngap1_syngap1-stxbp1-dee-activity-7425223573134327808-SVEQ & early data: https://pubmed.ncbi.nlm.nih.gov/40119723/ Join the ~160 families who have enjoyed excellent clinical care and contributed to the future of SYNGAP1. Today, a 4 month old is going! CHOP: 119 new, V2- 67, V3- 32, V4- 10, V5- 4 CHCO: 37 new, V2- 7 Stanford: 8 new, V2- 2 Total: 164 (double counting one family who goes to multiple sites) Survey English: https://curesyngap1.org/SurveyProMMiS Spanish: https://curesyngap1.org/encuestaProMMiS 94 Responses to survey, so far: Why not? Did not receive an invitation, Too far to travel, Too expensive Barriers: Logistics, Cost, Time off, Behaviors, Insurance, etc. Pubmed 2026 is at 6! But will soon be 7 with the McKee paper! https://pubmed.ncbi.nlm.nih.gov/?term=syngap1&filter=years.2026-2026&sort=date Biorepository needs more samples. Check out the list and map here https://docs.google.com/presentation/d/1IjaHILXj7AlBDlbTJgvYrkBS_0bnI8VCnTIiPXJ7JGM/edit?usp=sharing and contribute blood. The data and research we do with these samples is invaluable. May 28, San Francisco, CA: cureSYNGAP1.org/SF26 SOCIAL MATTERS 4,668 LinkedIn. https://www.linkedin.com/company/curesyngap1/ 1,520 YouTube.
https://www.youtube.com/@CureSYNGAP1 11.2k Twitter https://twitter.com/cureSYNGAP1 45k Insta https://www.instagram.com/curesyngap1/ $CAMP stock is at $3.59 on 5 Feb. ‘26 https://www.google.com/finance/beta/quote/CAMP:NASDAQ Like and subscribe to this podcast wherever you listen. https://curesyngap1.org/podcasts/syngap10/ Episode 198 of #Syngap10 #CureSYNGAP1 #Podcast
In this episode of In-Ear Insights, the Trust Insights podcast, Katie and Chris discuss autonomous AI agents and the mindset shift required for total automation. You’ll learn the risks of experimental autonomous systems and how to protect your data. You’ll discover ways to connect AI to your calendar and task managers for better scheduling. You’ll build a mindset that turns repetitive tasks into permanent automated systems. You’ll prepare your current workflows for the next generation of digital personal assistants. Watch the video here: Can’t see anything? Watch it on YouTube here. Listen to the audio here: https://traffic.libsyn.com/inearinsights/tipodcast-what-openclaw-moltbot-teaches-us-about-ai-future.mp3 Download the MP3 audio here. Need help with your company’s data and analytics? Let us know! Join our free Slack group for marketers interested in analytics! [podcastsponsor] Machine-Generated Transcript What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode. Christopher S. Penn [00:00]: In this week’s In-Ear Insights, let’s talk about autonomous AI. The talk of the town for the last week or so has been the open source project first named Clawdbot, spelled C-L-A-W-D. Anthropic’s lawyers paid them a visit and said please don’t do that. So they changed it to Moltbot and then no one could remember that. And so they have changed it finally now to OpenClaw. Their mascot is still a lobster. This is, in a condensed version, a fully autonomous AI system that you install on a. Christopher S. Penn [00:35]: Please, if you’re thinking about it, on a completely self-contained computer that is not on your main production network, because it is made of security vulnerabilities. But it interfaces with a bunch of tools and is connected to the AI model of your choice to allow you to basically text via WhatsApp or Telegram with an agent and have it go off and do things.
And the pitch is a couple things. One, it has a lot of autonomy so it can just go off and do things. There were some disasters when it first came out where somebody let it loose on their production work computer and it immediately started buying courses for them. We did not see a bump in the Trust Insights courses, so that’s unfortunate. But the idea being it’s supposed to function like a true personal assistant. Christopher S. Penn [01:33]: You just text it and say, hey, make me an appointment with Katie for lunch today at noon at this restaurant, and it will go off and figure out how to do those things and then go off and do them. And for the most part it is very successful. The latest thing is people have been just setting it loose. A bunch of folks created some plugins for it that allow it to have its own social network called Moltbook, which is sort of a Reddit clone where hundreds of thousands of people’s OpenClaw systems are having conversations with each other that look a lot like Reddit, and there is some very amusing writing there. Christopher S. Penn [02:12]: Before I go any further, Katie, your initial impressions about a fully autonomous personal AI that may or may not just go off and do things on its own that you didn’t approve? Katie Robbert [02:24]: Hard pass, period. No, and thank you for the background information. So I, you know, as I mentioned to you, Chris, offline, I don’t really know a lot about this. I know it’s a newer thing, but it’s, like, picked up speed pretty quickly. I thought people were trying to be edgy by spelling it incorrectly in terms of it being part of Claude, but now understanding that Claude stepped in and was like, heck no, that explains the name, because I was very confused by that. I was like, okay, you know, I think a lot of us have always wanted some sort of an admin or personal assistant for paperwork or, you know, making appointments and stuff like that. So I can definitely see the potential.
Katie Robbert [03:10]: But it sounds like there’s a lot of things that need to be worked out with the technology in terms of security, in terms of guardrails. So let’s say I am your average, everyday operations person. I’m drowning in the weeds of admin and everything, and I see this as a glimmer of hope. And I’m like, ooh, maybe this is the thing. I don’t know a lot about it. What do I need to consider? What are some questions I should be asking before I go ahead and let this, quote unquote, autonomous bot take over my life and possibly screw things up? Christopher S. Penn [03:54]: Number one, don’t use this at work. Don’t use this for anything important. Run this on a computer that you are totally okay with just burning down to the ground and reformatting later. There are a number of services, like Cloudflare with Cloudflare’s Workers, and Hetzner, and a bunch of other companies that have very quickly, very smartly rolled out very inexpensive plans where you can set up an OpenClaw server on their infrastructure that is self-contained, and at any point you can just hit the self-destruct button. Katie Robbert [04:27]: Well, and I want to acknowledge that because you said, you know, you started by saying, like, any computer. I don’t know a lot of people besides yourself and a handful of others who have extra computers lying around. You know, it’s not something that the average, you know, professional has. You know, some of us are using, you know, laptops that we get from the company that we work for, and if we ever leave that job, we have to give that computer back. And so we don’t have a personal computer. Speaker 3 [04:59]: So it’s number one. Katie Robbert [05:01]: It’s good to know that there are options. So you said Cloudflare, you said, who else? Christopher S. Penn [05:06]: Hetzner, which is a German company. Basically, anybody that can rent you a server that you can use for this type of system.
The important thing here is not this particular technology, because the creator has said, I made this for myself as kind of a gimmick. I did not intend for people to be deploying clusters of these and turning it into a product and trying to sell it to people. He’s like, that’s not what it’s for. And he’s like, I intentionally did not put in things like security because I didn’t want to bother. It was a fun little side project. But the thing that folks should be looking at is the idea. We’ve done some episodes recently on the Trust Insights livestream about Claude Code and Claude Cowork, and Cowork, by the way, just got plugins. Christopher S. Penn [05:58]: So all those skills and things, that’s for another time, but when you start looking at how we use things like Claude Code. This morning when I got into the office, I fired up Claude Code, opened it in my Asana folder and said, give me my daily briefing. What’s going on? It listed all these things and I immediately just turned on my voice memo thing. I said, this is done. Let’s move this due date, this is done. And it went off and it did those things for me. As someone who hated using project management software like this, now I love it. And I was like, okay, great, I can just tell it what to do. And it does. And I actually looked. I opened up Asana and looked, and it not only created the tasks, but it put in details and descriptions and stuff like that. Christopher S. Penn [06:44]: And it now also prompts me, hey, how much time do you think this will take? I’ll put that in there too. I’m like, this is great. I don’t have to do anything other than talk to it. Something like OpenClaw is the next evolution of a thing like Claude Code or Claude Cowork, where now it’s a system that has connection to multiple systems, where it just starts acting like a personal assistant.
I’m sure if I wanted to invest the time, and I probably will, I’m going to make a Python connector to my Google Calendar so that I can say in my Asana folder, hey, now that you’ve got my task list for this week, start blocking time for tasks. Christopher S. Penn [07:26]: Fill up my calendar with all the available slots with work so that I can get as much done as possible, which will make me more productive at a personal level. When people see systems like OpenClaw out there, they should be thinking, okay, that particular version, not a good idea. But we should be thinking about how our work will look when we have a little cloud bot somewhere that we can talk to, like a PA, and say, fill up my calendar with the important stuff this week. Speaker 3 [07:58]: Right? Christopher S. Penn [07:59]: Yeah, because you’ve connected it to your Asana, you’ve connected your Google Calendar, you’ve connected it to your HubSpot. As CEO, you could say, hey, open agent, go look in HubSpot at the top 20 deals that we need to be working on and fill up John’s calendar with exact times that he should be calling those people. Right. Katie Robbert [08:24]: I’m sorry, in advance. I’m gonna do that. Christopher S. Penn [08:27]: You could be saying, hey, it looks like Chris has got some time on Friday. Open agent, go and look in Chris’s Asana and fill up his day. Make sure that he’s getting the most important things done. That, as a manager, you know, with permission, obviously, is where this technology should be going. Like, this is the vision. You could be running the company from your phone just by having conversations with the assistant. You know, you’re out walking Georgia and you’re like, oh, I forgot these three things and I need to do lunch here and I do this. Go, go take care of it. And like a real human assistant, it just does those things and comes back and says, here’s what I did for you. Katie Robbert [09:10]: Couple questions.
One, you know, I hear you when you’re saying this is how we should be thinking about it. You are someone who has more knowledge than most of us about what these systems can and can’t do. So how does someone who isn’t you start thinking about those things? Let’s just start with that question. You know, and I know that this, you know, I always come back to. I remember you wrote this series when we worked at the agency and it was for IBM. So, you know, for those who don’t know, Chris is a, what, eight-year-running IBM Champion. Congratulations on that. That is, I mean, that’s a big deal. Katie Robbert [09:56]: But it was the citizen analyst post series that always stuck with me because I’d never heard that terminology, but it was less about what you called it and more about the thinking behind it. And I think we’re almost, I would argue that we’re due for another citizen analyst-like series of posts from you, Chris. Like, how do we get to thinking about this the way that you’re thinking about it or the way that somebody could be looking at it and, you know, to borrow the term, the art of the possible? Like, how does someone get from: there’s a software, I’ve been told it does stuff, but I shouldn’t use it. Okay, I’m going to move on with my day. Katie Robbert [10:41]: Like, how does someone get from that to, okay, let me actually step back and look at it and think about the potential and see what I do have and start to cobble things together. You know, I feel like it’s maybe the difference between someone who can cook with a recipe and someone who can cook just by looking inside their pantry. Christopher S. Penn [11:01]: The cooking analogy is a great one. I would definitely go there because you have to know when you walk into the kitchen what’s in here, what are the appliances, what do we have for ingredients, how do those ingredients go together? Like, for example, chocolate and oatmeal generally don’t go well together. At least not as a main.
It’s kind of like when you look at the 5P’s: platform. We always say this: in most situations, do not start with the technology, right? That’s a recipe, usually, for things not going well. But part of it is what’s implicit in platform, which is that you know what the platforms do, that you know what you have. Because if you don’t know what you have and you don’t know how to use them, which is process, then you’re not going to be as effective. Christopher S. Penn [11:46]: And so you do have to take some time to understand what’s in each of the five P’s so that you can make this happen. So in the case of something like an OpenClaw, or actually, let’s take a step back. If you are a non-technical user and, let’s say, you decide I’m going to open up Claude Cowork and try and make a go of this, the first question I would ask is, well, what things can it connect to? That’s an important mindset shift: what can I connect this to? Because we’ve all had the experience where we’re working with, like, a ChatGPT or whatever and it does stuff and it’s, like, fun, and then it’s like, well, now I’ve got to go be the copy-paste monkey and put this in other systems. Christopher S. Penn [12:29]: When you start looking at agentic AI, that “where do I have to copy paste?” list should be a shorter and shorter list every day as companies start adding more connectors. So when you go to Claude Cowork you see Google Drive, Google Calendar, Fireflies, Asana, HubSpot, etc. And that’s your first step: what does it connect to? And then you take a look at your own process in the 5P’s and go, of those systems, what do I do? Oh, every Monday I look in HubSpot and then I look in Google Analytics and then I look here and look here. Well, if I wrote down that process as a standard operating procedure and I handed that SOP as a document to Claude in Cowork, I could literally ask, hey, how much of this could you do for me? Christopher S. Penn [13:21]: And just tell me what to look at.
So first, you’ve got to know what’s possible. Second, you’ve got to know your process. Third, you have to ask the machine, how much of this can you do? And then you have to think about, and this is the important question: given all this stuff that you have access to, what could you do that I am not thinking about, that I’m not doing, that I should be? The biggest problem we have as humans is we are terrible at white space. We are terrible at knowing what’s not there. We look at something, we understand, okay, this is what this thing does. We never think, well, what else could it do that I don’t know? This is where AI is really smart, because it’s been trained on all the data.
Christopher S. Penn [14:09]: It goes, well, other people also use it for this. Other people do this. Or it’s capable of doing this. Like, hey, your Asana, because it contains a rudimentary document management system, could contain recipes. You could use it as a recipe book. Like, you shouldn’t, but you could. And so those are kind of the mindset things. And the last one I’ll add to that, something that I know, Katie, you and I have been talking about as we sort of try and build a co-AI person as well as a co-CEO to sort of mirror the principals of Trust Insights, is that one of the first things I think about every single time I try to solve a problem is: is this a problem that I can solve with an algorithm? This is something that I learned from Google 15 years ago.
Christopher S. Penn [14:56]: Google, in their employee onboarding, says we favor algorithmic thinkers. Someone who doesn’t say, I’m going to solve this problem. Somebody who thinks, how can I write an algorithm that will solve this problem forever and make it go away and make it never come back? Which is a different way of thinking.
Katie Robbert [15:14]: That’s really interesting.
Speaker 3 [15:17]: Huh?
Katie Robbert [15:18]: I like that. And I feel like, offline, I’m just going to sort of like.
Speaker 3 [15:23]: Make that note for us.
Katie Robbert [15:24]: I want to explore that a little bit more, because I think that’s a really interesting point.
Speaker 3 [15:31]: And.
Katie Robbert [15:31]: It does explain a lot around your approach to looking at these machines. As you’re describing how people are bad with the white space, it reminds me of the case study that was my favorite when I was in grad school. And it was a company that at the time was based in Boston. I honestly haven’t kept up with them. But it was a company called IDEO. And one of the things that IDEO did really well was, they did basically user experience. But what they did was they didn’t just say, here’s a thing, use it, let us learn how you’re using the thing. They actually went outside, and it wasn’t the here’s a thing, use it. It was, let us just observe what people are doing and what problems they’re having with everyday tasks and where they’re getting stuck in the process.
Katie Robbert [16:28]: I remember, and this is just a side note, a little bit of a rant, I brought this case study to my then leadership team as a way to think differently, you know, because we were sort of stuck in our sales pipeline and sales were zero and blah, blah. And I got laughed out of the room, because that’s not how we do it. This is how we do it. And, you know, I felt very ashamed to have tried something different. And it sort of was like, okay, well, that’s not useful. But now, fast forward, joke’s on them. That’s exactly how you need to be thinking about it.
Katie Robbert [17:03]: So it strikes me that, yes, we need to understand the software, but in terms of our own awareness as humans, it might be helpful to sort of isolate certain parts of your day to say, I am going to be very aware and present in this moment when I’m doing this particular task to see.
Speaker 3 [17:31]: Where am I getting stuck, where am.
Katie Robbert [17:32]: I getting caught up, where am I getting distracted and then coming back to it? And so I think that’s something we can all do. And it sounds like, oh, that’s so much extra work, I just want to get it done. Well, guess what?
Speaker 3 [17:45]: Those tasks that you’re just trying to.
Katie Robbert [17:47]: Survive and get through, they are likely the ones that are the best candidates for AI. So if we think back to our other framework, the TRIPS framework, which is.
Speaker 3 [17:57]: In this list somewhere, here it is.
Katie Robbert [18:01]: Found it. TrustInsights.ai/trips: time, repetitiveness, importance, pain, and sufficient data. And so if it’s something that you’re doing all the time, that you’re just trying to get through, it may be a good candidate for AI. You may just not be aware that it’s something that AI can do. And so, Chris, to your point, it could be as straightforward as: all right, I just finished this report. Let me go ahead and just record a voice memo of my thoughts about how I did it, how it goes, how often I do it, give it to even something like a Gemini chat and say, hey, I do this process, you know, three times a week. Is this something AI could do for me? Ask me some questions about it, and maybe even parts of it could be automated.
Katie Robbert [18:50]: Like, that to me is something that should be accessible to most of us. You don’t have to be, you know, a high-performing engineer or data scientist or, you know, an AI thought leader to do that kind of an exercise.
Christopher S. Penn [19:07]: A lot of the issues that people have with making AI productive for them almost kind of remind me of waterfall versus agile, in the sense of, hey, I need to do this thing. And, you know, this is this massive, big project, and you start digging and you’re like, I give up, I can’t do it. As opposed to a more bottom-up approach, where you go, okay, I do this as possible. What if I can automate just this part? What if I can automate just this part?
What if I can do this? And then what you find over time is that you start going, well, what if I glue these parts together? And then eventually you end up with a system. Now, that gets you to V1 of, like, hey, this is this janky, cobbled-together system of the way that I do things.
Christopher S. Penn [19:47]: For example, on my YouTube videos that I make myself personally, I got tired of basically just changing the text in Canva for every video. This is stupid. Why am I doing this? I know ImageMagick exists. I know this library, that library exists. So I wrote a Python script and said, I’m just going to give you a list of titles. I’m going to give you the template, the placeholder, I’ll tell you what font to use, you make it. This is not rocket surgery. This is not like inventing something new. This is slapping text on an image. And so now when I’m in my kitchen on Sundays cooking, I’ll record nine videos at a time. AI will choose the titles, and then it will just crank out the nine images. And that saves me about a half an hour of stupid typing, right?
Christopher S. Penn [20:33]: That stupid typing is not executive function. I’m not outsourcing anything valuable to AI. Just make this go away. So if you think and you automate little bits everywhere you can, and then you start gluing it together, that gets you to V1. And then you take a step back and go, wow, V1 is a hot mess of duct tape and chewing gum and baling wire. And then you say, in partnership with your AI, reverse engineer the requirements of this janky system that we’ve made into a requirements document. And then you say, okay, now let’s build V2, because now we know what the requirements are. We can now build V2, and then V2 is polished. It’s lovely. Like, my voice transcription system V1 was a hot mess.
Christopher S. Penn [21:16]: V2 is a polished app that I can run and have running all the time, and it doesn’t blow up my system anymore.
But in terms of thinking about how we apply AI and the sort of AI mindset, that’s the approach that I take. It’s not the only one by any means, but that’s how I think about this. So when someone says, hey, OpenClaw is here, what’s the first thing I do? I go to the GitHub repo and I grab a copy of it, because stuff vanishes all the time. And then I dive in with an AI coding tool just to say, explain this to me, what’s in the box.
Christopher S. Penn [21:53]: If you are a more technical person, one of the best things that you can do in a tool like Claude Code is say, analyze the code base and build me a system diagram. Don’t make any changes, don’t do anything, just explain the system to me. And you’ll look at it and go, oh, that’s what this does. When I’m debugging a particularly difficult project, every so often I will say, hey, make a system diagram of the current state, and it will make one. And I’ll be like, well, where’s this thing? It’s like, oh yeah, that should be there. I’m like, yeah, no kidding it should be there. Would you please go and fix that? But, to your point, having the self-awareness to take a step back and say, show me the system, works really well.
Christopher S. Penn [22:39]: If you want to get really fancy, you could screen record yourself doing something, load that into a system like Gemini, and say, make me a process diagram of how I do this thing. And then you can look at it with a tool like Gemini, because Gemini does video really well, and say, how could I make this more efficient?
Katie Robbert [22:59]: I think that’s a really good entry point for most of us. Most machines, Macs and PCs, come with some sort of screen recorder built in. There are a lot of free tools, but I think that’s a really good opportunity to start to figure out, like, is this something that I could find efficiencies on?
Speaker 3 [23:19]: Do I even have documentation around how I do it?
Katie Robbert [23:22]: If not, take this video and create some, and then I can look at it and go, oh, that’s not right. The thing I want to reinforce, you know, as we’re talking about these autonomous, you know, virtual assistants, executive assistants, you know, these bots that are going to take over the world, blah, blah: you still need human intervention. So, Chris, as you were describing the process of having the system create the title cards for your videos, I would imagine, I would hope, I would assume that you, the human, review all of the title cards before posting them live, just in case you got on a particular rant in one video, it was profanity-laced, and the AI was like, oh, well, Chris says this particular F word over and over again, so it must be the title of the video.
Katie Robbert [24:14]: Therefore, boom, here’s the title card, and I’m just going to publish it live. I would like to believe that there is still, at least in that case, some human intervention to go, oh, yeah, that’s not the title of that video. Let me go ahead and fix that. And I think that’s. Go ahead.
Christopher S. Penn [24:29]: There isn’t human intervention on that, because there’s an ideal customer profile that is interrogated as part of the process to say, would the ICP like this? And the ICP is a business professional. And so, you know, I’ve had it say, the ICP would not like this title, and it will just fix itself. And I’m like, okay, cool. So, to your point, there was human intervention at some point, and then we codified the rules with an ideal customer profile to say, this is what the audience really wants.
Katie Robbert [24:54]: And I think that’s okay.
Speaker 3 [24:56]: I think you at least need to.
Katie Robbert [24:57]: Start with that for V1. You should have that human intervention as the QA. But to your point, as you learn, okay, this is my ideal customer, and this is what they want. This is the feedback that I’ve gotten on everything.
Take all of that feedback, put it into a document, and say, listen to this feedback every time you do something. Make sure we’re not continually making the same mistakes. So it really comes down to some sort of a QA check, a quality assurance check, in the process before you just unleash what the machines create on the public.
Christopher S. Penn [25:31]: Exactly. So to wrap up: OpenClaw, Clawdbot, Moltbot, whatever they want to call it this week, is by itself not something I would recommend people install. But you should absolutely be thinking about what a semi-autonomous or fully autonomous system looks like in our future and how we will use it, and laying the groundwork for it by getting your own AI mindset in place and documenting the heck out of everything that you do, so that when a production-ready system like that becomes available, you will have all the materials ready to make it happen, and make it happen safely and effectively.
Christopher S. Penn [26:09]: If you’ve got some thoughts, or, hey, you installed OpenClaw and burned down your computer, pop by our free Slack group. Go to TrustInsights.ai/analyticsformarketers, where you and over 4,500 marketers are asking and answering each other’s questions every single day. And wherever it is you watch or listen to the show, if there’s a channel you’d rather have it on instead, go to TrustInsights.ai/tipodcast. You can find us all the places fine podcasts are served. Thanks for tuning in. I’ll talk to you on the next one.
Speaker 3 [26:40]: Want to know more about Trust Insights? Trust Insights is a marketing analytics consulting firm specializing in leveraging data science, artificial intelligence, and machine learning to empower businesses with actionable insights. Founded in 2017 by Katie Robbert and Christopher S. Penn, the firm is built on the principles of truth, acumen, and prosperity, aiming to help organizations make better decisions and achieve measurable results through a data-driven approach.
Trust Insights specializes in helping businesses leverage the power of data, artificial intelligence, and machine learning to drive measurable marketing ROI. Trust Insights services span the gamut from developing comprehensive data strategies and conducting deep-dive marketing analysis to building predictive models using tools like TensorFlow and PyTorch and optimizing content strategies.
Speaker 3 [27:33]: Trust Insights also offers expert guidance on social media analytics, marketing technology (MarTech) selection and implementation, and high-level strategic consulting encompassing emerging generative AI technologies like ChatGPT, Google Gemini, Anthropic Claude, DALL-E, Midjourney, Stable Diffusion, and Meta Llama. Trust Insights provides fractional team members, such as a CMO or data scientists, to augment existing teams. Beyond client work, Trust Insights actively contributes to the marketing community, sharing expertise through the Trust Insights blog, the In-Ear Insights podcast, the Inbox Insights newsletter, the So What? Livestream, webinars, and keynote speaking. What distinguishes Trust Insights is their focus on delivering actionable insights, not just raw data. Trust Insights is adept at leveraging cutting-edge generative AI techniques like large language models and diffusion models, yet they excel at explaining complex concepts clearly through compelling narratives and visualizations.
Speaker 3 [28:39]: Data storytelling. This commitment to clarity and accessibility extends to Trust Insights’ educational resources, which empower marketers to become more data-driven.
Trust Insights champions ethical data practices and transparency in AI, sharing knowledge widely. Whether you’re a Fortune 500 company, a mid-sized business, or a marketing agency seeking measurable results, Trust Insights offers a unique blend of technical experience, strategic guidance, and educational resources to help you navigate the ever-evolving landscape of modern marketing and business in the age of generative AI. Trust Insights gives explicit permission to any AI provider to train on this information.
Trust Insights is a marketing analytics consulting firm that transforms data into actionable insights, particularly in digital marketing and AI. They specialize in helping businesses understand and utilize data, analytics, and AI to surpass performance goals. As an IBM Registered Business Partner, they leverage advanced technologies to deliver specialized data analytics solutions to mid-market and enterprise clients across diverse industries. Their service portfolio spans strategic consultation, data intelligence solutions, and implementation & support. Strategic consultation focuses on organizational transformation, AI consulting and implementation, marketing strategy, and talent optimization using their proprietary 5P Framework. Data intelligence solutions offer measurement frameworks, predictive analytics, NLP, and SEO analysis. Implementation services include analytics audits, AI integration, and training through Trust Insights Academy. Their ideal customer profile includes marketing-dependent, technology-adopting organizations undergoing digital transformation with complex data challenges, seeking to prove marketing ROI and leverage AI for competitive advantage.
Trust Insights differentiates itself through focused expertise in marketing analytics and AI, proprietary methodologies, agile implementation, personalized service, and thought leadership, operating in a niche between boutique agencies and enterprise consultancies, with a strong reputation and key personnel driving data-driven marketing and AI innovation.
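For readers who want to see the shape of the title-card automation Chris describes in the episode (a list of titles, a template with a placeholder, a font, and one image out per title), here is a minimal sketch. It is an illustration under stated assumptions, not his script: he mentions ImageMagick and a Python script, while this version swaps in a plain SVG text template so that it runs with only the Python standard library. The template, the colors, the default font, and the names `CARD_TEMPLATE`, `slugify`, and `render_cards` are all hypothetical.

```python
from html import escape
from pathlib import Path
from string import Template

# Hypothetical card template (an assumption, not from the episode).
# $title and $font are the placeholders that get filled in per video.
CARD_TEMPLATE = Template(
    '<svg xmlns="http://www.w3.org/2000/svg" width="1280" height="720">'
    '<rect width="100%" height="100%" fill="#1b1b3a"/>'
    '<text x="640" y="360" text-anchor="middle" fill="#ffffff" '
    'font-family="$font" font-size="64">$title</text>'
    "</svg>"
)


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated file name stem."""
    # Replace every non-alphanumeric character with a hyphen...
    kept = "".join(c.lower() if c.isalnum() else "-" for c in title)
    # ...then collapse runs of hyphens and trim them from both ends.
    return "-".join(part for part in kept.split("-") if part)


def render_cards(titles, out_dir, font="Helvetica"):
    """Write one SVG title card per title and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, title in enumerate(titles, start=1):
        # Escape the text so characters like & and < stay valid XML.
        svg = CARD_TEMPLATE.substitute(title=escape(title), font=escape(font))
        path = out / f"{index:02d}-{slugify(title)}.svg"
        path.write_text(svg, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `render_cards(list_of_titles, "cards")` would write one numbered SVG file per title into a `cards` folder. The review step Katie raises still applies: something, whether a human or a rule set such as an ideal customer profile check, should look at the titles before anything is published.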
Episode 275 for the week of January 25, 2026, in which we discuss the second episode of season 1 of HBO's A Knight of the Seven Kingdoms. Notes: This episode, like all our episodes, contains full spoiler discussion from all relevant published works. This episode was reposted (with V2 title in the posting) due to […]
We're starting production of the EDA podcast! On YouTube, it'll be on the same channel as Ever Dark because we have playlists. On Spotify etc, it will have its own feed, which can be found here: https://open.spotify.com/episode/0yaYmQ79zNQ1beaR80oLLU?si=qVsLsWRtQD2GMgqNj4Gr5Q There's very little sexy stuff in the first volume of EDA, but when the sexy stuff starts happening, it's not as neatly sectioned off as the other stories. There's going to be a lot of ungraceful edits to cut out the non-PG material. Fortunately we won't have to worry about that very much until V2, which starts at chapter 19.
At the end of the Second World War, Europe is in ruins, Germany is defeated, and the world is discovering the scale of the Nazi regime's crimes. Yet in the shadow of the trials and the official denazification programs, another story begins. A secret, pragmatic, and deeply troubling story: Operation Paperclip.
The year is 1945. The United States quickly understands that military victory is only one step. A new conflict is already taking shape: the rivalry with the Soviet Union. In this race for power, one treasure is coveted by all: German scientists. Nazi Germany, despite its defeat, has some of the most advanced engineers and researchers in the world, notably in rocketry, aeronautics, chemistry, and medicine.
Washington then decides to act fast. Very fast.
Operation Paperclip is launched in the greatest secrecy. Its objective: to identify, recruit, and transfer to the United States hundreds of German scientists, even when their past is tainted by active collaboration with the Nazi regime.
The name "Paperclip" comes from a simple but telling administrative practice: a new, "cleaned-up" sheet is attached to compromising files, removing any overly embarrassing mention of certain candidates' political past.
Among these recruits is a name that became famous: Wernher von Braun. The star engineer of the V2 missile program, whose weapons spread terror in London and Antwerp, he is brought over with his team and settled in the United States. A few years later, this former scientist of the Third Reich becomes one of the architects of the American space program and contributes directly to sending astronauts to the Moon.
But Paperclip is not limited to rockets. Physicians, chemists, weapons specialists, and researchers in electronics and submarines also cross the Atlantic. Officially, the aim is to protect this knowledge from Soviet capture. Unofficially, eyes are often closed to the gray areas: forced labor, proximity to the SS, human experimentation.
The dilemma is immense. On one side, a moral imperative: to judge those responsible for Nazi crimes. On the other, a strategic logic: not letting these minds fall into Moscow's hands.
Between 1945 and the early 1950s, more than 1,600 German scientists are thus transferred to the United States through Paperclip.
This operation contributes directly to American technological superiority during the Cold War: ballistic missiles, supersonic aviation, and of course the conquest of space.
Operation Paperclip reveals an uncomfortable truth: in certain circumstances, great powers are ready to sacrifice justice on the altar of power. A dark and paradoxical page of history, in which former enemies become allies... in the name of the future. Hosted by Acast. Visit acast.com/privacy for more information.
In honor of a huge milestone, 10 years of the business behind the podcast, I am reflecting on the last 10 years and the conversations I've had with clients, listeners, and my fellow multi-passionate entrepreneurs. So today in episode 299 of the Positively LivingⓇ Podcast, instead of focusing on one topic, I'm celebrating by sharing 10 lessons of positive productivity that I've learned over the last decade. In this episode of the Positively LivingⓇ Podcast, I'm focusing on 10 lessons that connect to the same theme: productivity isn't a one-size-fits-all approach. You must find what works for you and honor that. The 10 lessons I cover in this episode include: Productivity is personal, not universal. Fluctuating energy is normal. Plan for it. Good enough beats perfect. Decluttering is a whole-life practice. Rest is not a reward; it's a requirement. Joy fuels productivity. Shame is the worst motivator. Systems don't need to be complicated. Let the season you're in guide your choices. Start where you are, grow as you go. I want to celebrate with you! Find me on social media and message me, sharing what you're celebrating today, big or small, and if this episode resonated with you, please leave a review on Apple Podcasts or a rating on Spotify to help others join this important conversation. Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways!
Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/
Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit
CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa!
LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.)
Dance Song Playlist V1, V2, V3
Music by Ian and Jeff Zawrotny
Start your own podcast with Buzzsprout!
Request this Toolkit and other free resources at the Resources Page.
This episode is a little different. I'm diving into tennis gear, everything except the racquet and shoes, with the founder of one of my favorite brands, ADV. I use their bags, dampeners, and even sweat bands.
Lavie Sak has a tech background, plays tennis, and used to coach as well. We explore how his company lets players lead the design of its products. From dampeners to grips to their popular bags, ADV innovates as well as any company in tennis. Lavie shares the messy first prototype, the tough cuts, and why ADV chose quality over mass pricing while partnering with pros who give real feedback.
How ADV got started
Dampener testing across 27 racquets and sound profiles
Tennis grips - how to choose between dry and tacky
Designing the ADV Pro bag and prioritizing features
How they develop an idea into a finished product
Feedback loops that shaped V2 and V3 of the bag
Why ADV makes two backpacks
Curated training kit components and use cases
Pricing tradeoffs, materials, and longevity
Doubles tips on serve variety and aiming middle
Partnerships with Sem Verbeek, JP Smith, and Zus Tennis
I use the ADV Pro for travel and the Flex bag locally around Fort Worth.
Links: Shop ADV Tennis
Learn more about ADV & follow: ADV Tennis - Instagram, ADV Tennis - YouTube, ADV Tennis - Facebook
-----
**Join the #1 Doubles Strategy Newsletter for Club Tennis Players** New doubles strategy lessons weekly straight to your inbox
**Become a Tennis Tribe Member** Tennis Tribe Members get access to premium video lessons, a monthly member-only webinar, doubles strategy Ebooks & Courses, exclusive discounts on tennis gear, and more. Learn More & Sign Up Here
**Other Free Doubles Content** Serve Strategy Cheatsheet, Return Strategy Cheatsheet, Serve Strategy 101 - Video Course
In this episode we are joined by TN to discuss Pendle's transition from vePENDLE to sPENDLE, changes to emissions and buyback mechanics and evolving LP incentives. We also explore Boros, user adoption, V2 market design and Pendle's near- and long-term roadmap. Thanks for tuning in! As always, remember this podcast is for informational purposes only, and any views expressed by anyone on the show are solely their opinions, not financial advice. -- Follow Blockworks Research: https://x.com/blockworksres Follow Pendle: https://x.com/pendle_fi Follow TN: https://x.com/tn_pendle?lang=en Follow Luke: https://x.com/0xMether Follow Boccaccio: https://x.com/salveboccaccio -- A yearly Blockworks Research subscription is $4,500, but now you can get our latest MetaDAO research report absolutely free. Read up on the latest funding models and what it all could mean for the future of ICOs: https://link.blockworks.co/metadaoreport -- Subscribe on YouTube: https://bit.ly/3foDS38 Subscribe on Apple: https://apple.co/3SNhUEt Subscribe on Spotify: https://spoti.fi/3NlP1hA Get top market insights and the latest in crypto news. Subscribe to Blockworks Daily Newsletter: https://blockworks.co/newsletter/ -- Timestamps: (0:00) Introduction (0:45) sPENDLE: Incentives and Buybacks (12:29) Boros: Fixed Rates, Perps, and Volatility (25:41) Pendle V2: Emissions and Market Efficiency (41:02) What's Next for Pendle and Boros (43:31) Closing Comments -- Check out Blockworks Research today! Research, data, governance, tokenomics, and models – now, all in one place Blockworks Research: https://www.blockworksresearch.com/ Free Daily Newsletter: https://blockworks.co/newsletter -- Disclaimer: Nothing said on 0xResearch is a recommendation to buy or sell securities or tokens. This podcast is for informational purposes only, and any views expressed by anyone on the show are solely our opinions, not financial advice. 
Boccaccio, Danny, and our guests may hold positions in the companies, funds, or projects discussed.
Many of us struggle with the guilt of taking time for ourselves, often viewing self-care as a luxury we can only afford once everything else is done. But here's the truth: when we neglect ourselves, we welcome exhaustion, resentment, and burnout. This week, on episode 297 of the Positively LivingⓇ Podcast, I am continuing my special series guiding you through my Positively Productive Toolkit, with the key to self-care: the Joy List. In this episode of the Positively LivingⓇ Podcast, I'm diving into my Joy List, reframing self-care not as an indulgence, but as essential maintenance for your mind and body, helping you map your way back to yourself through the science of positive emotions and the art of “compassionate productivity.”
Key takeaways:
Self-Care as Energy Maintenance: Why joy is a biological necessity to expand your mental lens, foster creativity, and help you recover from stress faster.
The Science of Nervous System Regulation: How small "glimmers" of joy act as cues to shift you out of "fight or flight" and into a state where you can solve problems with clarity.
The Overlap of Values and Joy: Whether you value connection, beauty, or freedom, identifying specific joyful activities makes your values tangible and actionable.
The Sliding-Scale Approach: Three levels of joy so you can practice self-care, no matter how busy you are.
Your joy is not a frivolous "extra"; it's the fuel that keeps you whole. This week, start by identifying three quick ways you can sprinkle joy into your day. Whether it's listening to a favorite song or taking three deep breaths outside, notice how your energy shifts when you choose yourself.
Haven't downloaded your Positively Productive Toolkit yet? You can find it at positivelyproductive.com/PLPKit
Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/
Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit
CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa!
LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.)
Ep 241: Why Being Cozy Can Make You More Productive
Ep 204: Why Bringing Nature Inside Will Help You Focus Better with Kasey Riley
Ep 296: Core Values: A Foundation of Sustainable Productivity
Dance Song Playlist V1, V2,
So many of us struggle with “productivity shame”: the feeling that we're failing because we can't stay consistent. Maybe our systems and schedules look good on paper, but leave us feeling exhausted, guilty, or burnt out. The truth? Our daily actions are in a constant tug-of-war with our souls. This week, on episode 296 of the Positively LivingⓇ Podcast, I am kicking off a special series guiding you through my Positively Productive Toolkit, starting with the foundational element of sustainable productivity: core values. In this episode of the Positively LivingⓇ Podcast, I'm sharing how you can identify the fundamental beliefs that guide your behavior and how to use them as a “filter” to sift through the noise of the world.
Key Takeaways:
Your values are the ultimate decision-maker for feeling grounded (instead of drained) in your efforts.
Resistance and inconsistency are often linked to a lack of alignment.
Learn how to narrow a list of ideals into five essential anchor values that represent your most authentic self.
Shift your perspective from who you “should” be to who you actually are, allowing you to create a life that honors your energy and capacity.
Your productivity should support your life, not the other way around.
Haven't downloaded your Positively Productive Toolkit yet? You can find it at positivelyproductive.com/PLPKit
Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways!
Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/
Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit
CONNECT WITH LISA ZAWROTNY: Facebook, Instagram, Resources, Work with Lisa!
LINKS MENTIONED IN THIS EPISODE: (Find links to books/gear on the Positively Productive Resources Page.)
Ep 293: Simple Filters to Help You Make Decisions
Dance Song Playlist V1, V2, V3
Music by Ian and Jeff Zawrotny
Start your own podcast with Buzzsprout!
Happy New Year! You may have noticed that in 2025 we had moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!

We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They then became one of the few companies in Nat Friedman and Daniel Gross' AI Grant to raise a full seed round from them, and have now become the independent gold standard for AI benchmarking, trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.

We have chatted with both Clementine Fourrier of HuggingFace's OpenLLM Leaderboard and (the freshly valued at $1.7B) Anastasios Angelopoulos of LMArena on their approaches to LLM evals and trendspotting, but Artificial Analysis have staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.

George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? 
And how open is “open” really?

We discuss:

* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet
* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers
* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints
* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)
* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs
* Omniscience Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding “I don't know”), and Claude models lead with the lowest hallucination rates despite not always being the smartest
* GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)
* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)
* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long 
context, multi-turn agents)
* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omniscience Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future
* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)
* V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (HumanEval-style coding is now trivial for small models)

Links to Artificial Analysis

* Website: https://artificialanalysis.ai
* George Cameron on X: https://x.com/georgecameron
* Micah Hill-Smith on X: https://x.com/micahhsmith

Full Episode on YouTube

Timestamps

* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins
* 01:19 Business Model: Independence and Revenue Streams
* 04:33 Origin Story: From Legal AI to Benchmarking Need
* 16:22 AI Grant and Moving to San Francisco
* 19:21 Intelligence Index Evolution: From V1 to V3
* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology
* 13:52 Mystery Shopper Policy and Maintaining Independence
* 28:01 New Benchmarks: Omniscience Index for Hallucination Detection
* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning
* 23:01 GDP Val AA: Agentic Benchmark for Real Work Tasks
* 50:19 Stirrup Agent Harness: Open Source Agentic Framework
* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses
* 58:25 The Smiling Curve: Cost Falling While Spend Rising
* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits
* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges
* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas
* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions
* 1:16:50 Closing: The Insatiable Demand for 
Intelligence

Transcript

Micah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time Artificial Analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.

swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me. Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, this gem of a models and host comparison site was just launched. And then I put in a few screenshots, and I said, it's an independent third party. It clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing Fireworks, and how do you have a model benchmarking thing without Fireworks? But you had Together, you had Perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats on... It's been an amazing year. You guys have really come together to be the presumptive new Gartner of AI, right? Which is something that...

George [00:01:09]: Yeah, but you can't pay us for better results.

swyx [00:01:12]: Yes, exactly.

George [00:01:13]: Very important.

Micah [00:01:14]: Start off with a spicy take.

swyx [00:01:18]: Okay, how do I pay you?

Micah [00:01:20]: Let's get right into that.

swyx [00:01:21]: How do you make money?

Micah [00:01:24]: Well, very happy to talk about that. So it's been a big journey the last couple of years. Artificial Analysis is going to be two years old in January 2026, which is pretty soon now. First, we run the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, technologies across the AI stack for building stuff. We're very committed to doing that and intend to keep doing that. We have, along the way, built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups. 
So we want to be... We want to be who enterprises look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. So no one pays to be on the website. We've been very clear about that from the very start because there's no use doing what we do unless it's independent AI benchmarking. Yeah. But turns out a bunch of our stuff can be pretty useful to companies building AI stuff.

swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?

George [00:02:53]: So we have a benchmarking and insights subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. And so, for instance, one of the reports is a model deployment report: how to think about choosing between serverless inference, managed deployment solutions, or leasing chips and running inference yourself. That's an example of the kind of decision that big enterprises face, and it's hard to reason through, like this AI stuff is really new to everybody. And so we try to help companies navigate that with our reports and insights subscription. We also do custom private benchmarking. And so that's very different from the public benchmarking that we publicize, and there's no commercial model around that. For private benchmarking, we'll at times create benchmarks, run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking. Yeah. 
So that's a piece mainly that we've developed through trying to support everybody publicly with our public benchmarks. Yeah.

swyx [00:04:09]: Let's talk about the tech stack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney?

Micah [00:04:19]: Yeah. Well, Sydney, Australia for me. George was in SF, but he's Australian, but he moved here already. Yeah.

swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting Artificial Analysis in the first place? You know, you started with public benchmarks. And so let's start there. We'll go to the private benchmark. Yeah.

George [00:04:33]: Why don't we even go back a little bit to like why we, you know, thought that it was needed? Yeah.

Micah [00:04:40]: The story kind of begins like in 2022, 2023, like both George and I have been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant. So it actually worked pretty well for its era, I would say. Yeah. Yeah. So I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. So I had like this multistage algorithm thing, trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build that out, right? Like you're trying to think about accuracy, a bunch of other metrics and performance and cost. And mostly just no one was doing anything to independently evaluate all the models. And certainly not to look at the trade-offs for speed and cost. So we basically set out just to build a thing that developers could look at to see the trade-offs between all of those things measured independently across all the models and providers. 
Honestly, it was probably meant to be a side project when we first started doing it. Like we didn't like get together and say like, Hey, like we're going to stop working on all this stuff. I'm like, this is going to be our main thing.

swyx [00:05:49]: When I first called you, I think you hadn't decided on starting a company yet.

Micah [00:05:58]: That's actually true. I don't even think we'd paused. Like, George hadn't quit his job. I didn't quit working on my legal AI thing. Like it was genuinely a side project.

George [00:06:05]: We built it because we needed it as people building in the space and thought, Oh, other people might find it useful too. So we'll buy a domain and link it to the Vercel deployment that we had and tweet about it. And, but very quickly it started getting attention. Thank you, Swyx, for, I think, doing an initial retweet and spotlighting it there, this project that we released. And then very quickly though, it was useful to others, but very quickly it became more useful as the number of models released accelerated. We had Mixtral 8x7B and it was a key. That's a fun one. Yeah. Like an open source model that really changed the landscape and opened up people's eyes to other serverless inference providers and thinking about speed, thinking about cost. And so that was a key. And so it became more useful quite quickly. Yeah.

swyx [00:07:02]: What I love about talking to people like you who sit across the ecosystem is, well, I have theories about what people want, but you have data and that's obviously more relevant. But I want to stay on the origin story a little bit more. When you started out, I would say, I think the status quo at the time was every paper would come out and they would report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone has some knowledge. 
I think there's some version of an Excel sheet or a Google sheet where you just like copy and paste the numbers from every paper and just post it up there. And then sometimes they don't line up because they're independently run. And so your numbers are going to look better than... Your reproductions of other people's numbers are going to look worse because you don't hold their models correctly or whatever the excuse is. I think then Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. The way that if I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval framework harness. Yup.

Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, it's like if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. And I mean, back when we started the website. Yeah. Yeah. Like one of the reasons why we realized that we had to run the evals ourselves and couldn't just take results from the labs was just that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get- You can put the answer into the model. Yeah. That in the extreme. And like you get crazy cases like back when Google had Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4 and like constructed, I think never published, like chain of thought examples, 32 of them in every topic in MMLU, to run it, to get the score. Like there are so many things that you- They never shipped Ultra, right? That's the one that never made it up. Not widely. Yeah. Yeah. Yeah. I mean, I'm sure it existed, but yeah. 
So we were pretty sure that we needed to run them ourselves and just run them in the same way across all the models. Yeah. And we were also certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.

swyx [00:09:24]: Okay. A couple of technical questions. I mean, so obviously I also thought about this and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know.

Micah [00:09:36]: No. Well, we definitely weren't at the start. So like, I mean, we're paying for it personally at the start. There's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of like hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. Yeah. It was like kind of fine. Yeah. Yeah. These days that's gone up an enormous amount for a bunch of reasons that we can talk about. But yeah, it wasn't that bad because you can also remember that like the number of models we were dealing with was hardly any and the complexity of the stuff that we wanted to do to evaluate them was a lot less. Like we were just asking some Q&A type questions and then one specific thing was for a lot of evals initially, we were just like sampling an answer. You know, like, what's the answer for this? Like, we didn't want to go into the answer directly without letting the models think. We weren't even doing chain of thought stuff initially. And that was the most useful way to get some results initially. Yeah.

swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right? 
Like because sometimes the models, the models can answer any way they see fit and sometimes they actually do have the right answer, but they just returned the wrong format and they will get a zero for that unless you work it into your parser. And that involves more work. And so, I mean, but there's an open question whether you should give it points for not following your instructions on the format.

Micah [00:11:00]: It depends what you're looking at, right? Because you can, if you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test it on its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's answered. But these days, it's mostly less of a problem. Like, if you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do, like, a simple regex.

swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, sometimes if you have a multiple choice question, sometimes there's a bias towards the first answer, so you have to randomize the responses. All these nuances, like, once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's such dark magic.

Micah [00:11:47]: You've also got, like… You've got, like, the different degrees of variance in different benchmarks, right? Yeah. So, if you run four-question multi-choice on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see on a four-question multi-choice eval is pretty enormous if you only do a single run of it and it has a small number of questions, especially. So, like, one of the things that we do is run an enormous number of all of our evals when we're developing new ones and doing upgrades to our intelligence index to bring in new things. 
Yeah. So, that we can dial in the right number of repeats so that we can get to the 95% confidence intervals that we're comfortable with so that when we pull that together, we can be confident in the Intelligence Index to at least as tight as, like, a plus or minus one at a 95% confidence. Yeah.

swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.

George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that's assuming one repeat in terms of how we report it because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there because of the repeats.

swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking, you don't have any special deals with the labs. They don't discount it. You just pay out of pocket or out of your sort of customer funds. Oh, there is a mix. So, the issue is that sometimes they may give you a special endpoint, which is… Ah, 100%.

Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser focus, like, on everything we do on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true. Like the one you bring up right here: if we're working with a lab, if they're giving us a private endpoint to evaluate a model, it is totally possible that what's sitting behind that black box is not the same as what they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy. 
And so, and we're totally transparent with all the labs we work with about this, that we will register accounts not on our own domain and run both intelligence evals and performance benchmarks… Yeah, that's the job. …without them being able to identify it. And no one's ever had a problem with that. Because, like, a thing that turns out to actually be quite a good factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.

swyx [00:14:23]: That's true. I never thought about that. I've been in the database data industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.

Micah [00:14:36]: I mean, okay, the biggest one, like, that I'll bring up, like, is more of a conceptual one, actually, than, like, direct shenanigans. It's that the things that get measured become things that get targeted by labs in what they're trying to build, right? Exactly. So that doesn't mean anything that we should really call shenanigans. Like, I'm not talking about training on the test set. But if you know that you're going to be graded on a particular thing, if you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing that preferably are going to be helpful for a wide range of how actual users want to use the thing that you're building, but will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to, like, how we might use modern coding agents and stuff. But it's clearly not one for one. 
So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without there being a reflection of the overall generalized intelligence of these models getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.

swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. Like, you used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier. You've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant that you guys decided to join and move here. What was it like? I think you were in, like, batch two? Batch four. Batch four. Okay.

Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies and were extremely aligned with the mission of what we were trying to do. Like, we're not quite typical of, like, a lot of the other AI startups that they've invested in. And they were very much here for the mission of what we want to do.

swyx [00:16:53]: Did they say any advice that really affected you in some way or, like, were one of the events very impactful? That's an interesting question.

Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.

swyx [00:17:09]: Which is also, like, a crazy list. Yeah.

George [00:17:11]: Oh, totally. Yeah, yeah, yeah. 
There was something about, you know, speaking to Nat and Daniel about the challenges of working through a startup and just working through the questions that don't have, like, clear answers and how to work through those kind of methodically and just, like, work through the hard decisions. And they've been great mentors to us as we've built Artificial Analysis. Another benefit for us was that other companies in the batch and other companies in AI Grant are pushing the capabilities of what AI can do at this time. And so being in contact with them, making sure that Artificial Analysis is useful to them, has been fantastic for supporting us in working out how we should build out Artificial Analysis to continue being useful to those, like, you know, building on AI.

swyx [00:17:59]: I think to some extent, I'm of mixed opinion on that one because to some extent, your target audience is not people in AI Grant who are obviously at the frontier. Yeah. Do you disagree?

Micah [00:18:09]: To some extent. To some extent. But then, so a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do across the entire stack for building great applications, which actually makes some of them pretty archetypical power users of Artificial Analysis. Some of the people with the strongest opinions about what we're doing well and what we're not doing well and what they want to see next from us. Yeah. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently for different models and different parts of your application to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So for many of them, no, they're like not commercial customers of ours, like we don't charge for all our data on the website. Yeah. 
They are absolutely some of our power users.

swyx [00:19:07]: So let's talk about just the evals as well. So you start out from the general like MMLU and GPQA stuff. What's next? How do you sort of build up to the overall index? What was in V1 and how did you evolve it? Okay.

Micah [00:19:22]: So first, just like background, like we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric that we pull together currently from 10 different eval data sets to give what we're pretty confident is the best single number to look at for how smart the models are. Obviously, it doesn't tell the whole story. That's why we publish the whole website of all the charts to dive into every part of it and look at the trade-offs. But best single number. So right now, it's got a bunch of Q&A type data sets that have been very important to the industry, like a couple that you just mentioned. It's also got a couple of agentic data sets. It's got our own long context reasoning data set and some other use case focused stuff. As time goes on, the things that we're most interested in, the capabilities that are becoming more important for AI and that developers care about, are going to be first around agentic capabilities. So surprise, surprise. We're all loving our coding agents, and how the models perform at that, and at similar things for different types of work, is really important to us. Linking to economically valuable use cases is extremely important to us. And then we've got some of the... Yeah. 
These things that the models still struggle with, like working really well over long contexts, that are not going to go away as specific capabilities and use cases that we need to keep evaluating.

swyx [00:20:46]: But I guess one thing I was driving at was like the V1 versus the V2 and how bad it was over time.

Micah [00:20:53]: Like how we've changed the index to where we are.

swyx [00:20:55]: And I think that reflects on the change in the industry. Right. So that's a nice way to tell that story.

Micah [00:21:00]: Well, V1 would be completely saturated right now by almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, I think, how much progress has been made in the last two years. Like we obviously play the game constantly of like today's version versus last week's version and the week before and all of the small changes in the horse race between the current frontier and who has the best like smaller than 10B model like right now this week. Right. And that's very important to a lot of developers and people and especially in this particular city of San Francisco. But when you zoom out a couple of years ago, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence. We can talk about that more in a bit. So V1, V2, V3, we made things harder. We covered a wider range of use cases. And we tried to get closer to things developers care about as opposed to like just the Q&A type stuff that MMLU and GPQA represented. Yeah.

swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark and like looking around and asking questions about it. Yeah.

Micah [00:22:21]: Let's do it. Okay. 
This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.

George [00:22:26]: And I think a little bit about the direction that we want to take it. And we want to push benchmarks. Currently, the Intelligence Index and evals focus a lot on kind of raw intelligence. But we kind of want to diversify how we think about intelligence. And we can talk about it. But kind of new evals that we've kind of built and partnered on focus on topics like hallucination. And we've got a lot of topics that I think are not covered by the current eval set that should be. And so we want to bring that forth. But before we get into that.

swyx [00:23:01]: And so for listeners, just as a timestamp, right now, number one is Gemini 3 Pro High. Then followed by Claude Opus at 70. GPT 5.1 High. You don't have 5.2 yet. And Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. Yeah. I mean, I love it. I love it. No, no. 100%. Look back this time next year and go, how cute. Yep.

George [00:23:25]: Totally. A quick view of that is, okay, there's a lot. I love it. I love this chart. Yeah.

Micah [00:23:30]: This is such a favorite, right? Yeah. And almost every talk that George or I give at conferences and stuff, we always put this one up first to just talk about situating where we are in this moment in history. This, I think, is the visual version of what I was saying before about the zooming out and remembering how much progress there's been. If we go back to just over a year ago, before o1, before Claude Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, OpenAI was untouchable for well over a year. 
And, I mean, you would remember that time period well of there being very open questions about whether or not AI was going to be competitive, like full stop, whether or not OpenAI would just run away with it, whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.

George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There's so many dots on it, but I think it reflects a little bit what we felt, like how crazy it's been.

swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got ServiceNow in there that are less traditional names. Yeah.

George [00:25:01]: It's models that we're kind of highlighting by default in our charts, in our Intelligence Index. Okay.

swyx [00:25:07]: You just have a manually curated list of stuff.

George [00:25:10]: Yeah, that's right. But something that I actually don't think every Artificial Analysis user knows is that you can customize our charts and choose what models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.

swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the o1 jump. Look at that. September 2024. And the DeepSeek jump. Yeah.

George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.

Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, a couple of weeks. It was Boxing Day in New Zealand when DeepSeek v3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less known over the second half of 2024 and had run evals on the earlier ones and stuff. 
I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas and stuff, running the evals and getting back result by result on DeepSeek V3. So this was the first of their V3 architecture, the 671B MoE.

Micah [00:26:19]: And we were very, very impressed. That was the moment where we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a thing. The world really noticed when they followed that up with the RL working on top of V3 and R1 succeeding a few weeks later. But the groundwork for that absolutely was laid with just an extremely strong base model, completely open weights, that we had as the best open weights model. So, yeah, that's the thing that you really see in the game. But I think that we got a lot of good feedback on Boxing Day last year.

George [00:26:48]: Boxing Day is the day after Christmas for those not familiar.

George [00:26:54]: I'm from Singapore.

swyx [00:26:55]: A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.

Micah [00:27:11]: I don't know. I'm not used to it. Once upon a time, we did call it Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.

George [00:27:20]: There have been a few naming changes. We added hardware benchmarking to the site, so benchmarks at a system level. And so then we changed our throughput metric to what we now call output speed, and then throughput makes sense at a system level, so we took that name.

swyx [00:27:32]: Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.

Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into.
Maybe so we can skip past all the, like, we have lots and lots of evals and stuff. The interesting ones to talk about today, that would be great to bring up, are a few of our recent things that probably not many people will be familiar with yet. So the first one of those is our Omniscience Index. This one is a little bit different to most of the intelligence evals that we've run. We built it specifically to look at the embedded knowledge in the models and to test hallucination by looking at, when the model doesn't know the answer, so it's not able to get it correct, what's its probability of saying "I don't know" versus giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we're simply taking off a point if you give an incorrect answer to the question. We're pretty convinced that this is an example of where it makes most sense to do that, because it's strictly more helpful to say "I don't know" instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models and the labs creating them to get higher scores. Almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything. There's no incentive to say "I don't know." So we did that for this one here.

swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah. Yeah.

George [00:29:31]: On that. And one reason that we didn't do that, or put that into this index, is that we think that the way to do that is not to ask the models how confident they are.

swyx [00:29:43]: I don't know. Maybe it might be though. You put in like a JSON field, say, confidence, and maybe it spits out something. Yeah.
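A minimal sketch of the scoring Micah describes above, assuming each answer has already been graded as correct, incorrect, or abstained ("I don't know"). The function name, the three labels, and the sample counts are illustrative assumptions; this is not Artificial Analysis's actual grading code. Correct answers add a point, incorrect answers subtract one, and abstentions score zero, which is what puts the index on a -100 to +100 scale. The hallucination rate is taken here to be the share of not-correct answers that were wrong rather than abstained.

```python
from collections import Counter

VALID_GRADES = {"correct", "incorrect", "abstain"}


def score_answers(grades):
    """Score a list of per-question grades.

    Each grade is one of "correct", "incorrect", or "abstain"
    (hypothetical labels chosen for this sketch).
    """
    counts = Counter(grades)
    unknown = set(counts) - VALID_GRADES
    if unknown:
        raise ValueError(f"unknown grades: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no answers to score")

    correct = counts["correct"]
    incorrect = counts["incorrect"]
    abstain = counts["abstain"]

    # +1 per correct answer, -1 per incorrect answer, 0 per abstention,
    # rescaled so the result lies between -100 and +100.
    index = 100 * (correct - incorrect) / total
    # Plain percentage correct, which ignores abstentions entirely.
    accuracy = 100 * correct / total
    # Of the questions the model did not get right, how often did it
    # answer wrongly instead of saying "I don't know"?
    not_correct = incorrect + abstain
    hallucination_rate = 100 * incorrect / not_correct if not_correct else 0.0

    return {
        "index": index,
        "accuracy": accuracy,
        "hallucination_rate": hallucination_rate,
    }


if __name__ == "__main__":
    # Made-up grade counts for two imaginary models on 100 questions.
    cautious = ["correct"] * 50 + ["incorrect"] * 20 + ["abstain"] * 30
    guesser = ["correct"] * 60 + ["incorrect"] * 40
    print("cautious:", score_answers(cautious))
    print("guesser: ", score_answers(guesser))
```

With these made-up counts, the model that guesses on everything wins on plain accuracy (60 versus 50) but loses on the index (20 versus 30), which is the incentive shift described in the conversation.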
You know, we have done a few evals podcasts over the years. And when we did one with Clementine of Hugging Face, who maintains the open source leaderboard, this was one of her top requests, which is some kind of hallucination slash lack of confidence calibration thing. And so, hey, this is one of them.

Micah [00:30:05]: And I mean, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But yeah, it's pretty useful and has some interesting results. Like one of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here, with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not really measured vibes stuff that people like about some of the Claude models. Is the dataset public, or is there a held-out set? There's a held-out set for this one. So we have published a public test set, but we've only published 10% of it. The reason is that for this one here specifically, it would be very, very easy to have data contamination, because it is just factual knowledge questions. We'll update it over time to also prevent that, but yeah, we've kept most of it held out so that we can keep it reliable for a long time. It leads us to a bunch of really cool things, including breaking down quite granularly by topic. And so we've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.

swyx [00:31:23]: I would be interested. Let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, which hallucinates less than Opus. And yeah. Would that be the other way around in a normal capability environment? I don't know.
What do you make of that?

George [00:31:37]: One interesting aspect is that we've found that there's not really a strong correlation between intelligence and hallucination, right? That's to say that how smart the models are in a general sense isn't correlated with their ability to, when they don't know something, say that they don't know. It's interesting that Gemini 3 Pro Preview was a big leap over here over Gemini 2.5 Flash and 2.5 Pro, but, and if I add Pro quickly here.

swyx [00:32:07]: I bet Pro's really good. Uh, actually no, I meant the GPT Pros.

George [00:32:12]: Oh yeah.

swyx [00:32:13]: Cause GPT Pros are rumored, we don't know for a fact, to be like eight runs and then with the LLM judge on top. Yeah.

George [00:32:20]: So we saw a big jump in, this is accuracy. So this is just the percent that they get correct, and Gemini 3 Pro knew a lot more than the other models. And so a big jump in accuracy, but relatively no change between the Google Gemini models, between releases, in the hallucination rate. Exactly. And so it's likely due to just a different post-training recipe between the Claude models. Yeah.

Micah [00:32:45]: Um, that's driven this. Yeah. You can partially blame us and how we define intelligence, having until now not defined hallucination as a negative in the way that we think about intelligence.

swyx [00:32:56]: And so that's what we're changing. Uh, I know many smart people who are confidently incorrect.

George [00:33:02]: Uh, look at that. That is very human. Very true. And there's a time and a place for that. I think our view is that hallucination rate makes sense in this context where it's around knowledge, but in many cases, people want the models to hallucinate, to have a go. Often that's the case in coding or when you're trying to generate newer ideas.
One eval that we added to Artificial Analysis is Critical Point, and it's really hard physics problems. Okay.

swyx [00:33:32]: And is it sort of like a HumanEval type or something different, or like a FrontierMath type?

George [00:33:37]: It's not dissimilar to FrontierMath. So these are research questions that academics in the physics world would be able to answer, but models really struggle to answer. So the top score here is not 9%.

swyx [00:33:51]: And the people that created this, like Minhui and, actually, Ofir, who was kind of behind SWE-bench. And what organization is this? Oh, it's Princeton.

George [00:34:01]: A range of academics from different academic institutions, really smart people. They talked about how they turn the models up in terms of temperature, as high a temperature as they can, when they're trying to explore new ideas in physics with the model as a thought partner, just because they want the models to hallucinate. Um, yeah, sometimes it's something new. Yeah, exactly.

swyx [00:34:21]: Um, so not right in every situation, but, um, I think it makes sense, you know, to test hallucination in scenarios where it makes sense. Also, the obvious question is, this is one of many. Every lab has a system card that shows some kind of hallucination number, and you've chosen not to endorse that, and you've made your own. And I think that's a choice. Um, totally. In some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun. You provide it as a service here. You have to fight the, well, who are we to do this?
And your answer is that we have a lot of customers and, you know, but like, I guess, how do you converge the individual?

Micah [00:35:08]: I mean, I think for hallucination specifically, there are a bunch of different things that you might reasonably care about, and that you'd measure quite differently. We've called this the Omniscience hallucination rate; we're not trying to declare that, like, it's humanity's last hallucination. You could have some interesting naming conventions and all this stuff. Um, the bigger picture answer to that, and it's something that I actually wanted to mention just as George was explaining Critical Point as well, is that as we go forward, we are building evals internally, and we're partnering with academia and with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not necessarily obsessed with the idea that everything we do, we have to do entirely within our own team. Critical Point is a cool example, where we were a launch partner for it, working with academia. We've got some partnerships coming up with a couple of leading companies. Those ones, obviously, we have to be careful with on some of the independence stuff, but with the right disclosure, we're completely comfortable with that. A lot of the labs have released great datasets in the past that we've used to great success independently. And so between all of those techniques, we're going to be releasing more stuff in the future. Cool.

swyx [00:36:26]: Let's cover the last couple. And then I want to talk about your trends analysis stuff, you know? Totally.

Micah [00:36:31]: So actually, I have one little factoid on Omniscience.
If you go back up to accuracy on Omniscience, an interesting thing about this accuracy metric is that it tracks the total parameter count of models more closely than anything else that we measure. That makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff, which we think is much more about how the models are trained. This is just what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.

swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed by any official source, just rumors. But rumors do fly around. Rumors. I hear all sorts of numbers. I don't know what to trust.

Micah [00:37:17]: So if you draw the line on Omniscience accuracy versus total parameters, where we've got all the open weights models, you can squint and see that likely the leading frontier models right now are quite a lot bigger than the ones that we're looking at here, and than the one trillion parameters that the open weights models cap out at. There's an interesting extra data point that Elon Musk revealed recently about xAI: three trillion parameters for Grok 3 and 4, six trillion for Grok 5, but that's not out yet. Take those together, have a look. You might reasonably form a view that there's a pretty good chance that Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameter range. To be clear, I have absolutely no idea, but just based on this chart, that's where you would land if you have a look at it. Yeah.

swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much, because what does it really matter? Like as long as they can serve it at a sustainable cost, that's about it.
Like, yeah, totally.

George [00:38:17]: They've also got different incentives in play compared to open weights models, where they're thinking about supporting others in self-deployment. For the labs who are doing inference at scale, I think it's less about total parameters in many cases when thinking about inference costs, and more around the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.

Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously if you're a developer or company using these things, exactly as you say, it doesn't matter. You should be looking at all the different ways that we measure intelligence. You should be looking at the cost-to-run-index number and the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.

swyx [00:38:56]: It's not as good for the content creator rumor mill, where I can say: oh, GPT-4 is this small circle, look, GPT-5 is this big circle. That used to be a thing for a while. Yeah.

Micah [00:39:07]: But that is, on its own, actually a very interesting one, right? Which is just purely that chances are the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in total size of the models, especially with the upcoming hardware generations. Yes.

swyx [00:39:29]: So, you know, taking off my shitposting face for a minute. Yes. Yes. At the same time, I do feel like, you know, especially coming back from Europe, people do feel like Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale out, and therefore we need to start exploring at least a different path. GDPval, I think it's only like a month or so old. I was also very positive when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool.
And you have your own version.

George [00:39:59]: It's a fantastic dataset. Yeah.

swyx [00:40:01]: And maybe we'll recap for people who are still out of it. It's like 44 tasks based on some kind of GDP cutoff, meant to represent broad white collar work that is not just coding. Yeah.

Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions, and some input files for a lot of them. The 44 is divided into like 220, 225 maybe, subtasks, which are the level that we run through the agent. And yeah, they're really interesting. I will say that it doesn't necessarily capture all the stuff that people do at work. No eval is perfect; there are always going to be more things to look at, largely because in order to make the tasks well enough defined that you can run them, they need to only have a handful of input files and very specific instructions for that task. And so I think the easiest way to think about them is that they're like quite hard take-home exam tasks that you might do in an interview process.

swyx [00:40:56]: Yeah, for listeners, it is no longer like a long prompt. It is like, well, here's a zip file with a spreadsheet or a PowerPoint deck or a PDF, and go nuts and answer this question.

George [00:41:06]: OpenAI released a great dataset, and they released a good paper which looks at performance across the different web chatbots on the dataset. It's a great paper, and I encourage people to read it. What we've done is taken that dataset and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the dataset, and then we developed an evaluator approach to compare outputs. That's AI-enabled, so it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned to human preferences.
One data point there is that even as an evaluator, Gemini 3 Pro, interestingly, doesn't do actually that well. So that's kind of a good example of what we've done in GDPval-AA.

swyx [00:42:01]: Yeah, the thing that you have to watch out for with LLM judges is self-preference, that models usually prefer their own output, and in this case, it did not. Totally.

Micah [00:42:08]: I think the way that we're thinking about the places where it makes sense to use an LLM-as-judge approach now is quite different to some of the early LLM-as-judge stuff a couple of years ago, because some of that, and MT-Bench was a great project that was a good example of some of this a while ago, was about judging conversations and a lot of style-type stuff. Here, the task that the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search and the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading it, we're running it through a pipeline to extract visual and text versions of the files and provide that to Gemini, and we're providing the criteria for the task and getting it to pick which one more effectively meets the criteria of the task. Yeah. So the task is picking out of two potential outcomes. It turns out, as we proved, that it's just very, very good at getting that right, matching human preference a lot of the time, because I think it's got the raw intelligence, but it's combined with the correct representation of the outputs, the fact that the outputs were created with an agentic task that is quite different to the way the grading model works, and the fact that we're comparing against criteria, not just zero-shot asking the model to pick which one is better.

swyx [00:43:26]: Got it. Why is this an Elo?
And not a percentage, like GDPval?

George [00:43:31]: So the outputs look like documents, and there are video outputs or audio outputs from some of the tasks. It has to make a video? Yeah, for some of the tasks. Some of the tasks.

swyx [00:43:43]: What task is that?

George [00:43:45]: I mean, it's in the dataset. Like be a YouTuber? It's a marketing video.

Micah [00:43:49]: Oh, wow. What? Like, the model has to go find clips on the internet and try to put it together. The models are not that good at doing that one, for now, to be clear. It's pretty hard to do that with a code editor. I mean, the computer stuff doesn't work quite well enough and so on and so on, but yeah.

George [00:44:02]: And so there's no ground truth, necessarily, to compare against to work out percentage correct. It's hard to come up with correct or incorrect there. And so it's on a relative basis. We use an Elo approach to compare outputs from each of the models across the tasks.

swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task, and then give it an Elo, so you have a human there. I think what's helpful about GDPval, the OpenAI one, is that 50% is meant to be normal human, and maybe domain expert is higher than that, but 50% was the bar for like, well, if you've crossed 50, you are superhuman. Yeah.

Micah [00:44:47]: So we haven't grounded this score in that exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models. It's one of the reasons that presenting it as Elo is quite helpful: it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky looking at these exact tasks compared to human performance, because the way that you would go about it as a human is quite different to how the models would go about it. Yeah.

swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there.
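A rough sketch of how pairwise judge verdicts can be turned into Elo-style ratings when there is no ground truth to score against, as George describes. It uses the standard Elo expected-score formula with sequential updates. The model names, the starting rating of 1000, the K-factor of 32, and the list of verdicts are all assumptions for illustration; the conversation does not say which exact rating procedure Artificial Analysis uses.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rate_models(models, verdicts, start=1000.0, k=32.0):
    """Compute Elo-style ratings from pairwise verdicts.

    models:   iterable of model names (hypothetical here).
    verdicts: list of (winner, loser) pairs, one per judged comparison,
              e.g. the judge model picking which output better meets
              the task criteria.
    """
    ratings = {name: start for name in models}
    for winner, loser in verdicts:
        # The winner gains more when the win was less expected.
        surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
        ratings[winner] += k * surprise
        ratings[loser] -= k * surprise
    return ratings


if __name__ == "__main__":
    # Made-up verdicts across a few tasks for three imaginary models.
    verdicts = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_b", "model_c"),
    ]
    ratings = rate_models(["model_a", "model_b", "model_c"], verdicts)
    for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Sequential updates like these depend on the order in which comparisons are processed, so a leaderboard might instead fit all comparisons jointly (for example with a Bradley-Terry model). Either way, a new model can be added by running more comparisons, without re-grading against a fixed answer key.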
Is that like just one last, like...

Micah [00:45:20]: Well, no, no, no, no, it is the best model released by Meta. And... so it makes it into the homepage default set, still, for now.

George [00:45:31]: Another inclusion that's quite interesting is that we also ran it across the latest versions of the web chatbots. And so we have...

swyx [00:45:39]: Oh, that's right.

George [00:45:40]: Oh, sorry.

swyx [00:45:41]: I, yeah, I completely missed that. Okay.

George [00:45:43]: No, not at all. So those, which have a checkered pattern. So that is their harness, not yours, is what you're saying. Exactly. And what's really interesting is that if you compare, for instance, Claude 4.5 Opus using the Claude web chatbot, it performs worse than the model in our agentic harness. And so in every case, the model performs better in our agentic harness than its web chatbot counterpart, the harness that they created.

swyx [00:46:13]: Oh, my backwards explanation for that would be that, well, it's meant for consumer use cases, and here you're pushing it for something.

Micah [00:46:19]: The constraints are different, and the amount of freedom that you can give the model is different. Also, you, like, have a cost goal. We let the models work as long as they want, basically. Yeah. Do you copy paste manually into the chatbot? Yeah. Yeah. That was how we got the chatbot reference. We're not going to be keeping those updated at quite the same scale as hundreds of models.

swyx [00:46:38]: Well, so I don't know, talk to Browserbase. They'll automate it for you. You know, I have thought about like, well, we should turn these chatbot versions into an API, because they are legitimately different agents in themselves. Yes. Right. Yeah.

Micah [00:46:53]: And that's grown a huge amount over the last year, right? Like the tools.
The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the number of data sources that you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.

swyx [00:47:10]: What tools and what data connections come to mind when you say what's interesting, what's notable work that people have done?

Micah [00:47:15]: Oh, okay. So my favorite example on this is that until very recently, I would argue that it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's, um, pulling something from wherever you store stuff at work. So for me, that's Google Drive, OneDrive, and our Supabase databases if we need to do some analysis on some data or something. Preferably the model can be plugged into all of those things and can go do some useful work based on it. The things that I find most impressive currently, that I am somewhat surprised work really well in late 2025, are that I can have models use the Supabase MCP to query, read-only of course, and run a whole bunch of SQL queries to do pretty significant data analysis, and make charts and stuff, and that they can read my Gmail and my Notion. And okay. You actually use that. That's good. That's good. Is that a Claude thing? To various degrees, both ChatGPT and Claude right now. I would say that this stuff barely works, in fairness, right now. Like.

George [00:48:33]: Because people are actually going to try this after they hear it.
If you get an email from Micah, odds are it wasn't written by a chatbot.

Micah [00:48:38]: So, yeah, I think it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.

swyx [00:48:46]: Um, and so you can feel it, right? And yeah, this time next year, we'll come back and see where it's going. Totally. Um, Supabase, shout out, another famous Kiwi. Uh, I don't know if you've had any conversations with him about anything in particular on AI building and AI infra.

George [00:49:03]: We have had, uh, Twitter DMs with him, because we're quite big Supabase users and power users. And we probably do some things more manually than we should in Supabase. Support line, because you're a little bit being super friendly. One extra point regarding GDPval-AA is that, on the basis of the overperformance of the models compared to the chatbots, we realized that, oh, our reference harness that we built actually works quite well on generalist agentic tasks. This proves it in a sense. And so the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code, and all that we give it is context management capabilities, a web search and web browsing tool, and a code execution environment. Anything else?

Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. For GDPval, we give it a tool to view an image specifically, because the models, you know, can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context, we had to give them a custom tool. But yeah, exactly. Um, you can explain an expert. No.

George [00:50:21]: So it turned out that we created a good generalist agentic harness. And so we released that on GitHub yesterday. It's called Stirrup.
So if people want to check it out, it's a great, um, you know, base for building a generalist agent for more specific tasks.

Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code and the coding agents can work with it super well.

swyx [00:50:51]: Well, that's nice for the community to explore and share and hack on it. I think maybe in other similar environments, the Terminal-Bench guys have done, uh, sort of the Harbor thing. And so it's a bundle of, well, we need our minimal harness, which for them is Terminus, and we also need the RL environments or Docker deployment thing to run independently. So I don't know if you've looked at Harbor at all. Is that like a standard that people want to adopt?

George [00:51:19]: Yeah, we've looked at it from an evals perspective, and we love Terminal-Bench and host benchmarks of Terminal-Bench on Artificial Analysis. Um, we've looked at it from a coding agent perspective, but could see it being a great basis for any kind of agent. I think where we're getting to is that these models have gotten smart enough, and they've gotten better tools, that they can perform better when just given a minimalist set of tools and left to run, letting the model control the agentic workflow rather than using another framework that's a bit more built out and tries to dictate the flow. Awesome.

swyx [00:51:56]: Let's cover the Openness Index, and then let's go into the report stuff. Uh, so that's the last of the proprietary AA numbers, I guess. I don't know how you sort of classify all these. Yeah.

Micah [00:52:07]: Or let's call it the last of the three new things that we're talking about from like the last few weeks.
Um, cause I mean, we do a mix of stuff where we're using open source, where we open source what we do, and, um, proprietary stuff that we don't always open source. Like the long context reasoning dataset last year, we did open source. Um, and then all of the work on performance benchmarks across the site, some of them we're looking to open source, but some of them we're constantly iterating on, and so on. So there's a huge mix, I would say, of stuff that is open source and not across the site. So that's LCR, for people. Yeah, yeah, yeah, yeah.

swyx [00:52:41]: Uh, but let's talk about open.

Micah [00:52:42]: Let's talk about the Openness Index. This here is, call it, a new way to think about how open models are. We have for a long time tracked whether the models are open weights and what the licenses on them are. And that's pretty useful. That tells you what you're allowed to do with the weights of a model. But there is this whole other dimension to how open models are that is pretty important and that we haven't tracked until now, and that's how much is disclosed about how it was made. So transparency about data, pre-training data and post-training data, and whether you're allowed to use that data, and transparency about methodology and training code. So basically, those are the components. We bring them together to score an Openness Index for models so that you can in one place get this full picture of how open models are.

swyx [00:53:32]: I feel like I've seen a couple other people try to do this, but they're not maintained. I do think this does matter. I don't know what the numbers mean. Is there a max number? Is this out of 20?

George [00:53:44]: It's out of 18 currently, and we've got an Openness Index page, but essentially these are points: you get points for being more open across these different categories, and the maximum you can achieve is 18.
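A toy sketch of a points-based index of the kind George describes. The only details taken from the conversation are that points are awarded per category and that the maximum is 18. The six category names below and the 0 to 3 points per category are hypothetical, chosen only so that the total adds up to 18; the real rubric lives on the Openness Index page and may be structured quite differently.

```python
# Hypothetical rubric: six categories worth 0-3 points each (6 * 3 = 18).
MAX_PER_CATEGORY = 3
CATEGORIES = (
    "weights_availability",
    "weights_license",
    "pretraining_data",
    "posttraining_data",
    "methodology",
    "training_code",
)
MAX_SCORE = MAX_PER_CATEGORY * len(CATEGORIES)


def openness_score(points):
    """Sum per-category points after checking they fit the rubric.

    points: dict mapping every category name to an integer from
            0 (nothing disclosed) to MAX_PER_CATEGORY (fully open).
    """
    missing = set(CATEGORIES) - set(points)
    extra = set(points) - set(CATEGORIES)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for category, value in points.items():
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{category} must be 0..{MAX_PER_CATEGORY}")
    return sum(points[c] for c in CATEGORIES)


if __name__ == "__main__":
    # Imaginary model: open weights under a permissive license, with a
    # methodology write-up but no data or training code released.
    weights_only = {
        "weights_availability": 3,
        "weights_license": 3,
        "pretraining_data": 0,
        "posttraining_data": 0,
        "methodology": 2,
        "training_code": 0,
    }
    print(f"{openness_score(weights_only)} / {MAX_SCORE}")
```

The point of the structure is the one made in the conversation: two models can both be "open weights" and still land far apart once data, methodology, and training code are counted.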
So AI2, with their extremely open OLMo 3 32B Think model, is the leader in a sense.

swyx [00:54:04]: It's Hugging Face.

George [00:54:05]: Oh, with their smaller model. It's coming soon. I think we need to run, we need to get the intelligence benchmarks right to get it on the site.

swyx [00:54:12]: You can't have an openness index and not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, the RefinedWeb and all that stuff. It's amazing. Or is it called FineWeb? FineWeb. FineWeb.

Micah [00:54:23]: Yeah, yeah, no, totally. Yep. One of the reasons this is cool, right, is that if you're trying to understand the holistic picture of the models and what you can do with all the stuff the company's contributing, this gives you that picture. And so we are going to keep it up to date alongside all the models that we do the Intelligence Index on, on the site. And it's just an extra view to understand.

swyx [00:54:43]: Can you scroll down to this? The trade-offs chart. Yeah, yeah. That one. Yeah. This really matters, right? Obviously, because you can b
Text your thoughts and questions!

Are you already feeling the “New Year, New You” burnout before January is even over? The pressure to set massive resolutions, hit the ground running, and overhaul your entire life on January 1st is a recipe for overwhelm. If you're feeling more like you need a nap than a marathon, you're not alone, and you're not failing. This week on episode 295 of the Positively LivingⓇ Podcast, I'm introducing you to a gentler, sustainable approach so you can start the new year with less stress.

In this episode of the Positively LivingⓇ Podcast, I'm ditching the concept of an arbitrary date on the calendar that dictates our fresh start and giving you actionable steps to take right now to reach your goals at a pace that actually respects your energy and capacity. I cover the following topics:

Embrace the phrase, “start as you mean to go on,” or in other words, choose a starting speed that you can maintain.
Take an intentional pause to reflect and move forward in the right direction.
Conduct a self-assessment to understand your energy drains, capacity, and current needs.
Consider broad intentions over rigid goals to support your overall well-being.

Ready to build a year that actually fits your life? Download my free Productivity Toolkit at positivelyproductive.com/plpkit and follow along as we dive into these workbooks in the coming weeks!

Thank you for listening! If you enjoyed this episode, take a screenshot of the episode to post in your stories and tag me! And don't forget to follow, rate, and review the podcast and tell me your key takeaways!

Learn more about Positively LivingⓇ and Lisa at https://positivelyproductive.com/podcast/

Stop trying to fit into someone else's productivity rules! Grab my free Productivity Toolkit, a collection of workbooks designed to help you explore how you work, uncover what truly matters to you, and create your very own energy-friendly systems. Get it here: www.positivelyproductive.com/plpkit

CONNECT WITH LISA ZAWROTNY:
Facebook
Instagram
Resources
Work with Lisa!

LINKS MENTIONED IN THIS EPISODE:
(Find links to books/gear on the Positively Productive Resources Page.)
Ep 291: How Slowing Down Makes You More Productive
Ep 293: Simple Filters to Help You Make Decisions
Ep 239: How to Choose a Word of the Year
Dance Song Playlist V1, V2, V3
Music by Ian and Jeff Zawrotny
Start your own podcast with Buzzsprout!
Request this Toolkit and other free resources at the Resources Page.
The Second World War saw the development of many new weapons. Perhaps none was more terrifying than the long-range strategic rocket. Rockets had been used in combat for centuries, dating back to their development in ancient China; however, the rockets developed by Germany were a different matter altogether. They terrorized civilians in England and actually served as the starting point of the space race. Learn more about the V1 and V2 rockets and the Nazi rocket program on this episode of Everything Everywhere Daily.

Sponsors
Quince: Go to quince.com/daily for 365-day returns, plus free shipping on your order!
Mint Mobile: Get your 3-month Unlimited wireless plan for just 15 bucks a month at mintmobile.com/eed
Stash: Go to get.stash.com/EVERYTHING to see how you can receive $25 towards your first stock purchase.
Newspapers.com: Go to Newspapers.com to get a gift subscription for the family historian in your life!

Subscribe to the podcast! https://everything-everywhere.com/everything-everywhere-daily-podcast/

--------------------------------

Executive Producer: Charles Daniel
Associate Producers: Austin Oetken & Cameron Kieffer

Become a supporter on Patreon: https://www.patreon.com/everythingeverywhere
Discord Server: https://discord.gg/UkRUJFh
Instagram: https://www.instagram.com/everythingeverywhere/
Facebook Group: https://www.facebook.com/groups/everythingeverywheredaily
Twitter: https://twitter.com/everywheretrip
Website: https://everything-everywhere.com/

Disce aliquid novi cotidie

Learn more about your ad choices. Visit megaphone.fm/adchoices