MERCH: https://orchideight.com/collections/poorhammer TWITCH: https://www.twitch.tv/poorhammer PATREON: https://www.patreon.com/SolelySingleton
On this week's episode, Brad and Eric take a look at the Combat Patrols of 40K 10th Edition and try to make them a better product, in hopes that someone at Games Workshop watches this and presents the new, improved combo boxes as their own idea to their boss. It's OK. We don't need any credit. Just give us these combat patrols!
SHOW LINKS: Brad's Bsky: https://bsky.app/profile/drruler.bsky.social Eric's Bsky: https://bsky.app/profile/onekuosora.bsky.social
OTHER EPISODES OF THIS SERIES: Tyranids, Orks and Guard: https://www.youtube.com/watch?v=r5AQ1mjhy6E Deathwatch, Tau, Dark Angels, Nurgle, Aeldari, Harlequins: https://www.youtube.com/watch?v=q0wc28ka2aE EC, Necrons, Salamanders, Ad Mech: https://www.youtube.com/watch?v=uuhbo-RzVAE Tsons, Death Guard, Votann, Slaanesh, Agents: https://www.youtube.com/watch?v=96mjO4WSFjY Building Better Battleforce Boxes: https://www.youtube.com/watch?v=oiTLhDJvfFI Building Bolder Battleforce Boxes: https://www.youtube.com/watch?v=EXV-_Rpvim0 Building WORSE Combat Patrols: https://www.youtube.com/watch?v=3ThBplwgIZM
TIMESTAMPS: 00:00 Hello and Welcome 00:33 Further Ado 02:52 The Anatomy of a Combat Patrol 05:15 Adepta Sororitas 08:54 Grey Knights 14:27 Chaos Daemons - Tzeentch 20:18 World Eaters 23:22 We can't make an episode without committing a crime 28:03 I need a good adjective for the title of the next episode in this series 29:50 Alright Audio Audience How's It Going
Contact Information: You can interact with Solely Singleton by joining the hosts on Discord and Twitter to give input to improve the show. Feel free to email more detailed questions and suggestions to the show's email address.
Your Hosts: Brad (DrRuler) & Eric (OnekuoSora) Brad's Bsky: https://bsky.app/profile/drruler.bsky.social Eric's Bsky: https://bsky.app/profile/onekuosora.bsky.social Show Email: thepoorhammerpodcast@gmail.com Merch Website: http://www.poorhammer.com/ Edited by: Menino Berilio Show Mailing Address: PO Box 70893 Rochester Hills, MI 48307 Licensed Music Used By This Program: "Night Out" by LiQWYD CC BY "Thursday & Snow (Reprise)" by Blank & Kytt CC BY "First Class" by Peyruis CC BY "Funky Souls" by Amaria CC BY
Basti was in Milan for the Olympic ice hockey and reports on dubious buildings and referees from Garmisch. We also swap notes on industrially processed food, pets, and doctors, and try to work out from the diagram whether Winfried is in love with Anke. After that, "The Football Coach", in general and in particular, runs through the show like a red thread. We talk about Kwasniok and get into a wild discussion about what his attitude means for Effzeh, rate Riera's riposte to penis questions, and uncover the Urs-Fisch-ZDF conspiracy. Basti and David found their own league, in which perfectly normal things happen and all matches take place in Bonn. Finally, we explain why we may end up sitting in a US cell next to Kim Dotcom, and continue our rating of the World Cup host-city posters. Along the way, a mysterious earthworm may actually be a rope ("Tau"), Enzo awards points for blue water, and the Boston poster produces widely divergent opinions. Have fun!
In this episode we examine the Kill Team narrative potential of the Tau Empire. It's for the greater good. Once Upon a Kill Team - https://www.instagram.com/onceuponakillteam/ Jason - https://www.instagram.com/citizendisco/ Seán - https://www.instagram.com/uberstrata_makes/ Josh - https://www.instagram.com/dogtowndistillers/ If you want to support this podcast - https://ko-fi.com/onceuponakillteam If you want to join our discord - https://discord.gg/2bmRUFPXHj
Adam, Andy and Dawfydd sit around the Fluffenfire to discuss their memories of the Tau.
NYUMBA returns this February with another full-length show. This month's WeAreiDyll Selects features new music headed to WeAreiDyll Records, with picks from TAU, Native Tribe, Wisso, Da Africa Deep, and Warren Deep. Join us at our events to hear these tracks played live first. The NYUMBA DJ section will be hosted by resident Kakura, bringing the energy with a 35-minute Afro House mix packed with music from Atsou, N1NJA, Vizano, Shyam, and more. Guest DJ: Argento Dust Durban-bred house producer and DJ Argento Dust has built a reputation as one of South Africa's most underrated talents. Emerging from the city's underground scene, his high-calibre releases have earned support from leading artists, including Black Coffee. He has shared stages with respected names across the industry, performed on prominent platforms, and was selected for the Super Africans collective—signalling his rise as a strong contender among Africa's next breakout artists. His journey began in 2010 with the House Junkies collective, and in 2018 he teamed up with Shimza on "All Alone," released as part of Shimza's All Alone EP. ⚡️Like the Show? Click the [Repost] ↻ button so more people can hear it!
Today's Mondolivro covers the book "O sucesso na Empresa Familiar" ("Success in the Family Business") by João Pinto Ribeiro. The book tells the success story of Tauá Resorts. See omnystudio.com/listener for privacy information.
Hello, you soulful delights! This week we dive into the world of the tuna-beef and have three amazing guests on to discuss the journey of Tau through 10th edition. Durante Bozzini, Will B, and Kyle Grundy give us their thoughts on the codex and all things greater goodness. Hope you enjoy!
The watchword and teaching text of the Moravian Church (Herrnhuter Brüdergemeine): Your love is like the dew that vanishes early in the morning! (Hosea 6:4) Jesus says: Abide in my love! (John 15:9) Title of the devotion: "Abiding Love". Read it at nah-am-leben.de
Podcast available on Spotify: "Keep It a Buck Daily". LIKE - COMMENT - PLEASE SUBSCRIBE. TIMESTAMPS: (0:01) - Intro / Thoughts on Paramount+ Production for UFC 324 (15:07) - UFC Bonuses / Betting Scandals / Fights Dropped from UFC 324 Card (30:00) - UFC 324 Recap / Discussion (30:30) - Paddy vs Gaethje (44:00) - O'Malley vs Song (49:45) - Cortes-Acosta vs Lewis (55:35) - Silva vs Namajunas (56:06) - Silva vs Allen (1:01:40) - UFC 324 Prelims (Skim Through) (1:10:20) - Batbayar vs Tau (1:12:46) - Lui vs Sulangrangbo (1:14:00) - Szalay vs Nakamura (1:15:50) - Mar Fan vs Kim (1:17:07) - Ofli vs Yizha (1:20:30) - Micallef vs Elliott (1:24:10) - Malkoun vs Finney (1:28:10) - Rowston vs Brundage (1:30:45) - Tafa vs Elekana. UFC 325 MAIN CARD: (1:33:05) Salkilld vs Mullarkey (1:34:40) Tuivasa vs Teixeira (1:39:50) Fiziev vs Ruffy (1:46:42) CO-MAIN: Hooker vs BSD (1:53:20) MAIN: Volkanovski vs Lopes. I post all my final picks on my social media accounts down below. FOLLOW AND SUBSCRIBE TO THE social media accounts: TWITTER / X Account: @KIABmedia Instagram: @keepitabuck_media TikTok: @kiabmedia_
MMA Lock of the Night is back to give you breakdowns and predictions for UFC 325: Volkanovski vs Lopes 2. Also on the card, Hooker vs Saint Denis, Fiziev vs Ruffy, Tuivasa vs Teixeira, and Salkilld vs Mullarkey.
US President Donald Trump has again refused to say whether he would rule out using military force to seize Greenland, at a time when European leaders are still deliberating over how NATO should manage and protect the territories of the Arctic. This comes after Trump threatened to impose trade tariffs on a number of European countries if Greenland is not sold to the United States.
A new analysis shows that people from multicultural communities are stopped and searched by police up to 3.5 times more often than those in white communities. It also found that Victoria Police's search powers have led to people from multicultural backgrounds being searched unfairly, infringing on their privacy and their lives.
Tau won some of the biggest events of the weekend, we see the first results of 3W Victrix Guard and C'tan on the metagame, and your usual dose of Warhammer Adjacent chatter. ➡ Support the work we do: / statcheck ➡ Check out the Meta Data Dashboard: https://www.stat-check.com/the-meta ➡ Stat Check coaching: https://www.stat-check.com/coaching ➡ Stat Check Merch: https://bit.ly/statcheckmerch ➡ Check out our sponsor the Red Dragon (Stat Check Patrons get 15% off the entire store) at https://red-dragon.ca/ ➡ Check out our sponsor Saltire Games: https://www.saltiregames.co.uk/ ➡ Shop amazing WTC terrain at Weyland-Yutani and save 5% with the code "STATCHECK5": https://www.weyland-yutani-inc.com/ ➡ Looking for GW-style US Open terrain? Check out J15 Games (10% off with code STATCHECK) at https://www.etsy.com/shop/j15games #warhammer40k #warhammer #wh40k #competitivewarhammer #statcheck
It's the same every year: no sooner is the Dakar Rally over than the attention of true motorsport connoisseurs shifts to the ice. The new ice speedway season is upon us, and in some regions it has already begun. This new podcast in the PITCAST series, a joint production of the magazine PITWALK and the website http://www.bahndienst.com, Germany's leading outlet for track racing, is devoted entirely to winter sport. There was an ice speedway race in Weißenbach in Tyrol, a pairs cup with a World Championship-calibre field in Avesta, Sweden, and a skijoring event in Steingaden. Skijoring, roughly speaking, involves speedway riders towing a skier on alpine skis behind their 500cc machines on a rope ("Tau"), like a counterweight made flesh. Norbert Ockenga, editor-in-chief of PITWALK and bahndienst.com, has covered both winter disciplines extensively. In an analysis of the pairs cup in Avesta, he interviews Max Niedermaier and the Englishman Paul Cooper, who races on grass tracks in the summer. Hans Weber, Christoph Kirchner, and Franz Mayerbüchler review the spectacle in Weißenbach and weigh up its lessons. Simon Mayer and his mechanic Patrick Schneider report directly from Sweden on how preparations are going for the Allsvenskan league weekend in Scandinavia. And Dominik Werkstetter, who used the skijoring as his first race on the hard road back from a serious accident in the spring, takes us into the unfamiliar but exciting world of this blend of motorsport and alpine sport.
Alzheimer's disease is a neurodegenerative disease characterized by the buildup of amyloid-beta plaques and tau protein tangles. The initial symptoms often manifest as a loss of cognitive function, especially in learning and memory. Currently, there are numerous pharmaceutical ways to treat the symptoms of Alzheimer's disease, including drugs to manage the severity of symptoms and to clear plaques. However, a recent paper from the Proceedings of the National Academy of Sciences of the United States of America (PNAS) suggests that a new non-pharmaceutical treatment may be a valuable prospect for future Alzheimer's research. The paper details the use of 40 Hz auditory stimulation in rhesus macaques, which may clear amyloid-beta plaques from the brains of elderly macaques with Alzheimer's-like pathology. Today, Dr. Jonathan Karp and student producer Kaya Basatemur discuss this paper and what it could mean for future Alzheimer's research and theoretical treatments.
Information on the calling-up of candidates approved in the Nova Russas public exam; a video shows the moment an off-duty police officer saved a couple from drowning at Iracema Beach, in Fortaleza; and a Crateús resident is missing in the municipality of Tauá.
In this powerful episode of Daily Influence, Gregg-Brooke Koleno sits down with Dr. Zuri Tau—Founder and CEO of Social Insights, a pioneering organization helping foundations, nonprofits, and government agencies turn data into deeper understanding and measurable, community-centered change. With over two decades at the intersection of research, justice, and healing, Dr. Tau invites us to rethink how we define “success,” why metrics alone are never enough, and how true accountability begins with empathy. She shares her journey from early roots in service and community organizing to reshaping how organizations listen, learn, and respond to the people they aim to serve. Together, Gregg and Zuri explore: ✨ How awareness drives responsible influence inside organizations ✨ Why stories, lived experience, and somatic wisdom matter as much as data ✨ The challenge of aligning good intentions with real-world outcomes ✨ How to stay grounded in leadership while facing resistance to change ✨ What it means to “come close” when work—and the world—feels overwhelming ✨ The importance of trying without perfection, taking risks, and stepping toward what brings meaning Dr. Tau's perspective is a timely reminder that influence isn't just about sharing ideas—it's about shaping outcomes that honor people, community, and collective possibility. Her insights call us to reflect not just on what we're doing, but why and how we're doing it. If you're ready to lead with more clarity, courage, and compassion, this is an episode you won't want to miss. Learn More At: https://www.socinsights.com/ https://www.instagram.com/p/CVQrD1VLZk2/ https://
don't miss George's AIE talk: https://www.youtube.com/watch?v=sRpqPgKeXNk —- From launching a side project in a Sydney basement to becoming the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities—George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really? We discuss: The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. 
leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard) The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs Omissions Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias) The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron) The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents) Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omissions Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future Token efficiency vs. 
turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions) V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models) — Artificial Analysis Website: https://artificialanalysis.ai George Cameron on X: https://x.com/grmcameron Micah Hill-Smith on X: https://x.com/_micah_h Chapters 00:00:00 Introduction: Full Circle Moment and Artificial Analysis Origins 00:01:08 Business Model: Independence and Revenue Streams 00:04:00 The Origin Story: From Legal AI to Benchmarking 00:07:00 Early Challenges: Cost, Methodology, and Independence 00:16:13 AI Grant and Moving to San Francisco 00:18:58 Evolution of the Intelligence Index: V1 to V3 00:27:55 New Benchmarks: Hallucination Rate and Omissions Index 00:33:19 Critical Point and Frontier Physics Problems 00:35:56 GDPVAL AA: Agentic Evaluation and Stirrup Harness 00:51:47 The Openness Index: Measuring Model Transparency 00:57:57 The Smiling Curve: Cost of Intelligence Paradox 01:04:00 Hardware Efficiency and Sparsity Trends 01:07:43 Reasoning vs Non-Reasoning: Token Efficiency Matters 01:10:47 Multimodal Benchmarking and Community Requests 01:14:50 Looking Ahead: V4 Intelligence Index and Beyond
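The Omissions Index described above scores models on a -100 to +100 scale, penalizing confident wrong answers and rewarding "I don't know". The exact weighting Artificial Analysis uses isn't given here, so the sketch below is an illustrative assumption only, not their published formula:

```python
# Illustrative sketch of an omissions-style scoring rule (NOT the exact
# Artificial Analysis formula). Correct answers score +1, abstentions
# ("I don't know") score 0, and wrong answers are penalized, so a model
# that always hallucinates lands at -100 and a perfect one at +100.

def omissions_score(results, wrong_penalty=1.0):
    """results: list of 'correct' | 'abstain' | 'wrong' labels per question."""
    if not results:
        return 0.0
    total = 0.0
    for r in results:
        if r == "correct":
            total += 1.0
        elif r == "wrong":
            total -= wrong_penalty  # hallucinating is worse than abstaining
        # 'abstain' contributes 0: rewarded relative to guessing wrong
    return 100.0 * total / len(results)

print(omissions_score(["correct", "abstain", "wrong", "correct"]))  # 25.0
```

Under this rule an abstention never hurts the score the way a wrong answer does, which is the property that lets cautious models like Claude lead the index without being the smartest.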
Happy New Year! You may have noticed that in 2025 we had moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They were then one of the few Nat Friedman and Daniel Gross AI Grant companies to raise a full seed round from them, and have now become the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.We have chatted with both Clementine Fourrier of HuggingFace's OpenLLM Leaderboard and Anastasios Angelopoulos of LMArena (freshly valued at $1.7B) on their approaches to LLM evals and trendspotting, but Artificial Analysis have staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs?
And how open is “open” really?We discuss:* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs* Omissions Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding ”I don't know”), and Claude models lead with the lowest hallucination rates despite not always being the smartest* GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long 
context, multi-turn agents)* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omissions Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)* V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models)
Links to Artificial Analysis* Website: https://artificialanalysis.ai* George Cameron on X: https://x.com/georgecameron* Micah Hill-Smith on X: https://x.com/micahhsmith
Full Episode on YouTube
Timestamps* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins* 01:19 Business Model: Independence and Revenue Streams* 04:33 Origin Story: From Legal AI to Benchmarking Need* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology* 13:52 Mystery Shopper Policy and Maintaining Independence* 16:22 AI Grant and Moving to San Francisco* 19:21 Intelligence Index Evolution: From V1 to V3* 23:01 GDP Val AA: Agentic Benchmark for Real Work Tasks* 28:01 New Benchmarks: Omissions Index for Hallucination Detection* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning* 50:19 Stirrup Agent Harness: Open Source Agentic Framework* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses* 58:25 The Smiling Curve: Cost Falling While Spend Rising* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions* 1:16:50 Closing: The Insatiable Demand for
Intelligence

Transcript

Micah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time Artificial Analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.

swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me. Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, this gem of a models and host comparison site was just launched. And then I put in a few screenshots, and I said, it's an independent third party. It clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing Fireworks, and how do you have a model benchmarking thing without Fireworks? But you had Together, you had Perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats on... It's been an amazing year. You guys have really come together to be the presumptive new Gartner of AI, right? Which is something that...

George [00:01:09]: Yeah, but you can't pay us for better results.

swyx [00:01:12]: Yes, exactly.

George [00:01:13]: Very important.

Micah [00:01:14]: Start off with a spicy take.

swyx [00:01:18]: Okay, how do I pay you?

Micah [00:01:20]: Let's get right into that.

swyx [00:01:21]: How do you make money?

Micah [00:01:24]: Well, very happy to talk about that. So it's been a big journey the last couple of years. Artificial Analysis is going to be two years old in January 2026. Which is pretty soon now. We run the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, technologies across the AI stack for building stuff. We're very committed to doing that and intend to keep doing that. We have, along the way, built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups.
So we want to be... We want to be who enterprises look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. So no one pays to be on the website. We've been very clear about that from the very start, because there's no use doing what we do unless it's independent AI benchmarking. Yeah. But turns out a bunch of our stuff can be pretty useful to companies building AI stuff.

swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?

George [00:02:53]: So we have a benchmarking and insight subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. And so, for instance, one of the reports is a model deployment report: how to think about choosing between serverless inference, managed deployment solutions, or leasing chips and running inference yourself, is an example of the kind of decision that big enterprises face, and it's hard to reason through; like, this AI stuff is really new to everybody. And so we try and help companies navigate that with our reports and insight subscription. We also do custom private benchmarking. And so that's very different from the public benchmarking that we publicize, and there's no commercial model around that. For private benchmarking, we'll at times create benchmarks, run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking. Yeah.
So that's a piece mainly that we've developed through trying to support everybody publicly with our public benchmarks. Yeah.

swyx [00:04:09]: Let's talk about the tech stack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney? Yeah. Well, Sydney, Australia for me.

Micah [00:04:19]: George was in SF, but he's Australian, but he moved here already. Yeah.

swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting Artificial Analysis in the first place? You know, you started with public benchmarks. And so let's start there. We'll get to the private benchmarks. Yeah.

George [00:04:33]: Why don't we even go back a little bit to like why we, you know, thought that it was needed? Yeah.

Micah [00:04:40]: The story kind of begins like in 2022, 2023; like, both George and I have been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant. It actually worked pretty well for its era, I would say. Yeah. Yeah. So I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. So I had like this multistage algorithm thing, trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build that out, right? Like, you're trying to think about accuracy, a bunch of other metrics, and performance and cost. And mostly just no one was doing anything to independently evaluate all the models. And certainly not to look at the trade-offs for speed and cost. So we basically set out just to build a thing that developers could look at to see the trade-offs between all of those things, measured independently across all the models and providers.
Honestly, it was probably meant to be a side project when we first started doing it.

swyx [00:05:49]: Like we didn't like get together and say like, hey, we're going to stop working on all this stuff, this is going to be our main thing. When I first called you, I think you hadn't decided on starting a company yet.

Micah [00:05:58]: That's actually true. I don't even think we'd paused anything. Like, George had a job. I didn't quit working on my legal AI thing. Like, it was genuinely a side project.

George [00:06:05]: We built it because we needed it as people building in the space, and thought, oh, other people might find it useful too. So we bought a domain, linked it to the Vercel deployment that we had, and tweeted about it. But very quickly it started getting attention. Thank you, Swyx, for, I think, doing an initial retweet and spotlighting this project that we released. And then very quickly, though, it was useful to others, but it became more useful as the number of models released accelerated. We had Mixtral 8x7B, and it was a key one. That's a fun one. Yeah. Like, an open source model that really changed the landscape and opened up people's eyes to other serverless inference providers and thinking about speed, thinking about cost. And so that was key. And so it became more useful quite quickly. Yeah.

swyx [00:07:02]: What I love about talking to people like you who sit across the ecosystem is, well, I have theories about what people want, but you have data, and that's obviously more relevant. But I want to stay on the origin story a little bit more. When you started out, I would say the status quo at the time was every paper would come out and they would report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone has some knowledge.
I think there's some version of an Excel sheet or a Google sheet where you just copy and paste the numbers from every paper and post it up there. And then sometimes they don't line up, because they're independently run. And so your numbers are going to look better than... Your reproductions of other people's numbers are going to look worse because you're not holding their models correctly, or whatever the excuse is. I think then Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. If I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval harness. Yup.Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. I mean, back when we started the website... Like one of the reasons why we realized that we had to run the evals ourselves and couldn't just take results from the labs was that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get... You can put the answer into the model. Yeah. That, in the extreme. And you get crazy cases, like back when Google launched Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4, and constructed, I think never published, chain-of-thought examples, 32 of them, in every topic in MMLU, to run it to get the score. There are so many things that you... They never shipped Ultra, right? That's the one that never made it out. Not widely. Yeah. I mean, I'm sure it existed, but yeah.
So we were pretty sure that we needed to run them ourselves, and just run them in the same way across all the models. Yeah. And we were also certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.swyx [00:09:24]: Okay. A couple of technical questions. I mean, so obviously I also thought about this, and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know. No. Well, we definitely weren't at the start.Micah [00:09:36]: So, I mean, we were paying for it personally at the start. That's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. It was kind of fine. These days that's gone up an enormous amount, for a bunch of reasons that we can talk about. But it wasn't that bad, because you have to remember that the number of models we were dealing with was hardly any, and the complexity of the stuff that we wanted to do to evaluate them was a lot less. We were just asking some Q&A type questions. And one specific thing was, for a lot of evals initially, we were just sampling an answer. You know, what's the answer for this? We'd go to the answer directly, without letting the models think. We weren't even doing chain-of-thought stuff initially. And that was the most useful way to get some results initially. Yeah.swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right?
Like, because sometimes the models can answer any way they see fit, and sometimes they actually do have the right answer, but they just return it in the wrong format, and they'll get a zero for that unless you work it into your parser. And that involves more work. But there's an open question whether you should give points for not following your instructions on the format.Micah [00:11:00]: It depends what you're looking at, right? If you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's phrased. But these days, it's mostly less of a problem. If you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do a simple regex.swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, multiple-choice questions. Sometimes there's a bias towards the first answer, so you have to randomize the order of the choices. All these nuances. Once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's such dark magic.Micah [00:11:47]: You've also got the different degrees of variance in different benchmarks, right? Yeah. So, if you run a four-option multiple-choice eval on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see is pretty enormous if you only do a single run of it, especially if it has a small number of questions. So one of the things that we do is run all of our evals an enormous number of times when we're developing new ones and doing upgrades to our Intelligence Index to bring in new things.
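The two mechanics mentioned here, shuffling answer options to control for position bias and pulling the final letter out with a regex, can be sketched roughly as follows. The function names and prompt format are mine for illustration, not Artificial Analysis's actual harness:

```python
import random
import re

LETTERS = "ABCD"

def build_prompt(question, choices, answer_idx, rng):
    """Shuffle the answer options to control for position bias.
    Returns the prompt plus the letter that is now correct."""
    order = list(range(len(choices)))
    rng.shuffle(order)
    lines = [f"{LETTERS[i]}. {choices[j]}" for i, j in enumerate(order)]
    correct = LETTERS[order.index(answer_idx)]
    prompt = (question + "\n" + "\n".join(lines)
              + "\nReply with 'Answer: <letter>' only.")
    return prompt, correct

def extract_answer(response):
    """Parse 'Answer: X' out of a model response; return None when
    the model ignored the format (a real harness might fall back to
    an LLM-based extractor at that point)."""
    m = re.search(r"Answer:\s*\(?([A-D])\)?", response)
    return m.group(1) if m else None
```

Averaging over many shuffles is what washes out any first-option bias, and the `None` branch is exactly the formatting-versus-capability question discussed above.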
Yeah. That way we can dial in the right number of repeats, so that we can get to the 95% confidence intervals that we're comfortable with, so that when we pull it all together, we can be confident the Intelligence Index is tight to at least plus or minus one at 95% confidence. Yeah.swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that assumes one repeat in terms of how we report it, because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there because of the repeats.swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking: you don't have any special deals with the labs. They don't discount it. You just pay out of pocket, or out of your sort of customer funds. Oh, there is a mix. So, the issue is that sometimes they may give you a special endpoint, which is… Ah, 100%.Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser focus, in everything we do, on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true, like the one you bring up right here: if we're working with a lab and they're giving us a private endpoint to evaluate a model, it is totally possible that what's sitting behind that black box is not the same as what they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy.
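The repeat-count arithmetic behind that plus-or-minus is simple to sketch. Treating a percent-correct eval as binomial, the 95% half-width is 1.96·sqrt(p(1−p)/(n·k)) for n questions and k repeats; solving for k (and worst-casing the variance at p = 0.5) gives a hypothetical helper like this, not their actual methodology:

```python
import math

def repeats_needed(n_questions, half_width_pp, p=0.5):
    """Smallest number of full eval repeats k such that the 95%
    confidence half-width of a percent-correct score is at most
    half_width_pp percentage points, under a binomial model with
    worst-case variance at p = 0.5."""
    k = (1.96 ** 2 * p * (1 - p)) / (n_questions * (half_width_pp / 100) ** 2)
    return max(1, math.ceil(k))
```

For example, on a 198-question eval (GPQA Diamond's size) a single run already carries roughly a plus-or-minus 7 point interval, and pinning it down to plus or minus 2 points takes about 13 full repeats, which is why the cost multiple mentioned above adds up so quickly.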
And we're totally transparent with all the labs we work with about this: we will register accounts not on our own domain and run both intelligence evals and performance benchmarks… Yeah, that's the job. …without them being able to identify it. And no one's ever had a problem with that. Because a thing that turns out to actually be quite a good stabilizing factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.swyx [00:14:23]: That's true. I never thought about that. I was in the database industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.Micah [00:14:36]: I mean, okay, the biggest one that I'll bring up is more of a conceptual one, actually, than direct shenanigans. It's that the things that get measured become the things that get targeted by the labs in what they're trying to build, right? Exactly. That doesn't necessarily mean anything that we should really call shenanigans. I'm not talking about training on the test set. But if you know that you're going to be graded on a particular thing, if you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing, things that preferably are going to be helpful for the wide range of ways actual users want to use the thing that you're building. But they will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to how we might use modern coding agents and stuff. But it's clearly not one for one.
So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without that reflecting the overall generalized intelligence of these models getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that, other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. You used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier and you've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant, which you guys decided to join, moving here. What was it like? I think you were in, like, batch two? Batch four. Batch four. Okay.Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies, and they were extremely aligned with the mission of what we were trying to do. We're not quite typical of a lot of the other AI startups that they've invested in.swyx [00:16:53]: And they were very much here for the mission of what we want to do. Did they give any advice that really affected you in some way, or were any of the events very impactful? That's an interesting question.Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.swyx [00:17:09]: Which is also, like, a crazy list. Yeah.George [00:17:11]: Oh, totally. Yeah, yeah, yeah.
There was something about, you know, speaking to Nat and Daniel about the challenges of working through a startup, working through the questions that don't have clear answers, and how to work through those methodically, the hard decisions. And they've been great mentors to us as we've built Artificial Analysis. Another benefit for us was that other companies in the batch, and other companies in AI Grant, are pushing the capabilities of what AI can do at this time. Yeah. And so being in contact with them, making sure that Artificial Analysis is useful to them, has been fantastic for supporting us in working out how we should build out Artificial Analysis to continue being useful to those building on AI.swyx [00:17:59]: I think to some extent I'm of mixed opinion on that one, because to some extent your target audience is not people in AI Grant, who are obviously at the frontier. Yeah. Do you disagree?Micah [00:18:09]: To some extent. To some extent. But a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do, across the entire stack, for building great applications, which actually makes some of them pretty archetypal power users of Artificial Analysis. Some of the people with the strongest opinions about what we're doing well, what we're not doing well, and what they want to see next from us. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently between models for different parts of your application, to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So many of them are not commercial customers of ours, no, since we don't charge for all the data on the website. Yeah.
They are absolutely some of our power users.swyx [00:19:07]: So let's talk about the evals as well. You start out from the general MMLU and GPQA stuff. What's next? How do you build up to the overall index? What was in V1, and how did you evolve it? Okay.Micah [00:19:22]: So first, just for background, we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric, currently pulled together from 10 different eval datasets, to give what we're pretty confident is the best single number to look at for how smart the models are. Obviously, it doesn't tell the whole story. That's why we publish the whole website of charts, to dive into every part of it and look at the trade-offs. But it's the best single number. So right now, it's got a bunch of Q&A-type datasets that have been very important to the industry, like the couple that you just mentioned. It's also got a couple of agentic datasets. It's got our own long-context reasoning dataset and some other use-case-focused stuff. As time goes on, the things that we're most interested in, the capabilities that are becoming more important for AI and that developers care about, are first around agentic capabilities. So, surprise, surprise: we're all loving our coding agents, and how the models perform there, and at similar things for different types of work, is really important to us. Linking to use cases, to economically valuable use cases, is extremely important to us. And then we've got some of the... Yeah.
These things that the models still struggle with, like working really well over long contexts, are not going to go away as specific capabilities and use cases that we need to keep evaluating.swyx [00:20:46]: But I guess one thing I was driving at was the V1 versus the V2, and how that changed over time.Micah [00:20:53]: Like how we've changed the index to where we are.swyx [00:20:55]: And I think that reflects the change in the industry. Right. So that's a nice way to tell that story.Micah [00:21:00]: Well, V1 would be completely saturated right now by almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, how much progress has been made in the last two years. We obviously play the game constantly of today's version versus last week's version and the week before, and all of the small changes in the horse race between the current frontier, and who has the best smaller-than-10B model right now this week. Right. And that's very important to a lot of developers and people, especially in this particular city of San Francisco. But when you zoom out a couple of years, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence, which we can talk about more in a bit. So V1, V2, V3: we made things harder. We covered a wider range of use cases. And we tried to get closer to things developers care about, as opposed to just the Q&A-type stuff that MMLU and GPQA represented. Yeah.swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark, looking around, and asking questions about it. Yeah.Micah [00:22:21]: Let's do it. Okay.
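A synthesis index of the kind described, one number pulled together from many eval datasets, reduces mechanically to a weighted average of normalized scores. This is a toy sketch; the real index's eval list and weighting are Artificial Analysis's own and are not reproduced here:

```python
def combine_index(scores, weights=None):
    """Weighted average of per-eval scores on a 0-100 scale.
    `scores` maps eval name -> score; omitting `weights` gives
    equal weighting. The eval names below are purely illustrative."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total

example = {"agentic": 55.0, "long_context": 62.0, "qa": 81.0}
```

The interesting editorial decisions all live outside this function: which evals go in, how hard they are, and how heavily each is weighted, which is exactly what changed between V1, V2, and V3.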
This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.George [00:22:26]: And I think a little bit about the direction that we want to take it, where we want to push benchmarks. Currently, the Intelligence Index and evals focus a lot on raw intelligence. But we want to diversify how we think about intelligence. New evals that we've built and partnered on focus on topics like hallucination, and there are a lot of topics that I think are not covered by the current eval set that should be. And so we want to bring that forth. But before we get into that...swyx [00:23:01]: And so for listeners, just as a timestamp, right now number one is Gemini 3 Pro High, followed by Claude Opus at 70, then GPT-5.1 High. You don't have 5.2 yet. And Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. I mean, I love it. No, no. 100%. We'll look back this time next year and go, how cute. Yep.George [00:23:25]: Totally. A quick view of that is, okay, there's a lot. I love it. I love this chart. Yeah.Micah [00:23:30]: This is such a favorite, right? Yeah. In almost every talk that George or I give at conferences, we put this one up first, to situate where we are in this moment in history. This, I think, is the visual version of what I was saying before about zooming out and remembering how much progress there's been. If we go back to just over a year ago, before o1, before Claude Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, OpenAI was untouchable for well over a year.
And, I mean, you would remember that time period well, there being very open questions about whether or not AI was going to be competitive at all, whether or not OpenAI would just run away with it, whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There are so many dots on it, but I think it reflects a little bit what we felt, how crazy it's been.swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got ServiceNow in there, which is a less traditional name. Yeah.George [00:25:01]: It's models that we're highlighting by default in our charts, in our Intelligence Index. Okay.swyx [00:25:07]: You just have a manually curated list of stuff.George [00:25:10]: Yeah, that's right. But something that I actually don't think every Artificial Analysis user knows is that you can customize our charts and choose which models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the o1 jump. Look at that. September 2024. And the DeepSeek jump. Yeah.George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, give or take a couple of weeks. It was Boxing Day in New Zealand when DeepSeek v3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less well known over the second half of 2024, and had run evals on the earlier ones and stuff.
I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas, running the evals and getting back result by result on DeepSeek v3. So this was the first of their v3 architecture, the 671B MoE.Micah [00:26:19]: And we were very, very impressed. That was the moment where we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a real contender. The world really noticed when they followed that up with the RL working on top of v3, with R1 succeeding a few weeks later. But the groundwork for that was absolutely laid with that extremely strong base model, completely open weights, which we had rated the best open-weights model. So, yeah, that's the thing that you really see in that chart. DeepSeek really jumped out at us on Boxing Day last year.George [00:26:48]: Boxing Day is the day after Christmas, for those not familiar.swyx [00:26:54]: I'm from Singapore. A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.Micah [00:27:11]: I don't know. I'm not used to it. Once upon a time, we did call it the Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.George [00:27:20]: There have been a few naming changes. We added hardware benchmarking to the site, so benchmarks at a kind of system level. And so then we changed our throughput metric to what we now call output speed, because throughput makes sense at a system level, so we took that name there.swyx [00:27:32]: Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into.
Maybe, so we can get past all of them, like, we have lots and lots of evals and stuff, the interesting ones to talk about today are a few of our recent things that probably not many people will be familiar with yet. So the first of those is our Omniscience Index. This one is a little bit different from most of the intelligence evals that we've run. We built it specifically to look at the embedded knowledge in the models, and to test hallucination by looking at, when the model doesn't know the answer, so it's not able to get it correct, what's its probability of saying, I don't know, versus giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we simply take off a point if you give an incorrect answer to a question. We're pretty convinced that this is an example of where it makes the most sense to do that, because it's strictly more helpful to say, I don't know, instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models, and the labs creating them, to get higher scores. Almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything; there's no incentive to say, I don't know. So we changed that for this one here.swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah.George [00:29:31]: On that, one reason that we didn't put that into this index is that we think the way to do it is not to ask the models how confident they are.swyx [00:29:43]: I don't know, maybe it might be, though. You put in like a JSON field, say, confidence, and maybe it spits out something. Yeah.
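The scoring rule George describes (+1 for a correct answer, -1 for a wrong one, 0 for declining, scaled to the -100 to +100 range) is simple to write down. A minimal sketch with my own naming, not their code:

```python
def omniscience_style_score(outcomes):
    """outcomes: list of 'correct', 'incorrect', or 'abstain'
    (the model said 'I don't know'). Correct earns +1, incorrect
    costs -1, abstaining is 0; scaled to -100..+100."""
    points = {"correct": 1, "incorrect": -1, "abstain": 0}
    return 100 * sum(points[o] for o in outcomes) / len(outcomes)

# Two models with identical 40% knowledge: one guesses on everything,
# one abstains when unsure.
guesser = ["correct"] * 4 + ["incorrect"] * 6
cautious = ["correct"] * 4 + ["abstain"] * 6
```

Under plain percent-correct both models tie at 40; here the guesser lands at -20 while the cautious model scores +40, which is exactly the incentive shift being described.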
You know, we have done a few evals podcasts over the years, and when we did one with Clémentine of Hugging Face, who maintained the Open LLM Leaderboard, this was one of her top requests: some kind of hallucination slash confidence-calibration thing. And so, hey, this is one of them.Micah [00:30:05]: And, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But it's pretty useful and has some interesting results. One of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here, with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not really measured vibes stuff that people like about some of the Claude models. Is the dataset public, or is there a held-out set? There's a held-out set for this one. We have published a public test set, but we've only published 10% of it. The reason is that for this one specifically, it would be very, very easy to have data contamination, because it is just factual knowledge questions. We'll update it over time to also prevent that, but yeah, we've kept most of it held out so that we can keep it reliable for a long time. It leads us to a bunch of really cool things, including breaking down quite granularly by topic. We've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.swyx [00:31:23]: I would be interested. Let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, which hallucinates less than Opus. Would that be the other way around in a normal capability environment? I don't know.
What do you make of that?George [00:31:37]: One interesting aspect is that we've found that there's not really a strong correlation between intelligence and hallucination. That's to say, how smart the models are in a general sense isn't correlated with their ability, when they don't know something, to say that they don't know. It's interesting that Gemini 3 Pro Preview was a big leap over here from Gemini 2.5 Flash and 2.5 Pro. And if I add Pro quickly here...swyx [00:32:07]: I bet Pro's really good. Actually, no, I meant the GPT Pros.George [00:32:12]: Oh yeah.swyx [00:32:13]: Because the GPT Pros are rumored, we don't know for a fact, to be like eight runs with an LLM judge on top. Yeah.George [00:32:20]: So we saw a big jump in, this is accuracy, so this is just the percent that they get correct, and Gemini 3 Pro knew a lot more than the other models. So, big jump in accuracy, but relatively no change between the Google Gemini models, between releases, in the hallucination rate. Exactly. And so it's likely just a kind of different post-training recipe with the Claude models that's driven this. Yeah.Micah [00:32:45]: You can partially blame us, and how we define intelligence, for having until now not counted hallucination as a negative in the way that we think about intelligence.swyx [00:32:56]: And so that's what we're changing. I know many smart people who are confidently incorrect.George [00:33:02]: Look at that, that is very human. Very true. And there's a time and a place for that. I think our view is that hallucination rate makes sense in this context, where it's around knowledge, but in many cases people want the models to hallucinate, to have a go. Often that's the case in coding, or when you're trying to generate newer ideas.
One eval that we added to Artificial Analysis is Critical Point, and it's really hard physics problems. Okay.swyx [00:33:32]: And is it sort of like a HumanEval type, or something different, like a FrontierMath type?George [00:33:37]: It's not dissimilar to FrontierMath. These are research questions that academics in the physics world would be able to answer, but models really struggle to answer. So the top score here is only about 9%.swyx [00:33:51]: And the people that created this, like Minhui and, actually, Ofir, who was kind of behind SWE-bench. And what organization is this? Oh, it's Princeton.George [00:34:01]: A range of academics from different academic institutions, really smart people. They talked about how they turn the temperature up as high as it can go when they're trying to explore new ideas in physics with a model as a thought partner, just because they want the models to hallucinate. Sometimes it's something new. Yeah, exactly.swyx [00:34:21]: So not right in every situation, but I think it makes sense, you know, to test hallucination in scenarios where it makes sense. Also, the obvious question is, this is one of many. Every lab has a system card that shows some kind of hallucination number, and you've chosen not to endorse those and made your own. And that's a choice. Totally. In some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun, and you provide it as a service here. You have to fight the, well, who are we to do this?
And your answer is that we have a lot of customers, you know, but, I guess, how do you convince the individual?Micah [00:35:08]: I mean, I think for hallucinations specifically, there are a bunch of different things that you might reasonably care about, and that you'd measure quite differently. We've called this the Omniscience hallucination rate, not trying to declare it, like, Humanity's Last Hallucination. You could have some interesting naming conventions and all this stuff. The bigger-picture answer to that is something that I actually wanted to mention just as George was explaining Critical Point as well: as we go forward, we are building evals internally, we're partnering with academia, and we're partnering with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not necessarily obsessed with the idea that everything we do has to be done entirely within our own team. Critical Point is a cool example, where we were a launch partner for it, working with academia. We've got some partnerships coming up with a couple of leading companies. Those ones, obviously, we have to be careful with on some of the independence stuff, but with the right disclosure, we're completely comfortable with that. A lot of the labs have released great datasets in the past that we've used to great success independently. And so, between all of those approaches, we're going to be releasing more stuff in the future. Cool.swyx [00:36:26]: Let's cover the last couple. And then I want to talk about your trends analysis stuff, you know? Totally.Micah [00:36:31]: Actually, I have one little factoid on Omniscience.
If you go back up to accuracy on Omniscience, an interesting thing about this accuracy metric is that it tracks, more closely than anything else that we measure, the total parameter count of models. It makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff, which we think is much more about how the models are trained. This is just: what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed by any official source, just rumors. But rumors do fly around. I hear all sorts of numbers. I don't know what to trust.Micah [00:37:17]: So if you draw the line on Omniscience accuracy versus total parameters, we've got all the open weights models, and you can squint and see that likely the leading frontier models right now are quite a lot bigger than the roughly one trillion parameters that the open weights models we're looking at here cap out at. There's an interesting extra data point that Elon Musk revealed recently about xAI: three trillion parameters for Grok 3 and 4, six trillion for Grok 5, but that's not out yet. Take those together, have a look, and you might reasonably form the view that there's a pretty good chance Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameters. To be clear, I have absolutely no idea, but just based on this chart, that's where you would land if you have a look at it. Yeah.swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much, because what does it really matter? As long as they can serve it at a sustainable cost, that's about it.
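Micah's squint-at-the-chart reasoning, reading an implied parameter count off a line fitted to knowledge accuracy versus log total parameters, can be sketched as below. Every data point and the fitted relationship are made up for illustration; they are not Artificial Analysis's actual numbers.

```python
# Hypothetical sketch: if a knowledge-accuracy metric tracks log(total
# parameters) across open-weights models, inverting the fitted line
# gives a rough implied size for a closed model. Illustrative data only.
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# (log10 of total params, knowledge accuracy) -- made-up points
points = [(math.log10(8e9), 0.12), (math.log10(70e9), 0.22),
          (math.log10(405e9), 0.30), (math.log10(1e12), 0.35)]
a, b = fit_line([p[0] for p in points], [p[1] for p in points])

def implied_params(accuracy):
    """Invert the fitted line: accuracy -> estimated total parameters."""
    return 10 ** ((accuracy - b) / a)

# A model scoring well above the open-weights pack lands above 1T params
# on this (entirely illustrative) line.
est = implied_params(0.42)
```

The point of the sketch is only that a knowledge eval correlating tightly with size lets you extrapolate, with all the usual caveats about extrapolating a noisy fit.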
Like, yeah, totally.George [00:38:17]: They've also got different incentives in play compared to open weights models, which are thinking about supporting others in self-deployment. For the labs doing inference at scale, I think it's less about total parameters in many cases when thinking about inference costs, and more about the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously, if you're a developer or company using these things, exactly as you say, it doesn't matter. You should be looking at all the different ways that we measure intelligence, at cost to run the index, and at the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.swyx [00:38:56]: It's not as good for the content creator rumor mill, where I can say, oh, GPT-4 is this small circle, look, GPT-5 is this big circle. There used to be a thing like that for a while. Yeah.Micah [00:39:07]: But that is, on its own, actually a very interesting one, right? Chances are the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in total size, especially with the upcoming hardware generations. Yes.swyx [00:39:29]: So, you know, taking off my shitposting face for a minute. At the same time, I do feel like, especially coming back from Europe, people do feel that Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale out, and therefore we need to start exploring at least a different path. GDPVal, I think, is only like a month or so old. I was also very positive when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool.
And you have your own version.George [00:39:59]: It's a fantastic data set. Yeah.swyx [00:40:01]: And maybe recap it for people who are still out of it. It's like 44 tasks, based on some kind of GDP cutoff, meant to represent broad white-collar work that is not just coding. Yeah.Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions, and input files for a lot of them. The 44 are divided into, like, 220 to 225, maybe, subtasks, which are the level at which we run them through the agent. And yeah, they're really interesting. I will say that it doesn't necessarily capture all the stuff that people do at work. No eval is perfect; there are always going to be more things to look at. Largely, in order to define the tasks well enough that you can run them, they need to have only a handful of input files and very specific instructions. And so I think the easiest way to think about them is as quite hard take-home exam tasks that you might do in an interview process.swyx [00:40:56]: Yeah, for listeners, it is no longer like a long prompt. It is like, well, here's a zip file with a spreadsheet or a PowerPoint deck or a PDF; go nuts and answer this question.George [00:41:06]: OpenAI released a great data set, and they released a good paper which looks at performance across the different web chatbots on the data set. It's a great paper; I encourage people to read it. What we've done is taken that data set and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the data set, and then we developed an evaluator approach to compare outputs. It's AI-enabled, so it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned with human preferences.
One data point there is that, even with Gemini 3 Pro as the evaluator, Gemini 3 Pro itself interestingly doesn't actually do that well on the eval. So that's kind of a good example of what we've done in GDPVal AA.swyx [00:42:01]: Yeah, the thing you have to watch out for with LLM judges is self-preference, that models usually prefer their own output, and in this case that was not so. Totally.Micah [00:42:08]: I think the places where it makes sense to use an LLM-as-judge approach now are quite different to some of the early LLM-as-judge stuff a couple of years ago, because some of that, and MT-Bench was a great project that was a good example of this a while ago, was about judging conversations and a lot of style-type stuff. Here, the task the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search, the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading, we run the outputs through a pipeline to extract visual and text versions of the files so we can provide them to Gemini, and we provide the criteria for the task and get it to pick which of two potential outputs more effectively meets those criteria. It turns out that it's just very, very good at getting that right; it matched human preference a lot of the time. I think that's because it's got the raw intelligence, but combined with the correct representation of the outputs, the fact that the outputs were created with an agentic task that is quite different to the way the grading model works, and the fact that we're comparing against criteria, not just zero-shot asking the model to pick which one is better.swyx [00:43:26]: Got it. Why is this an ELO?
And not a percentage, like GDPVal?George [00:43:31]: So the outputs look like documents, and there are video outputs or audio outputs from some of the tasks. It has to make a video? Yeah, for some of the tasks.swyx [00:43:43]: What task is that?George [00:43:45]: I mean, it's in the data set. Like be a YouTuber? It's a marketing video.Micah [00:43:49]: Oh, wow. What? Like, the model has to go find clips on the internet and try to put them together. The models are not that good at doing that one, for now, to be clear. It's pretty hard to do that with a code editor. The computer-use stuff doesn't work quite well enough, and so on.George [00:44:02]: And so there's no ground truth, necessarily, to compare against to work out a percentage correct. It's hard to come up with correct or incorrect there. So it's on a relative basis, and we use an ELO approach to compare outputs from each of the models on each task.swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task, and then give it an ELO, so you have a human in there. I think what's helpful about GDPVal, the OpenAI one, is that 50% is meant to be a normal human, and maybe the bar for a domain expert is higher than that, but 50% was the bar: if you've crossed 50, you are superhuman. Yeah.Micah [00:44:47]: So we haven't grounded this score in that exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models, and that's one of the reasons that presenting it as ELO is quite helpful: it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky comparing these exact tasks to human performance, because the way you would go about it as a human is quite different to how the models go about it. Yeah.swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there.
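The ELO approach George describes, turning pairwise "which output better meets the criteria" verdicts into relative scores, can be sketched as below. The K-factor, starting ratings, and verdicts are illustrative assumptions, not the actual GDPVal AA methodology.

```python
# Minimal Elo sketch: each judge verdict between two models' outputs
# updates both ratings. K and the match list are assumptions.
K = 32  # update step size (illustrative)

def expected(r_a, r_b):
    """Probability model A wins under the Elo logistic model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings, a, b, a_won):
    """Apply one pairwise judgment between models a and b."""
    e = expected(ratings[a], ratings[b])
    delta = K * ((1.0 if a_won else 0.0) - e)
    ratings[a] += delta
    ratings[b] -= delta

ratings = {"model_x": 1000.0, "model_y": 1000.0}
# Hypothetical judge verdicts: x beats y twice, then loses once.
for a_won in (True, True, False):
    update(ratings, "model_x", "model_y", a_won)
```

Because updates are zero-sum and relative, new models can be added later without invalidating existing scores, which is the property Micah points to.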
Is that like just one last, like...Micah [00:45:20]: Well, no, no, it is the best model released by Meta. And so it makes it into the homepage default set, still, for now.George [00:45:31]: Another inclusion that's quite interesting: we also ran it across the latest versions of the web chatbots. And so we have...swyx [00:45:39]: Oh, that's right.George [00:45:40]: Oh, sorry.swyx [00:45:41]: Yeah, I completely missed that. Okay.George [00:45:43]: No, not at all. So that's the one with the checkered pattern. So that is their harness, not yours, is what you're saying. Exactly. And what's really interesting is that if you compare, for instance, Claude Opus 4.5 using the Claude web chatbot, it performs worse than the model in our agentic harness. In every case, the model performs better in our agentic harness than in its web chatbot counterpart, the harness that they created.swyx [00:46:13]: My backwards explanation for that would be that, well, the chatbot is meant for consumer use cases, and here you're pushing it toward something else.Micah [00:46:19]: The constraints are different, and the amount of freedom that you can give the model is different. Also, they have a cost goal; we let the models work as long as they want, basically. Yeah. Do you copy-paste manually into the chatbot? Yeah, that was how we got the chatbot reference. We're not going to be keeping those updated at quite the same scale as hundreds of models.swyx [00:46:38]: Well, I don't know, talk to Browserbase; they'll automate it for you. I have thought about turning these chatbot versions into an API, because they are legitimately different agents in themselves. Yes. Right. Yeah.Micah [00:46:53]: And that's grown a huge amount over the last year, right? Like the tools.
The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the number of data sources you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.swyx [00:47:10]: What tools and what data connections come to mind? What's interesting, what's notable work that people have done?Micah [00:47:15]: Oh, okay. So my favorite example on this is that until very recently, I would argue it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's pulling something from wherever you store stuff at work. For me, that's Google Drive, OneDrive, and our Supabase databases, if we need to do some analysis on some data or something. Preferably the model can be plugged into all of those things and can go do some useful work based on them. The thing that I find most impressive currently, that I am somewhat surprised works really well in late 2025, is that I can have models use the Supabase MCP to query, read-only of course, and run a whole bunch of SQL queries to do pretty significant data analysis, and make charts and stuff, and it can read my Gmail and my Notion. Okay, you actually use that? That's good. Is that a Claude thing? To various degrees, but ChatGPT and Claude right now. I would say that this stuff barely works, in fairness, right now.George [00:48:33]: Because people are actually going to try this after they hear it.
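The "read-only, of course" constraint Micah mentions when letting a model run SQL can be approximated with a crude statement filter like the one below. This is an illustrative guard, not how the Supabase MCP actually enforces read-only access.

```python
# Illustrative read-only SQL guard: allow a single statement that
# starts with SELECT or WITH, refuse everything else. A real setup
# would enforce this at the database-role level, not with a regex.
import re

def is_read_only(sql: str) -> bool:
    """Crude allowlist: one statement beginning with SELECT/WITH."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # multiple statements -> refuse outright
        return False
    return re.match(r"(?i)\s*(select|with)\b", stripped) is not None
```

A regex filter like this is easy to bypass (for example via functions with side effects), which is why granting the model a database role with only read permissions is the safer layer.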
If you get an email from Micah, odds are it wasn't written by a chatbot.Micah [00:48:38]: So, yeah, it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.swyx [00:48:46]: And so you can feel it, right? This time next year, we'll come back and see where it's going. Totally. Supabase, shout out, another famous Kiwi. I don't know if you've had any conversations with him about anything in particular on AI building and AI infra.George [00:49:03]: We have had Twitter DMs with him, because we're quite big Supabase users and power users, and we probably do some things more manually than we should; the Supabase support line is being super friendly about it. One extra point regarding GDPVal AA is that, on the basis of the overperformance of the models compared to the chatbots, we realized that, oh, the reference harness that we built actually works quite well on generalist agentic tasks; this proves it, in a sense. And the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code, and all that we give it is context management capabilities, a web search and web browsing tool, and a code execution environment. Anything else?Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. For GDPVal we give it a tool to view an image specifically, because the models can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context we had to give them a custom tool. But yeah, exactly.George [00:50:21]: So it turned out that we created a good generalist agentic harness, and we released it on GitHub yesterday. It's called stirrup.
So if people want to check it out, it's a great base for building a generalist agent for more specific tasks.Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code and the coding agents can work with it super well.swyx [00:50:51]: Well, that's nice for the community to explore and share and hack on. I think in other similar environments, the Terminal-Bench guys have done Harbor. It's a bundle of, well, we need our minimal harness, which for them is Terminus, and we also need the RL environments or Docker deployment thing to run independently. I don't know if you've looked at Harbor at all. Is that like a standard that people want to adopt?George [00:51:19]: Yeah, we've looked at it from an evals perspective, and we love Terminal-Bench and host Terminal-Bench benchmarks on Artificial Analysis. We've looked at it from a coding agent perspective, but could see it being a great basis for any kind of agent. I think where we're getting to is that these models have gotten smart enough, and have gotten good enough tools, that they can perform better when just given a minimalist set of tools and let run; let the model control the agentic workflow, rather than using another framework that's more built out and tries to dictate the flow. Awesome.swyx [00:51:56]: Let's cover the Openness Index, and then let's go into the report stuff. So that's the last of the proprietary numbers, I guess. I don't know how you classify all these. Yeah.Micah [00:52:07]: Call it the last of the three new things that we're talking about from the last few weeks.
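The minimalist "let the model control the agentic workflow" pattern George describes can be sketched as a bare tool loop. The model function and tool set here are stand-ins for illustration, not the actual stirrup code.

```python
# Toy sketch of a minimal agent loop: the model picks the next tool
# (or a final answer) each turn; the harness imposes no workflow.
# Names (run_agent, fake_model, web_search) are hypothetical.

def run_agent(model, tools, task, max_turns=10):
    """Let the model pick tools until it returns a final answer."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = model(history)          # model decides the next step
        if action["type"] == "final":
            return action["content"]
        tool = tools[action["tool"]]     # e.g. web_search, run_code
        result = tool(action["input"])
        history.append({"role": "tool", "content": result})
    return None  # turn budget exhausted

# Stand-in model: searches once, then answers with what it found.
def fake_model(history):
    if history[-1]["role"] == "user":
        return {"type": "tool", "tool": "web_search",
                "input": "stirrup repo"}
    return {"type": "final", "content": history[-1]["content"]}

tools = {"web_search": lambda q: f"results for: {q}"}
answer = run_agent(fake_model, tools, "find the stirrup repo")
```

The design choice the hosts emphasize is visible in the loop: the harness only dispatches tool calls and accumulates context, while the model decides what to do and when to stop.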
Because, I mean, we do a mix of stuff. Some of what we do we open source, and there's proprietary stuff that we don't always open source. The long context reasoning data set last year we did open source, and then of all the work on performance benchmarks across the site, some we're looking to open source, but some we're constantly iterating on, and so on. So there's a huge mix of stuff that is and isn't open source across the site. That's LCR, for people. Yeah.swyx [00:52:41]: But let's talk about openness.Micah [00:52:42]: Let's talk about the Openness Index. This is, call it, a new way to think about how open models are. We have, for a long time, tracked whether models are open weights and what the licenses on them are. That's pretty useful: it tells you what you're allowed to do with the weights of a model. But there is this whole other dimension to how open models are that is pretty important and that we haven't tracked until now, and that's how much is disclosed about how the model was made. So transparency about data, pre-training and post-training data, whether you're allowed to use that data, and transparency about methodology and training code. Basically, those are the components. We bring them together into an Openness Index score for models, so that in one place you get this full picture of how open models are.swyx [00:53:32]: I feel like I've seen a couple of other people try to do this, but they're not maintained. I do think this matters. I don't know what the numbers mean, though. Is there a max number? Is this out of 20?George [00:53:44]: It's out of 18 currently. We've got an Openness Index page, but essentially these are points: you get points for being more open across these different categories, and the maximum you can achieve is 18.
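George's description of the Openness Index as points summed across categories, out of a maximum of 18, can be sketched like this. The category names and the per-category point split are assumptions for illustration; only the 18-point maximum comes from the conversation.

```python
# Illustrative points-based openness score. The six categories and
# their 3-point caps are assumed, chosen only so the total is 18.
CATEGORIES = {
    "weights_released": 3,
    "license_permissiveness": 3,
    "pretraining_data_transparency": 3,
    "posttraining_data_transparency": 3,
    "data_usable": 3,
    "methodology_and_code": 3,
}

def openness_index(scores):
    """Sum per-category points, clipped to each category's maximum."""
    return sum(min(scores.get(c, 0), cap) for c, cap in CATEGORIES.items())

fully_open = {c: cap for c, cap in CATEGORIES.items()}   # scores 18
weights_only = {"weights_released": 3, "license_permissiveness": 2}
```

The clipping step matters: a model can't compensate for undisclosed training data by, say, an extra-permissive license, which matches the idea of openness as several independent dimensions.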
So AI2, with their extremely open Olmo 3 32B Think model, is the leader, in a sense.swyx [00:54:04]: What about Hugging Face?George [00:54:05]: Oh, with their smaller model. It's coming soon. We need to run the intelligence benchmarks to get it on the site.swyx [00:54:12]: You can't have an Openness Index and not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, RefinedWeb and all that stuff is amazing. Or is it called FineWeb? FineWeb.Micah [00:54:23]: Yeah, totally. One of the reasons this is cool, right, is that if you're trying to understand the holistic picture of the models and what you can do with all the stuff the company is contributing, this gives you that picture. And so we are going to keep it up to date alongside all the models that we do the Intelligence Index on, on the site. It's just an extra view to understand them.swyx [00:54:43]: Can you scroll down to the trade-offs chart? Yeah, that one. This really matters, right? Obviously, because you can b
From creating SWE-bench in a Princeton basement to shipping CodeClash, SWE-bench Multimodal, and SWE-bench Multilingual, John Yang has spent the last year and a half watching his benchmark become the de facto standard for evaluating AI coding agents—trusted by Cognition (Devin), OpenAI, Anthropic, and every major lab racing to solve software engineering at scale. We caught up with John live at NeurIPS 2025 to dig into the state of code evals heading into 2026: why SWE-bench went from ignored (October 2023) to the industry standard after Devin's launch (and how Walden emailed him two weeks before the big reveal), how the benchmark evolved from Django-heavy to nine languages across 40 repos (JavaScript, Rust, Java, C, Ruby), why unit tests as verification are limiting and long-running agent tournaments might be the future (CodeClash: agents maintain codebases, compete in arenas, and iterate over multiple rounds), the proliferation of SWE-bench variants (SWE-bench Pro, SWE-bench Live, SWE-Efficiency, AlgoTune, SciCode) and how benchmark authors are now justifying their splits with curation techniques instead of just "more repos," why Tau-bench's "impossible tasks" controversy is actually a feature not a bug (intentionally including impossible tasks flags cheating), the tension between long autonomy (5-hour runs) vs. interactivity (Cognition's emphasis on fast back-and-forth), how Terminal-bench unlocked creativity by letting PhD students and non-coders design environments beyond GitHub issues and PRs, the academic data problem (companies like Cognition and Cursor have rich user interaction data, academics need user simulators or compelling products like LMArena to get similar signal), and his vision for CodeClash as a testbed for human-AI collaboration—freeze model capability, vary the collaboration setup (solo agent, multi-agent, human+agent), and measure how interaction patterns change as models climb the ladder from code completion to full codebase reasoning. 
We discuss:
* John's path: Princeton → SWE-bench (October 2023) → Stanford PhD with Diyi Yang and the Iris Group, focusing on code evals, human-AI collaboration, and long-running agent benchmarks
* The SWE-bench origin story: released October 2023, mostly ignored until Cognition's Devin launch kicked off the arms race (Walden emailed John two weeks before: "we have a good number")
* SWE-bench Verified: the curated, high-quality split that became the standard for serious evals
* SWE-bench Multimodal and Multilingual: nine languages (JavaScript, Rust, Java, C, Ruby) across 40 repos, moving beyond the Django-heavy original distribution
* The SWE-bench Pro controversy: independent authors used the "SWE-bench" name without John's blessing, but he's okay with it ("congrats to them, it's a great benchmark")
* CodeClash: John's new benchmark for long-horizon development, where agents maintain their own codebases, edit and improve them each round, then compete in arenas (programming games like Halite, economic tasks like GDP optimization)
* SWE-Efficiency (Jeffrey Maugh, John's high school classmate): optimize code for speed without changing behavior (parallelization, SIMD operations)
* AlgoTune, SciCode, Terminal-bench, Tau-bench, SecBench, SRE-bench: the Cambrian explosion of code evals, each diving into different domains (security, SRE, science, user simulation)
* The Tau-bench "impossible tasks" debate: some tasks are underspecified or impossible, but John thinks that's actually a feature (it flags cheating if you score above 75%)
* Cognition's research focus: codebase understanding (retrieval++), helping humans understand their own codebases, and automatic context engineering for LLMs (research sub-agents)
* The vision: CodeClash as a testbed for human-AI collaboration: vary the setup (solo agent, multi-agent, human+agent), freeze model capability, and measure how interaction changes as models improve

John Yang:
SWE-bench: https://www.swebench.com
X: https://x.com/jyangballin

Chapters
00:00:00 Introduction: John Yang on SWE-bench and Code Evaluations
00:00:31 SWE-bench Origins and Devin's Impact on the Coding Agent Arms Race
00:01:09 SWE-bench Ecosystem: Verified, Pro, Multimodal, and Multilingual Variants
00:02:17 Moving Beyond Django: Diversifying Code Evaluation Repositories
00:03:08 CodeClash: Long-Horizon Development Through Programming Tournaments
00:04:41 From Halite to Economic Value: Designing Competitive Coding Arenas
00:06:04 Ofir's Lab: SWE-Efficiency, AlgoTune, and SciCode for Scientific Computing
00:07:52 The Benchmark Landscape: Tau-bench, Terminal-bench, and User Simulation
00:09:20 The Impossible Task Debate: Refusals, Ambiguity, and Benchmark Integrity
00:12:32 The Future of Code Evals: Long Autonomy vs Human-AI Collaboration
00:14:37 Call to Action: User Interaction Data and Codebase Understanding Research
The most-streamed song of the year in Germany is "Tau mich auf" by Zartmann.
A gathering was held to celebrate 50 years of Hmong settlement in Australia. The event commemorated the first 8 students who helped build the Hmong community into what it is today, and discussed what Hmong life is like now and what the community, as part of Australia, will do in the future. The state premier (Jacinta Allan) sent a message, and the head of Victoria Multicultural Affairs attended, along with the Labor member for Greenvale and a representative of the department responsible for multicultural affairs; Hmong community associations and the Lao association also joined the gathering, held on 20 December 2025 at Bulla, at the Hume Reception near Melbourne Airport.
The British company Vertical Aerospace has brought prototype air-taxi aircraft called Valo for test flights, which it hopes will one day carry people around cities. This is also a step in the growth of a global industry that will create jobs, and the Australian government is considering whether the aircraft could be used in Australia in 2027-29. Even so, experts in the field say that money will need to be invested to build this industry, and that landing sites in many locations and regulations to manage the aircraft will be required before these services can operate.
Australian bushfire-safety experts have warned that the recent fires in Tasmania, Western Australia and on the Central Coast of New South Wales, which destroyed homes, are only the beginning of the bushfire season. They are reminding everyone leaving home for a holiday or a camping trip to prepare carefully so that they stay safe.
Chronic traumatic encephalopathy, or CTE, is a neurodegenerative disease linked to repeated head injuries. It has been found in professional athletes, soldiers, and others who have experienced years of those traumas. New research from Harvard Griffin GSAS alumni Chanthia Ma and Guanlan Dong may help us better understand this condition. Their study looks at the smallest units of brain biology—individual neurons—and finds surprising clues written in the DNA itself. Using single-cell genome sequencing, they discovered that neurons in people with CTE carry distinctive patterns of genetic damage—patterns that may overlap with those seen in Alzheimer's disease. In this episode of Colloquy, Ma discusses how her work not only sheds light on how brain trauma leads to long-term decline but also hints at possible shared mechanisms across different neurodegenerative conditions.
A new report from the property analytics firm Cotality finds that three interest-rate cuts have done little to help first-home buyers, that house prices are rising faster than Australians' incomes, and that ever-higher rents are forcing people to share housing at a time when the cost of living is already high.
A new UNICEF report shows that many Australian children have little hope for their future, with 63 per cent believing they will be less able to build a life than their parents were. Some 43 per cent of young people as young as 12 are already worried about money and housing. Penny Dakin, head of the Minderoo organisation, said 'we must support children, support families, and build programs that help every child start life as well as they possibly can.' The report also calls for a National Early Childhood Commission to be established to support children, and Ashwini Aravinthan, a UNICEF youth ambassador, said young people 'must be recognised and given a greater role in contributing ideas and making decisions about the policies and services created for them.'
The illegal global trade in wildlife is estimated to be worth some 32 billion dollars, and because Australia is home to many rare species, its animals are targeted by criminals who smuggle them onto illegal markets. To mark the day against transnational crime (15 November), analysts spoke about these offences and about preventing and disrupting the trade so it does not cause serious damage to Australia's ecosystems and industries.
As little as 3000 steps per day can slow progression to Alzheimer's Disease; Self-reports of memory impairment soaring among young people; New study vindicates unprocessed red meat—and even often-vilified processed red meat—for cancer and overall health. Prostate artery embolization (PAE) offers new non-invasive option for men's age-related urinary problems; Targeting the mitochondria and the microbiome for Parkinson's Disease; Popular prostate and hair loss prevention drugs linked to depression and suicide—while Cialis for urinary symptoms may stave off cardiovascular disease; Discovery that a safe, cheap medication may increase survival after breast cancer surgery.
Send us comments, suggestions and ideas here! In this week's episode we explore the final three chapters of Liber ARARITA, an obscure Class A holy book from the mystic religion of Thelema that operates as an alchemical tool for reducing the entirety of the universe to a unity with God using Hebrew letters, angel math and creative visualization that is illegal in at least six countries. In the free side of the show we discuss the origins and occult meaning behind the Hebrew letter Yod and its complement in the text, Tau. We discuss the Hermetic version of Neti-Neti, the High Priestess of the Tarot and the little dogs of hell. In the extended episode we loosen the eight belts of heaven, break into the Outer College, plough Venus, finish The Great Work and Go Beyond The Words of the Fool. Thank you and enjoy the show!

In this week's episode we discuss:
Hebrew Letters Yod and Tau (IT)
The Little Dogs of Hell
Hadit and Nuit
Crossing the Abyss
The Exempt Adept
The High Priestess
No, Certainly Not!

In the extended episode at www.patreon.com/TheWholerabbit we finish the text and conclude by discussing:
The Egyptian OM
To Know / Swallow
The Fire Kadosh
At The End
Spiritual Alchemy
The Eight Belts
Beyond The Words of the Fool

This episode was prepared by Luke Madrid and Heka Astra, quotes read by Tim Hacker, Blue sections prepared by Mari Sama.

Where to find The Whole Rabbit:
Spotify: https://open.spotify.com/show/0AnJZhmPzaby04afmEWOAV
Instagram: https://www.instagram.com/the_whole_rabbit
Twitter: https://twitter.com/1WholeRabbit
Order Stickers: https://www.stickermule.com/thewholerabbit
Other Merchandise: https://thewholerabbit.myspreadshop.com/
Music By Spirit Travel Plaza: https://open.spotify.com/artist/30dW3WB1sYofnow7y3V0Yo

Sources:
Liber ARARITA / IAO 131: https://iao131.com/commentaries/liber-dcccxiii-vel-ararita-sub-figura-dlxx/
Book of Thoth: https://dn710008.ca.archive.org/0/items/out-of-print-and-rare-books-collection/BookOfThoth.pdf
Book of the Law: https://sacred-texts.com/oto/engccxx.htm
Dion Fortune, Mystical Kabbalah
Aleister Crowley, The Vision and the Voice

Support the show
Multi-billion-dollar investment agreements on rare earths and critical minerals between Australia and the United States have pushed up the share prices of mining companies. The aim is to ensure that China is not the only country controlling and selling these materials to the world.
An important meeting was held between Australia's Prime Minister and the US President on the AUKUS defence program and on multi-billion-dollar joint investments in critical minerals and rare earths, with the goals of securing everyday supplies of these materials, preventing China from being the sole country controlling them, and putting them to use in defence, energy and electric vehicles.
The Independent Characters - A Warhammer 40k Podcast | Radio
Episode 267 of The Independent Characters arrives right on schedule, and it's a big one, both in content and in subject matter. This time, we're striding into the battlefield with not one, but two colossal armies in our latest Show of Force. Join us as we explore both the Imperial Knights and their dark reflections, the Chaos Knights. These towering engines of war have a presence on the tabletop and in the lore that is impossible to ignore, and we're diving deep into every aspect of them. Edit: So as we started recording – we realize that this would turn into a 5-6 hour episode. Carl was, once again, too optimistic regarding how much we could cram into an episode. So we made the decision to divide this topic into two parts. Episode 268 will cover the Chaos Knights as Part 2 of Oaths and Damnation. Over the course of the episode, we cover their origins and history, the unique culture of the Knight Houses, and how those sworn oaths of loyalty can either lead to noble service to the Imperium or a fall into the corrupting grip of the Dark Gods. From lore to codex, units to battlefield roles, we've got it all packed in. As our Show of Force Episodes go – we'll also take a look back at their evolution over the years – how these armies have grown in design, rules, and miniature range to become some of the most striking models in the Warhammer 40,000 universe. Carl and Adan – as well as a special guest, will unpack the details of these armies over the course of several hours. Fire up your void-shields because Episode 267 releases on October 18th, 2025. Time Stamps: 0:00:00 – Show Intro, Elite Choice, Hobby Progress 1:00:40 – Oaths & Damnation – Part 1 (First Half) 1:54:10 – Oaths & Damnation – Part 1 (Second Half) 3:23:10 – War on The Shore Narrative Event! 3:46:30 – Final Thoughts and show closing Relevant Links: The Independent Characters Patreon Tablewar! 
– SPONSOR Herrick Games & Hobbies – SPONSOR Imperium Maledictum by Cubicle7 Goonhammer War on The Shore 2026 Adepticon Games Workshop The Black Library
A new report calls for a national database to record miscarriages among Australian women, an issue that still receives too little attention, particularly for migrant women and women in rural communities, a year after the government announced $9.5 million to raise awareness of miscarriage and support those affected.
New approaches have been developed to care for people with dementia in aged-care homes, after the Australian Institute of Health and Welfare reported that dementia is one of the leading causes of death in Australia.
Some research suggests that people who habitually use their phones while sitting on the toilet may be at higher risk of haemorrhoids and rectal bleeding. Other studies, however, suggest this may not be the case. What is the truth?
In this episode, Anna Rose and Nico Mohnblatt catch up with Ian Miers from the University of Maryland, starting with his work on seminal ZK blockchain research, Zerocoin and Zerocash and the creation of the first zk-focused blockchain project Zcash. They then explore the history of trusted setups, including the trusted setup bug discovery in Zcash, and subsequent improvements like Powers of Tau. Ian also discussed his work on ZEXE, a system that has inspired the formation of Aleo, and his more recent works: zk-creds for building flexible anonymous credentials from existing identity signals like passports, and zk-promises for supporting anonymous reputation, moderation, and callbacks in decentralized systems. They also touch on broader topics like post-quantum security considerations, sybil resistance, and the need for programmable privacy tools. Related Links Ian Miers: Academic profile and publications Zerocoin: Anonymous Distributed E-Cash from Bitcoin Pinocchio: Nearly Practical Verifiable Computation Zerocash: Decentralized Anonymous Payments from Bitcoin Zcash: Privacy-preserving cryptocurrency based on Zerocash protocol Zexe: Enabling Decentralized Private Computation Powers of Tau Ceremony: Zcash Foundation's multi-party computation for secure zk-SNARK parameters Powers-of-Tau to the People: Decentralizing Setup Ceremonies zk-creds: Flexible Anonymous Credentials from zkSNARKs and Existing Identity Infrastructure zk-promises: Anonymous Moderation, Reputation, and Blocking from Anonymous Credentials with Callbacks Decentralized Anonymous Credentials Sonic: Zero-Knowledge SNARKs from Linear-Size Universal and Updatable Structured Reference Strings Quadratic Span Programs and Succinct NIZKs without PCPs
At first glance, human language seems exuberant, exuberant to the point of chaos. Every language has its thousands of words, its turns of phrase, its exceptions and its quirks. Yet behind this apparent complexity lie rules of a surprisingly mathematical rigor. One of the most fascinating was brought to light in the 1930s by the American linguist George Zipf: the law of abbreviation.

A simple but powerful law
Formulated by Zipf, this rule describes a universal tendency: the more frequently a word is used, the shorter it tends to be. Take an example from French: "et", "de", "à" or "je". These ultra-frequent words are only one or two letters long. Conversely, rarer terms such as "chlorophylle", "hétérozygote" or "incommensurable" are longer. In other words, our brain, in its constant pursuit of efficiency, reserves brevity for everyday words and accepts length for occasional ones.

Efficiency as the driving force
This law is no accident: it illustrates what Zipf called the principle of least effort. When we communicate, we naturally seek to convey the maximum amount of information with the minimum effort. Short words, easy to pronounce and quick to write, fill that role for the ideas we use most often. This logic helps make exchanges more fluid and limits cognitive fatigue, for the speaker and the listener alike.

A universal rule?
What intrigues researchers is that this law does not seem limited to human languages. Recent work in bioacoustics has shown that some birds follow exactly the same tendency. The sounds they use most often, to mark territory, warn of danger or attract a mate, are shorter than their rarer vocalizations. In other words, birds too, without knowing it, follow Zipf's law of abbreviation.

When evolution meets mathematics
Why this convergence between humans and birds? Scientists suggest that the rule may reflect a fundamental principle of any efficient communication. Whether one works with words or songs, saving energy and time favors survival. Individuals able to transmit the essentials quickly have an advantage, whether fleeing a predator or cooperating in a group.

A language less chaotic than it appears
In the end, what Zipf reveals is that our languages, however diverse, obey universal forces. They are not random constructions but systems shaped by the pursuit of efficiency. And when we discover that birds, and perhaps other species as well, obey the same law, it suggests that mathematics does not merely describe the physical world: it also governs the way we exchange ideas and emotions. Behind our everyday conversations, then, lies a discreet but inescapable mathematical rule, one that links humans to birds. Hosted by Acast. Visit acast.com/privacy for more information.
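The regularity Zipf describes can be checked mechanically on any text: count word frequencies, then compare the lengths of frequent words against rare ones. A minimal Python sketch on a toy corpus (the sample sentence and the frequency threshold are illustrative assumptions, not taken from the episode):

```python
from collections import Counter

# Toy corpus mixing everyday short words with rare long ones.
text = (
    "the cat and the dog and the bird sat on the mat and "
    "the incommensurable chlorophyll of the heterozygote"
).split()

freq = Counter(text)

# Split the vocabulary: frequent words (seen more than once)
# versus rare words (seen exactly once).
frequent = [len(w) for w, c in freq.items() if c > 1]
rare = [len(w) for w, c in freq.items() if c == 1]

mean_frequent = sum(frequent) / len(frequent)
mean_rare = sum(rare) / len(rare)

print(mean_frequent < mean_rare)  # True: frequent words are shorter on average
```

On a real corpus one would use a large text sample and a rank correlation between frequency and length rather than this two-bucket split, but the direction of the effect is the same.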
In Episode 265, we explore what it means to be a hobbyist over time - from the first spark of excitement, to balancing life's responsibilities, to rediscovering joy in unexpected ways. We're diving into how the hobby grows with us, and how each phase of life brings something new to the tabletop. This episode features reflections and insights from both our hosts and members of the community, sharing how their passion for Warhammer 40K has changed over the years, and what keeps them coming back. Whether you're painting your first Intercessor or pulling out Rogue Trader-era minis, this one's for you. Join us as we celebrate the long game of 40K and the people who've made it a lifelong journey. Time Stamps: 0:00:00 - Show Intro, Elite Choice, Hobby Progress 0:31:40 - 40k For a Lifetime: Part 1 1:35:00 - 40k For a Lifetime: Part 2 2:04:10 - Final Thoughts and show closing 2:12:40 - 40k For a Lifetime: Secret Last Part Relevant Links: The Independent Characters Patreon Tablewar! - SPONSOR Herrick Games & Hobbies - SPONSOR Imperium Maledictum by Cubicle7 Goonhammer War on The Shore 2026 Adepticon Games Workshop The Black Library
Dr. Len Tau, aka the Reviews Doctor, is on the podcast. With Kiera, he goes into the most critical nuts and bolts of making sure your practice stands out, or at least keeps pace, with online reviews amid AI. He explains jargon terms like ranking power and factors and velocity of reviews, whether or not you should actually be responding to reviews of your practice, and a ton more. Visit SuperchargeYourDentalPractice.com and enter the code RAVING to save $100 on registration for Dr. Tau's annual conference. About Dr. Tau Dr. Len Tau thrives on helping practices maximize their online reputation, marketing, and social media strategies. As a speaker, Len is known for his lively and engaging presentations packed with ready-to-use strategies. He regularly travels the country sharing his marketing brilliance and passion for practice growth with audiences. As a consultant, he offers practice leaders real-world solutions tailored to fit their specific challenges and opportunities. Len loves to help doctors and their teams understand and implement successful online systems to build their practice. He currently serves as general manager of Dental for Birdeye Reputation Marketing Software. Selected as one of Philadelphia's Top Dentists by Philadelphia Magazine, he continues to experience growth year after year in his fee-for-service practice focusing on general, cosmetic, reconstructive and implant dentistry. Following his father into the dental profession, Len graduated from Tufts University School of Dental Medicine and continues to pursue ongoing continuing education opportunities. He has had the privilege of serving patients for two decades. He is an active member of numerous professional organizations including the American Dental Association, the Pennsylvania Dental Association, the Academy of General Dentistry, the Eastern Dental Society, the Northeast Philadelphia Dental Implant Study Club, and the American Academy of Clear Aligners.
Episode resources: Subscribe to The Dental A-Team podcast Schedule a Practice Assessment Leave us a review Kiera Dent (00:00) Hello, Dental A Team listeners. This is Kiera, and today I am so excited. This is one of my dear friends. We've known each other for several years in the industry. I'm super freaking pumped. I'm actually going to be at his event next year in September. Little teaser. Stay tuned. He's got an amazing event he does every year in September. I have Dr. Len Tau. He is one of my faves. He is better known as an authority in the dental consulting world, reputation marketing, and practice growth. He's been recognized by Dentistry Today as a top dental consultant for eight straight years. He is the author of Raving Patients and 100-plus tips to 100 5-star reviews in 100 days. Like, this man knows how to do it. He's one of my faves. We really do collaborate on so many fun things. After 20-plus years in clinical practice, he now helps dentists nationwide increase revenue, case acceptance and visibility. He leads the dental division at BirdEye, hosts the Raving Patients podcast and runs the Supercharge Your Dental Practice Conference, which is the one I was alluding to that we're gonna be at next year in September, empowering practices to thrive in today's competitive landscape. He's truly one of my faves. And today we're gonna dig into, like, how do you get online reviews? But Len, welcome. I'm so happy to have you on the podcast. How are you today? Dr. Len Tau (01:06) I'm good, thanks for having me, I'm excited to be here. Kiera Dent (01:08) Of course. And this just came about because, Len, let's just do a little teaser. You're prepping full steam ahead right now for your event that's coming up in September in Florida. I love, like, the last time you and I were on the podcast, we talked about you in clinical dentistry.
And then we reconnected after some time, and you've left the chair, you're living your best life, and you've full-blown gone into the event space. So just, like, I know we're gonna get into online reviews and how AI is changing that. It's going to be just a really, really fun episode today. But tell us a little bit, like, how is it going from full-blown dentist in the chair to now full-blown events, like running these awesome events that we're super excited to be a part of? Just kind of give me a little insight into that. Dr. Len Tau (01:46) Well, it's been, it's been a lot of, a lot of fun. It's been very different, obviously. You know, for 23 years I practiced dentistry, um, for about 12 or 13 of those I was full time. And then I went part time in 2017 until I sold and retired in 2022. Um, but one of the things I grew up on in dentistry was going to dental events, the big ones, the small ones, you know, all over the country, as a dentist first, and then as a vendor. Kiera Dent (02:08) Mm-hmm. Dr. Len Tau (02:15) Um, since 2013 or 14, so a long time in the space. You know, one of the things that really hit me was that the events are not really put on very well. Um, you know, if you're a dentist, there's issues; when you're a vendor, there's issues. And I said, you know what? I want to change the game. And, um, one of my goals when I retired from dentistry was to start putting on events. So in 2023, um, in September, we did an event in Delray, had 208 Kiera Dent (02:25) Right. Dr. Len Tau (02:44) dentists there, 33 sponsors. First day was business, second day was marketing. Excuse me, first day was marketing, second day was business. Had 13, 14 speakers. It went off better than I could ever have imagined. I then moved last year, in 2024, to Scottsdale. And we were at the Scott Resort and Spa, which is a beautiful hotel, and the event was good. It wasn't great. Definitely moving to different coasts.
I felt there was not as much, you know, engagement, excitement about the event. So my family, my wife and I decided, hey, we're going to do this. Let's have people come down to me. I live in a beautiful, you know, part of Florida. We're having this year's event and the next three of them at, at Pier 66, a brand new hotel in Fort Lauderdale. It's literally a half hour from my house, five miles from the airport, easy to get to. So this year's event is September 26th and 27th. Kiera Dent (03:32) Mm-hmm. Dr. Len Tau (03:45) We've got 14 speakers, a mixture of business and marketing. So we've got people talking about social media, about content. We have people talking about saving money on taxes. We're talking about how to become a fee-for-service practice. So a lot of different great content and top speakers: Steve Rasner, Paul Goodman, Jeff Buski, Richard, Rich Maddow. So some real, real heavy hitters. And then some people who people haven't really heard of, Melanie Diesel, who's new in the dental industry. But I like to do it differently, and my events are very high end. You come, you're going to see things you probably have never seen before. I give a ton of time to the vendors, so the vendors love me, because I make sure that they get integration or interaction with the attendees. So you're going to be speaking in 2026, same weekend, September 25th and 26th, 2026, same hotel, Pier 66. Kiera Dent (04:28) Sure. Yeah. Dr. Len Tau (04:40) We're ramping things up right now. We're literally a month out from the event. I still have people signing up. I still have people wanting to reach out as sponsors. And it's, it's the fun time for me. 'Cause when I'm done, I, you know, I get a couple of months of break and then I start promoting 2027 again. So it's been a good time. I really enjoy it. And I find that I've kind of created something that's very different, and the attendees really enjoy it and the vendors really enjoy it.
So if I can make everybody happy, Kiera Dent (04:45) No. Dr. Len Tau (05:09) That's all I'm looking to do here. Kiera Dent (05:11) And Len, I hope the audience, if they can't see it, they can hear it. I think it's so fun because, I mean, I've seen you in different spaces in your career, in your life. And there is just this giddy, younger version of Len that I feel is emerging. It's like giddy boyhood excitement of, I'm excited to put these on, I'm excited to do these events. And it just makes me so happy for you. And what I think I'm hearing is, yes, attendees are happy, vendors are happy. But I also hear that Len is very happy, and to do something in dentistry is just very, very fun. It's very exciting. And so we're jazzed. I'm really excited. I love good events. I love a great time. I love to help. Business marketing? Everybody can take that. That's not Kiera's jam. Like, that's why I wanted to bring you on. You guys are very good at marketing. You're very good at that space. But to talk about how to help people have their best lives, to grow the practices that they want to grow, I think you and I are so synergistic in that. So we're super excited. And I love, I mean, I'm not going to highlight the fact that there were a couple of sixes in that: it's September 26 at Pier 66. You guys, hopefully, like, I like the alliteration. Don't put anything weird on it, guys, but I do appreciate that you made it easier. September, sixes, and nines flipped upside down are a six. Like, hopefully everybody can remember: September 26, Pier 66. It'll be a good time in 2026. I mean, we got four lines, so we're okay. We've at least got four sixes, not, we didn't end on three. But I really hope an exciting step. We'll make sure we put some info for people, for this year and for next year. I think it'll be a fun time. Dental A Team will be there, so come hang out with us. Len, I'm super excited.
I will not spoil secrets, but with a lot of the things he told me about the events, I will say he does put his heart and soul into it. So Len, excited about that. Thank you for sharing. Good luck for this year. We're gonna be rooting you on this year and next year. And now let's pivot. Let's go into your jam. You're in BirdEye, you're in marketing, you're in online reviews. AI has come onto the scene. Practices are changing. I also will say, I hope everybody listens to your, like, succession story. You hung up the handpiece, but you are still full steam ahead in dentistry. And so I hope people see that there is no one path in dentistry. Like, it's a, it's a beautiful world that you're in. So let's talk, though, online reviews, AI, how is this working? How do we make sure that practices are still being visible? ChatGPT is on, on the prowl. There are clients signing up with us now that have found us on ChatGPT, which is so random. It's changing how people have been doing things. Walk me through: what are you seeing with these online reviews? The importance, how to bring AI in? Like, let's just kind of go and riff on how practices can still be visible with AI, like, just showing up to the scene. Dr. Len Tau (07:43) So I wanna talk about ChatGPT for a second. I refer to it as my best friend. It helps me edit. No, I haven't named it yet. No, I haven't named it. You have? Kiera Dent (07:50) Have you named it? I've got to just ask, Len. Have you named it? I have! Me and ChatGPT, I had a name, and now her name is Wanda. I don't know why, I don't even know where Wanda came from, but people are like, Kiera, are you hanging out with Wanda again? 'Cause I agree, like, they're our best friends. So go on, Len. I can't wait to hear what you name your ChatGPT, 'cause mine is currently Wanda. Dr. Len Tau (08:06) I'll have to, I'll have to think of something now. But no, I started using it. I'm like, this is really helpful, and it's only gotten better.
And, just to give you an idea: my wife and I, and my son, my son just graduated high school. He literally just started his freshman year at the University of Florida on a free ride. Smart, smart-ass kid. I'm very proud of him. But, you know, I travel a ton, I travel a ton for business, and I made a commitment, I think I told you that, Kiera Dent (08:25) Yeah. Dr. Len Tau (08:35) during the summer when he was going away for school, I was not going to travel. So from March to literally next week, the beginning of September, I haven't traveled at all for business. But we did plan some really great travel for our personal lives. And one of the things we did was a cruise, a 17-day cruise to Europe. And I decided I did not want to do the excursions through the cruise, 'cause they're really expensive and you're with all these people. I prefer to kind of just go and tour myself. Kiera Dent (08:44) It's awesome. Dr. Len Tau (09:05) So I used ChatGPT in every city. And I said, I'm going to this city, this is when I'm going to get in, this is the cruise I'm going on. It got the cruise itinerary. And I said, I want to set up private tours in every city with different people. And it helped me pick the best tour guides. It referred me to a website called Tours by Local, which is an amazing website where you can meet people who are local that will take you around and show you the city, and it was amazing. It was amazing. So I thank ChatGPT for doing that, because I wouldn't have known about half these things if I didn't do it. And in fact, one of the women, at actually the very first place we went to, which was Split, Croatia, which was beautiful, I told her that literally that's kind of how I went down this road: I asked ChatGPT, what should I do in Split? And it said, you need to use this tour guide. She's the highest-rated tour guide and has the best reviews on Tours by Local. I'm like, what's Tours by Local? And that started this whole thing.
So she was amazed to hear that. So I have been using ChatGPT for a long time, like I said, and even now, people I know type in, you know, get me the best dentists in the area. And it's very much based on reviews. So you have to be a highly rated practice. You may not believe in reviews, and if you don't, I think you're not being smart. But you know, if you want to be at the forefront of where people are looking, Kiera Dent (09:58) Yeah. Yes. Dr. Len Tau (10:25) You have to generate reviews in a significant amount. Velocity now, which is how often you're getting them, is one of the biggest ranking factors on Google, whether you want to believe ChatGPT or not. But you have to get reviews. You can't, you know, rest on your laurels and say, well, I have enough, because you never have enough. Okay. And you've got to let Google rank you high. And there's been a big discrepancy in the industry, a big, I don't want to say a misunderstanding. Kiera Dent (10:43) Right. Dr. Len Tau (10:52) But I've been in the review space now since 2013, so 12 years. And in the past, dentists thought that if they get reviews, they're going to rank. And that's not the way it is anymore. If you have reviews but don't pay attention to the other ranking factors, you actually don't rank well. And that's a problem. So ChatGPT and AI are so important, but you still have to dominate Google. You still have to get to the top of the pages. And that's really the direction things are going, and if you aren't there now and you are ignoring it, you're never going to get there. So I would love for us to instruct or educate the listeners and viewers on these ranking factors that they need to pay attention to, or they're going to be left behind when it comes to ranking on Google. Kiera Dent (11:27) Yeah, absolutely. And I'm excited for this too, because I did notice that, like, AI is just crawling the web. That's where it's getting taught.
It crawls it. It looks through all of it. And so, agreed with you. I have a lot of clients who want the secret pill of marketing, and I'm like, get your reviews up. It is constant and consistent: if you get those reviews up and you bring pieces to the table, that's literally what's going to rank you higher. So I'm excited, Len, to dig in deeper, because it isn't just getting more reviews; hearing that there's more beyond just the reviews really can help these offices get the best bang for their buck and help more practices. And when I first started consulting, I used to tell offices to get to like 100 Google reviews. Now I'm pushing people to like five, six, 700 reviews to get ranked. And I don't know if you're seeing a cutoff line or if it matters on that. So I'm really excited to dive into: what are the rankings? What are the pieces? Is there a difference? Because now, when I look at somebody with 100 reviews, I'm like, hmm, especially if there's another dental practice that has maybe 400, 500. When new clients come on, the first thing I do is go look them up to see how many reviews they have. And I'm shocked at how many dental practices actually are not showing up when I Google their names, and they're like, no, no, Kiera, we're here. And I'm like, but if I'm a prospective new client that doesn't work in your practice, and I don't see you all the time, and I just tried to find you and I'm looking for you, how many patients who are not looking for you are not finding you as well? So yeah, take us away. I'm super curious, very intrigued by this. It's fascinating. And I'll also say, because AI is new, I feel like people got a reset slate. Like, hey, you can actually get back into the game if you haven't been in the game, if you just start playing now. If you don't, I agree with you.
I do think that you will unfortunately get obliterated without trying if you don't get into the game now. Dr. Len Tau (13:28) 100%, and I couldn't agree with you more. So the best thing to do here is, if you're listening to this, I want you to go to a Google search and I want you to type your practice name in. Okay, so that's the first thing to do. Right. Kiera Dent (13:39) And not in your office. Don't do it in your office. Go somewhere else. Like, try it somewhere else. Dr. Len Tau (13:44) Right, and 100%, that's another thing: if you're gonna look up your ranking specifically, you do not wanna do that from your office location, okay? Because you're not gonna get real results. You also wanna go into incognito mode or private browsing mode on your phone or your computer if you're doing that to check ranking. But this is not specifically about ranking. This is more about how you appear online. So go to Google and type in your practice name. Not your name, unless it's the name of the practice, but your business name, okay? Kiera Dent (13:52) Yes. Mm-hmm. Dr. Len Tau (14:13) And it doesn't have to be what's registered with the state board. It's what you say when you answer the phone, okay? Pennsylvania Center for Dental Excellence was my practice name, okay? So you wanna look yourself up. So these are some of the ranking factors that Google looks at. Obviously one of them is the total number of reviews you have. Definitely a ranking factor, but the total number has not been as important as some other factors. So. Kiera Dent (14:20) Mm-hmm. Dr. Len Tau (14:40) The average number of reviews in the industry right now is about 350. It used to be like 100 was the golden number. Now 350 is the average in the industry. So are you average? Are you below average or above average? Okay, that's something to look at. The second ranking factor, which is even more important, is the velocity of reviews.
So how many reviews, how often you're getting them. Okay, so if you're getting one every two weeks, not enough. If you're getting them once every week, Kiera Dent (14:46) Yes. Dr. Len Tau (15:10) Not enough. You don't need them every single day, but two or three every single week is ideal. Okay, because think about it: two or three every week gives you eight to twelve a month, and over 12 months that's 100 to 150 reviews a year, which is a nice number. Okay, so you have to have that velocity. All right. The third ranking factor is the total score, your average number of stars. So I would like you to be anywhere from 4.6 to five stars. Okay. I don't think you have to be only five stars; I think there's a negativity related to having only five-star reviews. But I also don't want you to be below 4.5, okay? And if you're at 4.3, 4.2, or even 4.1, another bad review or two and you're going to be in the threes. And that's really where you don't want to go, cause you lose a huge percentage of patients who may come in if you're less than four stars. Okay. Another ranking factor is the primary category. So how do you know your primary category? If you look at your Google listing, right under your name where the stars are, it will hopefully say dentist in your town, or dentist in your county, or dentist in your city. Okay. So your primary category should be dentist, because we're a dental practice. Okay. If you're an oral surgeon, you may want it to be oral and maxillofacial surgeon. If you're an endodontist, you want it to say endodontist. You don't want it to say dentist if you're a specialist. Okay. That's a big ranking factor, and I'll give you an example. My wife had some plastic surgery over the last couple of years, and we were referred to that doctor. So we didn't need to search for him. We went in, we liked him, we used his services. And of course, being a plastic surgeon, I talked to him about reviews.
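Len's velocity arithmetic (two to three reviews a week compounding to roughly a hundred-plus a year, against the 350-review industry average he cites) can be sanity-checked with a short script. This is just an illustrative sketch, not anything from the episode:

```python
# Project annual review counts from the weekly "velocity" discussed above.
# The 2-3/week rates and the 350 industry-average figure come from the
# conversation; the script itself is purely illustrative.

INDUSTRY_AVERAGE = 350

def reviews_per_year(per_week: float, weeks: int = 52) -> int:
    """Annualize a weekly review velocity."""
    return round(per_week * weeks)

for rate in (2, 3):
    print(f"{rate}/week -> ~{reviews_per_year(rate)} reviews/year")
# 2/week -> ~104 reviews/year; 3/week -> ~156 reviews/year

# At that pace, a practice starting from zero passes the industry
# average in roughly 2.5 to 3.5 years.
years_at_two = INDUSTRY_AVERAGE / reviews_per_year(2)
print(round(years_at_two, 1))
```

The point the math makes is Len's: a modest but steady weekly cadence beats sporadic bursts, because it both accumulates quickly and keeps velocity high.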
He now uses BirdEye, but he had me speak at an event that he holds down here in Boca Raton. And I talked about this exactly. And I asked everybody, cause it was a small group: what is your primary category? And he said to me, literally, he says, I'm listed as a nurse practitioner. He wasn't listed as a plastic surgeon. He was listed as a nurse practitioner. So his categories were all messed up. So when you actually typed in plastic surgeon near me, he never showed up, because his category was wrong. So primary category is a very important ranking factor as well. Now you also have to make sure your secondary categories Kiera Dent (17:15) No. Dr. Len Tau (17:35) are there as well, under the proper categories. So for secondary categories, if you're a dentist: dental clinic, teeth whitening service, denture care center; orthodontist, if you're doing aligners; if you're doing root canals, you can have endodontist; if you do perio, you can add periodontist. You want to make sure you have nine secondary categories. Okay, if you don't have them, you want to add them. Now, how do you add them? It's very easy. You go to Google or ChatGPT or anything and say, how do I add secondary categories to my Google business listing? Okay. It will tell you exactly, like a recipe, how to do it. You need to add those secondary categories. All right. And if you want help doing it, you can always reach out to me. The last ranking factor, which is really important, is making sure that the practice's name, address, and phone number are consistent. Okay. So just to be clear, most website companies do not do local SEO. They do website SEO, which is making sure the website is optimized so it ranks higher in the organic rankings. We're talking about getting the Google business page ranking higher, which the website companies are not focused on. So when it comes to the name, address, and phone number: is it consistent? You have to be consistent.
And this is a Google requirement. It is not a patient thing. It's not a me thing or a you thing. It's a Google requirement that this data is consistent. So the name is obviously important: if you have "and" versus the ampersand, you may find things inconsistent. When it comes to the address, if you have, you know, South State Street, Unit 510: you can have South or S, you can have Street or St, and then you can have Suite, Unit, Number, or Ste. All these variations need to be consistent, so one of them has to be picked and stuck with. And then if you are using a tracking number for whatever reason on your Google business listing, you may find you're inconsistent there as well. So when you make everything consistent and you get a higher velocity of reviews, guess what happens over time? You rank higher on the maps. And when you rank higher on the maps, you get more visible for patients to find you. So that's where the secret sauce is. And not that this is a sales pitch for BirdEye, but that's exactly what BirdEye does. We check all those boxes for you. And then what ends up happening is practices get more reviews. But more importantly, when they ask patients how they found them, they're going to see that they found them because of their ranking online, and the reviews drove them to the practice. So that's how this whole thing plays a role in getting a practice more visible and credible. Kiera Dent (20:06) Wow. So I was over here taking a lot of notes, which I really loved. I love the numbers: the 350 as the average; the velocity, like two to three per week you were saying, it doesn't need to be every day, but I do agree with them consistently coming through; the total score, the 4.6 to five; primary category; secondary categories, making sure we have nine. And then you were talking about how the practice name, phone number, all of that has to be consistent. So the addresses have to be the same.
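The consistency point Len walks through (South vs. S, Suite vs. Unit vs. Ste) is essentially a normalization problem. As a minimal, hypothetical sketch, assuming a hand-made variant table (this is not a real local-SEO tool, just an illustration of the idea):

```python
import re

# Hypothetical sketch: canonicalize common address variants so you can spot
# name/address/phone (NAP) inconsistencies across listings. The variant
# table below is illustrative, not exhaustive.
CANONICAL = {
    "street": "st", "st": "st",
    "south": "s", "s": "s",
    "suite": "ste", "unit": "ste", "number": "ste", "ste": "ste",
}

def normalize_address(addr: str) -> str:
    """Lowercase, strip punctuation, and map known variants to one form."""
    tokens = re.findall(r"[a-z0-9]+", addr.lower())
    return " ".join(CANONICAL.get(t, t) for t in tokens)

a = normalize_address("123 South State Street, Suite 510")
b = normalize_address("123 S State St, Unit 510")
print(a == b)  # both collapse to "123 s state st ste 510"
```

Two listings that normalize to the same string are consistent in Len's sense; real listing-management products do something far more thorough, but the principle is the same: pick one canonical form and stick with it everywhere.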
And that's going to help you rank higher. Did I miss anything? Those are my notes, Len. And I'm just curious, like, did I catch them all? Because there were a lot of pieces to consider. And then I have some follow-ups as well. So, like, did I miss anything in that list? Dr. Len Tau (21:02) No, I think you got it all there. Kiera Dent (21:06) Okay, so hopefully that was a good recap for everybody. If you were listening, I tried to summarize everything he said, because I really feel that those are super valuable pieces to know. Now, Len, there's a couple of things that happen, and I'm very curious what you've seen. Maybe you know, maybe you don't know; it's just a riff, me genuinely curious over here. Does it impact for the business to respond to the reviews? Because I know there was a big misconception out there for a while, like, you have to respond to every single review, and that helps you rank higher. What's kind of the lay of the land right now on responding to the reviews that come in? Dr. Len Tau (21:39) So there's been a big push over the years to respond to reviews. And there have also been those naysayers who don't want you to respond to reviews. So I want to make this very clear. When you respond to a review and you acknowledge them as a patient, you are technically violating HIPAA. Okay. By the letter of the law, if you do that, you violated HIPAA and can be in trouble. Now, in all the years I've been doing this, I've only seen one Kiera Dent (21:49) Mm-hmm. Dr. Len Tau (22:08) example of a positive review being responded to where the dentist got in trouble. Okay. So if someone writes a review for you and it's five stars, and you say, thank you so much for your feedback, we were glad you had a great experience in our practice, okay, you technically violated HIPAA there, because you acknowledged that they came into the practice. I don't think you'll ever run into any problems with that.
I've never seen an instance where a practice has gotten into trouble. But again, by the letter of the law, it's a violation. Here's where the person ran into a problem. Okay. So in the review in question, the patient wrote, I'm so happy with my appearance after I went to so-and-so's dental office. I think they were in Texas. The dentist responded, thank you so much for your review, we're so happy that you loved our magic needles. Okay. From what I understand, the patient had Botox or dermal fillers placed, and that's what they call their magic needles. So the patient wrote a letter to the practice saying, I didn't appreciate you letting the world know that I had Botox done, and asked for the review response to be taken down, which the dentist immediately did. Took it down and apologized, but it really pissed the patient off, and the patient sued the dentist and won. Okay. Because the dentist went out of their way to Kiera Dent (23:08) Mm-hmm. Right. Dr. Len Tau (23:33) you know, release private information that wasn't supposed to be released. So in that case, you shouldn't be doing that. Okay. Now, on the same note, I would be very careful responding Kiera Dent (23:37) Mm-hmm. Dr. Len Tau (23:45) to a negative review that's written by a patient. I would be very careful responding publicly to that, because it's very hard to respond without violating HIPAA. So a simple response like, we're sorry to hear about your experience, please contact the office to discuss the concerns, as we're unfortunately unable to comment due to HIPAA privacy rules. That's fine. But again, I'm just not sure it's the best thing to do. So you have to be careful with negative reviews. And we really haven't found any relationship between responding and ranking. Okay, so I always leave it up to the practice to respond.
I like using AI to respond as well, because I think it comes up with HIPAA-compliant and really good responses. But you have to decide what you want to do for your own practice. Kiera Dent (24:16) Mm-hmm. Interesting. That's actually really helpful to know. Okay, good feedback for people to ponder and decide what they want to do on. The second piece is: some people lose their Google My Business and they're not able to be found. And I don't know if you have reasons why. I don't know if it's from, like, a name change, or it's inconsistent. So a lot of offices have a lot of reviews, but when you go to search them, they're hidden on Google My Business. Like, it will show up on the person's side, but nobody externally can find it. Do you have any ideas of what causes that, or what offices can do if they're struggling with that? Dr. Len Tau (25:11) So I want to clarify the question you asked there. Sorry to answer a question with a question, but when you say that, are you saying that when they search for their Google business listing they can't find it, or that when someone is searching for the office, they're not visible on the maps? Kiera Dent (25:15) Hey, that's okay. So when they're searching. So if I just go into Google and I type in, like, My Perfect Smile, the website might link, but the Google My Business with all of it, and they might have like 150 Google reviews, they've got them all, and the office can see it when they log in, like, you own this, but they've lost it and it's no longer visible publicly. Do you know what causes that or how they can get that back? It's okay if you don't; I'm just genuinely curious, cause I know some offices struggle with this, especially with name changes of practices going through different ownerships. Some of them have told me, like, when I changed the name of my practice, it no longer showed up. Like, we have all these reviews, but we're not showing up.
Do you know what causes that or how practices can get back to being visible? Dr. Len Tau (26:02) Yep. Now that you asked it that way: that usually means that your Google business listing has been suspended. If you can't find it on search but you can still see it yourself, it means it's suspended in most cases. Name changes, address changes, and other things you do can cause it to be suspended. If you look it up on ChatGPT and ask, why can your Google business page be suspended, there is a list of different reasons why it can get suspended. Getting reviews the wrong way is a big one. So, like, you should not be incentivizing for reviews, and I'm talking about incentivizing the patients. You shouldn't be getting reviews in your physical office space, because there are IP address conflicts and location services on the patient's phone. So if you're doing that, not only can you potentially lose reviews, but you can get the listing suspended. You can look on Kiera Dent (26:37) Mm-hmm. Dr. Len Tau (26:55) ChatGPT or Google and just say, what are the reasons that your business page can be suspended? And they're there. So usually you have to re-approve or re-verify that page. And there are certain things you do: you'll have to take a video of yourself in front of the practice, showing the address, showing the name of the business on the door. So there are things you will have to do to send over to Google so they'll re-verify you. And once that happens, there's a good chance they'll unsuspend the listing. But that happens for that reason. Kiera Dent (27:24) Gotcha. Okay. That's super helpful, because I know a few offices have struggled with that. So I was just curious about that. All right. This has been so helpful to figure out rankings. It's been helpful to understand. My last question as we wrap up today, and reviews have been so helpful, Len, is: how do offices go about it? What are your recommendations? Yes, BirdEye, Swell, Podium.
Like, there's a lot of review tooling in Weave. I usually recommend using an external one, outside of everything else. I think that if that's what they do, they're going to be experts at it. But how can offices ethically and appropriately (obviously with a great patient experience) increase these Google reviews? What are some of the best tactics you've seen to help these offices out? Dr. Len Tau (28:04) So, being biased, I mean, I'm a true believer in BirdEye, because we help with both the reviews and the ranking part. Swell is a great product; I know the guys at Swell really well. A lot of their doctors don't rank well because they don't focus on the listings part of it, or the ranking part of it. I'm not a fan of Weave from a review perspective, because Swell, BirdEye, and Podium make it very easy. Weave doesn't. It's just the way they do it versus the other three products. I always say this: you can get reviews any way you want; the most effective is going to be to use some software, simple as that. But it all starts with the practice, and I like to create a reputation culture in the practice, which means you know that every time a patient comes into the practice, they're going to be evaluating you and potentially reviewing you. And you've gotta be on your best behavior, you've gotta put a happy smile on your face, you gotta treat them like they're the... Kiera Dent (28:40) Mm-hmm. Dr. Len Tau (29:00) king of the world, okay? You gotta roll out the red carpet. And if you don't do that, they may write a bad review, okay? But if you don't create that reputation culture, I think it's gonna be hard to get the practice to really accelerate the reviews. So: creating that reputation culture, using great verbiage skills. I love calling it feedback, not a review. If you call it a review, it sounds like you're begging for it. The feedback conversation is much more comfortable to have.
So, you know, it's an interesting situation, but if you don't ask, you don't get. So you've got to ask. I think if you ask and you combine it with really good software, you'll get a really good number of reviews. If you don't ask, you don't get. It's that simple. Kiera Dent (29:30) Mm-hmm. Yeah. Well, that was so great. I appreciate this so much. And it's fun to hear about how AI is helping. It's fun to hear about how you still have to be great on Google. So I just appreciate you. I appreciate you being here. I appreciate the knowledge you shared for offices; I hope they take action. And Len, any last thoughts? How can people connect with you if they want more help on this? You know, truly, in my opinion, this is the simplest marketing. Everybody wants, like, the sexy magic pill of marketing, and I'm like, no, it's a really great experience. Ask for the reviews, ask for the feedback, and rank so that people can find you. I've had offices that had like three, four, or five new patients, and they're like, I need this marketing, I need all these things. I'm not here to say not to do it, but I will say great reviews will boost you very quickly. So Len, any last thoughts you've got, how people can connect with you? Because it's been truly just an incredible episode today. Dr. Len Tau (30:26) So I'm around the country a lot, so you can always connect with me in person if I'm at some of these events. If you wanna come to Supercharge, you can connect with me there: SuperchargeYourDentalPractice.com. You can use the code RAVING to save $100 on registration. We also have some scholarships available, so if you do wanna come, you can reach out to me personally. My cell phone's all over the internet. The easiest way, if you have any questions, you want advice, you want help: I'm the guy to reach out to. My phone number is 215. Kiera Dent (30:40) Awesome. Dr. Len Tau (30:55) 292-2100.
And my best email is Len, L-E-N, at drlentau.com, which is D-R-L-E-N-T-A-U.com. And you can email me, you can text me, you can call me. Tell me you heard about me here and you need some advice; I'm more than happy to offer it to you. I do it all the time. I love when people reach out to me because they know I'm an expert, so I do it kind of as a favor to people. But no, you reach out to me, I'm happy to give advice. Kiera Dent (31:23) Amazing. Len, thank you so much for being on the podcast. I'm super excited for Supercharge 2025 and especially 2026, so everybody snag that. And truly, I hope you take action from today's podcast. These are easy ways for you to boost your marketing and be found and seen online. And Len, thank you for joining me today. I truly, truly appreciate you. Dr. Len Tau (31:41) Thank you for having me, Kiera, I appreciate it. Kiera Dent (31:43) Of course. And for all of you listening, thank you for listening, and I'll catch you next time on the Dental A Team Podcast.
The Independent Characters - A Warhammer 40k Podcast | Radio
Episode 264 of The Independent Characters is one for the ages! This time, we're not just joined by guests, we're joined by legends. That's right: the entire cast of Life After the Cover Save joins Carl in the Astronomicon! Ed, Blake, and Travis bring their unique brand of humor, hobby passion, and podcast chaos to join Carl and crew for one big crossover episode you won't want to miss. The topic? One that's bound to spark some debate... What are the greatest Warhammer 40,000 expansion supplements of all time? From the game-changing codex expansions that shaped entire editions, to legendary campaign books and wild narrative add-ons that still echo in hobby history - we're digging deep, reminiscing, and arguing about which supplements truly stood the test of time. You'll get laughs, you'll get hot takes, and you might even find yourself dusting off some old tomes from your shelf once we're done. Because let's face it, everyone has their own “best of all time” picks, and we want to hear yours too! This one's a celebration of the game, the supplements that shaped it, and the community that keeps it alive. Time Stamps: 0:00:00 - Show Intro, Elite Choice, Hobby Progress 0:34:00 - The Greatest 40k Supplements - Part 1 1:38:00 - The Greatest 40k Supplements - Part 2 2:17:00 - Final Thoughts and show closing Relevant Links: The Independent Characters Patreon Tablewar! - SPONSOR Herrick Games & Hobbies - SPONSOR Goonhammer War on The Shore 2026 Adepticon The Beard Bunker Games Workshop The Black Library
To have a strong heart, you naturally need strong arteries. And that’s not a problem for Antares, the heart of the scorpion. It’s flanked by two fairly bright stars that historically have shared a name: Alniyat – an Arabic name that means “the arteries.” The stars probably are siblings of Antares. They all formed from the same giant complex of gas and dust, within the past 10 million years or so. Alniyat I is also known as Sigma Scorpii. It’s a system of four stars. Two of them form a tight pair, with a third close by. The fourth star is farther out. Both stars in the tight grouping are much like Antares. They’re many times the mass of the Sun, so they’ll probably end their lives with titanic explosions. Antares is a little farther along its lifecycle, so it’s closer to that showy demise. Alniyat II is Tau Scorpii. It’s a single star. It, too, is destined to explode as a supernova, but not for several million years – a little later than Antares and the main star of Sigma. On the astronomical clock, though, that’s close – just a few ticks away. Antares and its arteries are close to the right of the Moon at nightfall this evening. Sigma is close to the right or upper right of Antares. Tau is about the same distance to the lower left of Antares. The arteries aren’t as bright as the scorpion’s heart, though, so you might need binoculars to see them through the glare. Script by Damond Benningfield
In Episode 263 of The Independent Characters, we open the doors once again to The Warrior Lodge, our roundtable-style discussion format where we bring together voices from across the Warhammer 40,000 community. This episode, Carl is joined by Josh, Adan, and Skye as we dive into some of the most pressing topics currently sparking debate and reflection within the hobby. Whether you're a veteran player or newly initiated into the grim darkness of the far future, The Warrior Lodge always has a seat at the table for you. We hope you enjoy this episode of The Independent Characters! Time Stamps: 0:00:00 – Show Intro, Elite Choice, Hobby Progress 1:08:45 – The Warrior Lodge: Part 1 2:00:45 – The Warrior Lodge: Part 2 2:34:30 – Final Thoughts and show closing Relevant Links: The Independent Characters Patreon Tablewar! – SPONSOR Herrick Games & Hobbies – SPONSOR Imperium Maledictum by Cubicle7 Goonhammer War on The Shore 2026 Adepticon Games Workshop The Black Library Gamers Grass Bases
A Note from James: I was honored to be on the Smart Humans Podcast. I'm a big fan of the show, and I was happy they asked me on, especially since I got to talk about things I don't usually cover here. We discussed my specific predictions for investments like Bitcoin, Ethereum, Tau, and stablecoin-related tokens like Curve and Aave. We also explored AI's role across industries, habits that have helped me build and sell companies, and bad habits that cost me millions. This episode is packed with my thoughts on investing, crypto, AI, and the lessons from going broke, twice. I hope you enjoy this conversation as much as I did.

Episode Description: In this special crossover episode, James Altucher joins Slava Rubin on the Smart Humans Podcast to talk investing, entrepreneurship, and the habits that make (and lose) fortunes. James shares his journey from early internet entrepreneur to hedge fund manager, bestselling author, and crypto investor, and the times he lost it all along the way. They dive into today's hottest investment themes, including Bitcoin, Ethereum, Tau, stablecoins, and the intersection of crypto and AI.
James also explains why he avoids bonds and real estate, the power of his "10 ideas a day" practice, and the economic trends he's watching over the next three years.

What You'll Learn:
Why James predicts Bitcoin could reach $250K by next year and $1M by 2027
The case for Ethereum, Tau, Curve, and Aave in the next wave of crypto growth
How AI is transforming productivity, and why that's bullish for the economy
The "10 ideas a day" method for rebuilding creativity and opportunity
Why avoiding certain asset classes can be as important as picking winners

Timestamped Chapters:
[00:00] A Note from James: Why This Episode Is Different
[02:00] James's Journey: From Web Pioneer to Investor
[06:00] Selling for $15M and Losing It All
[10:00] Rebuilding Through "10 Ideas a Day"
[14:00] Private Investing and Early-Stage Bets
[17:00] The Crypto–Equity Crossover Trend
[21:00] Why James Avoids Bonds and Real Estate
[23:00] Stablecoins as the Biggest Use Case for Crypto
[27:00] Picks and Shovels: Curve and Aave
[31:00] Ethereum's Potential vs. Bitcoin
[34:00] Economic Outlook: AI, Productivity, and Growth
[39:00] Risks, Inflation, and the Money Supply
[42:00] Tau: The Decentralized AI Play
[44:00] Doing > Reading for Real Expertise
[46:00] Three-Year Predictions: Public and Private Picks

Additional Resources:
Smart Humans Podcast with Slava Rubin: Website
James Altucher on Twitter: @jaltucher
Choose Yourself by James Altucher – Amazon
Curve Finance (CRV) – curve.fi
Aave (AAVE) – aave.com
Ethereum (ETH) – ethereum.org

Today's Advertisers:
Head to rugiet.com/JAMES and use code JAMES to get 15% off today!
Secure your online data TODAY by visiting ExpressVPN.com/ALTUCHER
Elevate your workspace with UPLIFT Desk. Go to https://upliftdesk.com/james for a special offer exclusive to our audience.

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
In Episode 261 we're thrilled to welcome back our longtime cohost and friend, Adan Tejada, to The Independent Characters! Adan joins Carl for a wide-ranging discussion that starts where many games and campaigns end - exploring the tools, techniques, and traditions that help preserve the epic narratives forged on the tabletop. From post-campaign surveys to lore write-ups, and archiving your group or club's shared mythology, we look at how to give your games the grand epilogue they deserve. If your Warhammer stories stop once the dice go cold, you're missing one of the best parts. But that's not all: we also sit down with Adan for a candid conversation about his recent 3.5-year stint working for Games Workshop. From staffing one of Southern California's flagship Warhammer stores to managing a single-staffer shop, Adan shares insights on what it's really like behind the counter. What works, what doesn't, and how Games Workshop's retail model functions on the ground floor all get covered in this honest and informative segment. Whether you're building out your games' epilogues or just curious about what goes on in your local Warhammer store, this episode offers something for every hobbyist who loves the game behind the game. It's a look at the stories we tell after the battles are over, and the people who help keep the hobby alive in the real world. Time Stamps: 0:00:00 - Show Intro, Elite Choice, Hobby Progress 1:32:25 - After Action Reports: The story after the battle 2:21:15 - Working for GW: Tales from Adan 3:10:45 - Final Thoughts and show closing Relevant Links: The Independent Characters Patreon Tablewar! - SPONSOR Herrick Games & Hobbies - SPONSOR Goonhammer War on The Shore 2026 Adepticon The Beard Bunker Games Workshop The Black Library
From produce as medicine to banana trade trends, we're breaking down the biggest stories shaking up fresh produce:
IFPA's National Health Campaign – Can fruit and veg become part of your healthcare plan?
Global Heatwaves – What extreme weather means for your supply chain.
Pest Victory – California eradicates the Tau fruit fly in Orange County.
Tech Meets Policy – Cold chain innovation meets produce prescriptions.
Banana Markets – Why prices are swinging and trade lanes are shifting.
Whether you're in the field, on the dock, or behind the desk, this episode connects policy, climate, innovation, and global trade in under 30 minutes.