In this episode of Communicable, Emily McDonald and Josh Davis are joined by Roger Lewis (USA) and Ian Marschner (Australia) to compare and contrast Bayesian and frequentist statistical approaches. The panel discusses the fundamental principles of both methods, common misconceptions, and the extent to which the two are more similar than many realise. Together, they explore their use in clinical trial design, analysis, and reporting, including adaptive trials and sequential learning. Additional topics include sample size misconceptions, regulatory versus clinical thresholds, and the challenges of interpreting post hoc reanalyses of negative trials.
This episode was edited by Kathryn Hostettler and the executive producer of Communicable is Angela Huttner.
Further reading:
Berry SM, et al. Bayesian Adaptive Methods for Clinical Trials (Chapman & Hall/CRC Biostatistics Series). Boca Raton (FL): CRC Press; 2010.
FDA Guidance Document: Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products. FDA, 2026, https://www.fda.gov/regulatory-information/search-fda-guidance-documents/use-bayesian-methodology-clinical-trials-drug-and-biological-products
Lee TC, et al. Contextualizing the use of corticosteroids in severe Pneumocystis jirovecii pneumonia through a Bayesian lens. CMI Comms 2025, https://www.cmi-comms.org/article/S2950-5909(25)00082-4/fulltext
Livingston EH and Lewis RJ. JAMA Guide to Statistics and Methods, https://jamaevidence.mhmedical.com/Book.aspx?bookId=2742
Marschner I. Confidence distributions for treatment effects in clinical trials: Posteriors without priors. Stat Med 2024, doi: 10.1002/sim.10000.
Whitehead J. The design and analysis of sequential clinical trials. Revised 2nd ed. Chichester: John Wiley & Sons; 1997.
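One way to see why the two approaches can look alike is a small numerical sketch. It is not taken from the episode: it assumes a treatment-effect estimate (for example a log hazard ratio) that is approximately normal with a known standard error, and the function names and numbers are made up for illustration. Under that assumption, a normal prior with a very large standard deviation gives a credible interval that almost coincides with the usual confidence interval, while a sceptical prior pulls the estimate toward zero.

```python
from statistics import NormalDist


def frequentist_ci(estimate, std_error, level=0.95):
    """Wald confidence interval for an approximately normal estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (estimate - z * std_error, estimate + z * std_error)


def bayesian_credible_interval(estimate, std_error, prior_mean, prior_sd, level=0.95):
    """Equal-tailed credible interval from a conjugate normal-normal update."""
    prior_precision = 1 / prior_sd**2
    data_precision = 1 / std_error**2
    post_var = 1 / (prior_precision + data_precision)
    # Posterior mean is the precision-weighted average of prior mean and estimate.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    posterior = NormalDist(post_mean, post_var**0.5)
    tail = (1 - level) / 2
    return (posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail))


if __name__ == "__main__":
    # Illustrative numbers only (hypothetical trial result on the log scale).
    estimate, std_error = -0.20, 0.10
    print("confidence interval:      ", frequentist_ci(estimate, std_error))
    print("vague prior (sd = 100):   ",
          bayesian_credible_interval(estimate, std_error, 0.0, 100.0))
    print("sceptical prior (sd = 0.1):",
          bayesian_credible_interval(estimate, std_error, 0.0, 0.10))
```

With the sceptical prior, the prior and the data carry equal precision, so the posterior mean sits halfway between 0 and the estimate. The two intervals answer different questions even when the numbers agree, which is part of what the panel discusses.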
Guests Akash Kulgod and Dr. Sanjeev Kulgod and host Dr. Davide Soldato discuss the JCO article, "Canine Olfaction Combined with Bayesian Modeling for Multi Cancer Detection from Breath Samples, a Phase 2 Study in India", covering the innovative use of breath-based canine olfaction for multi-cancer detection in low-resource settings, the integration of Bayesian modeling, and future prospects for scalable, non-invasive cancer screening methods.
More information does not produce better decisions. This episode of Thinking 2 Think makes the case that data overload -- not data scarcity -- is the real leadership crisis of 2026. Executive Director and author M.A. Aponte draws on his experience in charter school leadership, Wall Street, and law enforcement to break down exactly how cognitive bias corrupts data interpretation and what the most effective leaders do differently when the signals are unclear.
What You Will Learn:
• The critical difference between signal and noise in organizational data
• Why confirmation bias, availability bias, anchoring bias, and overconfidence are the four most dangerous cognitive biases in leadership decision making
• What Bayesian thinking actually means for leaders -- without the statistics
• How to apply the Three-Gate Signal Filter before drawing any conclusion from ambiguous data
• A real case study of an organization that confused noise for signal -- and built a strategic plan around the wrong conclusion
Q&A: What This Episode Answers
Q: What is the difference between signal and noise in leadership data?
A: Signal is data that meaningfully changes a decision. Noise is everything else. The same data point can be signal through one lens and noise through another -- depending entirely on the decision you are trying to make. Most leaders skip defining the decision first. That is how they end up treating noise like signal.
Q: How do cognitive biases affect leadership decisions?
A: Four biases are most damaging: Confirmation bias leads you to favor data that confirms what you already believe. Availability bias overweights recent, vivid events over slow-building trends. Anchoring bias locks you to the first number you see. Overconfidence bias makes leaders express ninety percent certainty on sixty-five percent evidence. Each of these is documented, measurable, and correctable -- but only if you know which one is running.
Q: What is Bayesian thinking for leaders?
A: Bayesian thinking means your confidence in any conclusion should be proportional to the quality and quantity of your evidence -- and should update continuously as new evidence arrives. In practice, it means defining in advance what would cause you to change your mind. That single discipline protects against confirmation bias after the fact.
The Three-Gate Signal Filter (from this episode):
• Gate One: What specific decision does this data inform?
• Gate Two: What is the base rate -- what would I expect without any intervention?
• Gate Three: What evidence would cause me to revise this conclusion?
Resources and Related Episodes:
• Subscribe to The Logical Mind newsletter at maaponte.substack.com
• Thinking 2 Think podcast: pod.link/1531984919
• Companion Substack post: The Three-Gate Signal Filter Explained
Join My Substack for more content: maaponte.substack.com
Consulting/Advisory Services: MAAponte.com
Professional LinkedIn Page: www.linkedin.com/in/maaponte
Financial Budget/Wealth Management app (FREE): https://centsora.com/
CHECK OUT OUR NEW CRITICAL THINKING GAME APP! Currently in BETA:
Android: https://play.google.com/store/apps/details?id=com.base692af669b00f0dc8d8ad6653.app
Web: https://play.google.com/apps/testing/com.base692af669b00f0dc8d8ad6653.app
*Coming soon to Apple Store
Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work
Takeaways:
Q: What is a Bayesian occupancy model and what problem does it solve?
A: An occupancy model accounts for the fact that you don't always detect a species when surveying for it, especially when the species is rare. A naive count of where you found it underestimates true occupancy. The model adds a repeated-measures component: you visit each site multiple times, and from the pattern of detections vs. non-detections it estimates a detection probability. Matthijs framed it as a zero-inflation structure where the zero-inflation happens at the site level rather than the observation level -- which keeps the model conceptually simple, just a standard GLM with a Bernoulli “is the species here at all?” stacked on top of a detection-rate process.
Q: What are Automated Recording Units and why don't traditional occupancy models handle them well?
A: ARUs are camera traps and acoustic monitors that record continuously over deployment periods of days, weeks, or months. The data they produce isn't a sequence of discrete human-led surveys; it's a continuous-time observation stream. Traditional occupancy models were designed for the discrete case -- a human visits a site, records yes or no, goes home. With ARUs, the question becomes how to bin or threshold the continuous data without losing the richer signal it actually contains.
Q: When should you not reach for occARU?
A: When your dataset is large and your survey interval is fine-grained. The bottleneck is Stan's fitting speed -- years of daily count data across many sites will fit slowly. The workaround is to bin coarser (weekly or monthly), which doesn't hurt occupancy estimation at all and only loses some detection-rate resolution. If you're only interested in occupancy, big grouping windows are fine.
Full takeaways here
Chapters:
00:12:14 What is an occupancy model and what problem does it solve?
00:16:16 What are Automated Recording Units and why do they need different models?
00:18:45 What is the occARU R package and why does it exist?
00:23:55 Why does occARU model counts directly rather than binary detection?
00:26:38 What does multi-species hierarchical modeling with Gaussian processes look like?
00:32:22 How does occARU implement Gaussian processes efficiently?
00:41:01 Why are Gaussian processes such a powerful but tricky modeling tool?
00:44:11 What is variance decomposition with global-local shrinkage priors?
00:49:02 How does occARU leverage recent Stan features for zero-sum constraints?
00:57:37 When does within-chain parallelization actually help?
01:01:30 How does Monte Carlo integration reduce high Pareto-k values?
01:15:27 When does occARU underperform and what's on the roadmap?
Thank you to my Patrons for making this episode possible!
Links from the show here.
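The site-level zero-inflation idea from the takeaways can be sketched in a few lines of Python. This is a rough illustration of the classic discrete-visit occupancy likelihood, not occARU's implementation (occARU is an R package built on Stan and models counts); the function names, the grid search, and the toy data are all assumptions made for the example. Here `psi` is the probability a site is occupied and `p` is the per-visit detection probability.

```python
import math


def site_log_likelihood(detections, visits, psi, p):
    """Log-likelihood for one site, given how many of its visits had a detection."""
    history_if_occupied = p**detections * (1 - p) ** (visits - detections)
    if detections > 0:
        # At least one detection: the site must be occupied.
        return math.log(psi * history_if_occupied)
    # No detections: either occupied but missed every time, or truly empty.
    return math.log(psi * (1 - p) ** visits + (1 - psi))


def log_likelihood(data, psi, p):
    """Sum over sites; data is a list of (detections, visits) pairs."""
    return sum(site_log_likelihood(d, k, psi, p) for d, k in data)


def grid_mle(data, steps=99):
    """Crude maximum-likelihood estimate of (psi, p) on a regular grid."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda pair: log_likelihood(data, pair[0], pair[1]),
    )


def naive_occupancy(data):
    """Share of sites with at least one detection."""
    return sum(1 for d, _ in data if d > 0) / len(data)


if __name__ == "__main__":
    # Hypothetical survey: 20 sites, 3 visits each.
    toy_data = [(1, 3)] * 5 + [(2, 3)] * 3 + [(0, 3)] * 12
    psi_hat, p_hat = grid_mle(toy_data)
    print("naive occupancy:", naive_occupancy(toy_data))
    print("estimated occupancy and detection:", psi_hat, p_hat)
```

On this toy data the estimated occupancy comes out above the naive share of sites with a detection, which is the underestimation problem the model is meant to correct. A Bayesian version would put priors on `psi` and `p` and sample the posterior instead of maximizing on a grid.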
The Experience Strategy Podcast Hosts: Aransas Savas, Dave Norton, Joe Pine Featured articles: "Death of the Segment: Why Personas Are Killing Personalization" — SwiftERM "Your Personas Are Outdated. It's Time to Evolve Your Approach." — Audrey Chee-Read, Principal Analyst, Forrester Every other post on LinkedIn is announcing the death of something. Most of it is alarmist storytelling dressed up as insight. But under the noise, two recent articles — one from SwiftERM, one from Forrester — are pointing at a real problem: personas and segmentation, built for an earlier era of marketing, have become a drag on personalization in the era of AI. Dave, Joe, and Aransas trace where personas actually came from, why they got merged with segmentation, what AI changes about the math, and what should replace the persona as the stable determinant companies are still looking for. The answer Dave keeps returning to: situations. Key Ideas Personas were never built for marketers. Dave opens with the history. The persona originated around 1999–2001 as a design thinking technique to get engineers to think more like customers. It worked. Then it migrated into marketing and merged with segmentation, and the original purpose got lost. Segmentation is the search for a stable determinant. Companies need something they can count on to define a market — geography, demographics, lifestyle, generation. Stable determinants make markets identifiable, and identifiable markets are countable. But the stability is increasingly fictional. Customers are not stable. They want different things at different times. Joe's arc: mass market → segments → niches → markets of one → markets within one. Joe walks the progression from Henry Ford's mass market through Alfred Sloan's segments through the minivan that opened up niche thinking. Stan Davis's Future Perfect (1987) saw the path to markets of one. What comes next is the flip: multiple markets inside every customer. 
Joe on a business trip is a different market than Joe on a leisure trip with his wife, even though it is the same person and the same credit card. This is the situational markets argument. Dave's frame: situations can be the new stable determinant. Friday night with your wife is a context. Monday morning before work is a context. Travel in cold Chicago is a different context than travel in France. The behavior changes with the context, even when the person does not. The SwiftERM line that lands the case. "While your team is busy building a persona for Sarah, the 35-year-old yoga enthusiast, Sarah has already moved on. She isn't a persona. She's a dynamic stream of intent." She bought a yoga mat six months ago. For the last three days, her behavior shows interest in high-end supplements and weightlifting gear. The persona missed the shift. The window of intent closed before the system caught up. Bayesian thinking is the right math for this. Predictive analytics has historically used past behavior to predict future behavior — yesterday you watched a romance, so tomorrow you will too. The newer move is using context, not just history. Yesterday you watched a romance because it was Friday and you were with your wife. The probability updates with every new piece of information. AI makes this practical at scale for the first time. The Apple Watch and Netflix examples make it concrete. The latest Apple Watch update no longer just serves up the workout you did last. It serves up the workout you usually do on that day of the week. Aransas lifts Monday and Wednesday and the watch knows. Netflix recommends romance on Friday night because the pattern holds across the whole user base. Restaurants have understood this for a hundred years — they do not serve breakfast at nine at night because they read the context. Customers have the same AI you do. Joe's reminder at the end is the one that should make every CMO uneasy. Customers can now vibecode their own shopping experience. 
They can customize as easily as you can customize for them, and they will configure it for their own context every time. The companies that win are the ones whose offerings can flex to the customer's situation, not the ones with the most polished persona deck. A Word on "Moments" Dave makes a careful distinction at the end. Moments is the right idea, but 20 years of design thinking have loaded the term with retail-moment-one, retail-moment-two, retail-moment-three thinking — discrete and product-out, not organic and customer-out. Situations carry the meaning without the baggage. Memorable Moments Joe: "I might be multiple personas, but you never say there's a person, they're that persona. That's just wrong — morally, much less business-wise." Joe: "Dave has yet to find a situation in which talking about situations does not work." Dave's bathroom study: weather changed bathroom usage at French gas stations. It did not move the needle at Chicago train stations. Different situational markets. Aransas on the Paris Marathon: one toilet, a hundred urinals, 20,000 runners — half of whom needed to sit. A persona designed for one imagined customer, and the actual situation ignored. Joe on the American Girl Place men's bathroom stocking products that men do not use — because the company actually thought about who was walking in with their daughter. The Strategic Takeaway Companies need something they can count on. Personas have stopped being that thing. Aggregated situations — Friday night, business travel with kids, post-workout, end-of-quarter — are stable enough to plan against and dynamic enough to respect what the customer actually wants in the moment. AI no longer makes one-to-one a scary thing to attempt. The excuse is gone. The companies that move now will be the ones the customer feels actually understands them. Subscribe and Continue the Conversation Find the show on the Experience Strategist Substack, the podcast feed, and everywhere else. 
Article links in the show notes.
Today's clip is from episode 158 featuring Stefan Radev. In this conversation, Alex and Stefan explore a genuinely fascinating problem: how do you turn an expert's intuition into a mathematically valid prior distribution - and can AI help automate that process?
Alex explains that prior elicitation is essentially a translation problem. Experts don't walk around thinking in probability distributions - their knowledge lives in intuitions, rules of thumb, and rough ranges. The challenge is converting that into something a Bayesian model can actually use.
The traditional approach? Ask an expert for quantiles or a mean, then parameterize your prior with hyperparameters and simulate until the model-implied quantities match what the expert described. If your pipeline is differentiable end-to-end, you use gradient descent. If not, you fall back to something like Bayesian optimization. Either way, you're iterating toward a prior that genuinely reflects expert knowledge - not just a convenient assumption.
But the really exciting part is what came next. In a follow-up paper, they pushed this further: instead of optimizing within a fixed parametric family (say, a Gaussian), they replaced the prior entirely with a normalizing flow - a flexible generative network - and ran the same procedure. No assumed distribution family. Just let the data and the expert's knowledge shape the prior from scratch.
The catch? More flexibility means more non-identifiability and stability headaches. But the direction is clear: a fully automated, end-to-end pipeline for building priors from non-probabilistic expert knowledge. And in 2026, that pipeline could theoretically be driven by an agent.
Get the full discussion here
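The "traditional approach" in this clip can be illustrated with a deliberately small sketch. It is not the method from the papers discussed: it assumes a log-normal prior family, matches the prior's own quantiles directly (real pipelines typically match quantities simulated through the whole model), and uses the closed-form log-normal quantile so that plain gradient descent works without any autodiff library. The function names and the expert's numbers are hypothetical.

```python
import math
from statistics import NormalDist


def fit_lognormal_prior(expert_quantiles, learning_rate=0.05, steps=2000):
    """Tune (mu, sigma) of a log-normal prior so its quantiles match the expert's.

    expert_quantiles maps a probability level to the value the expert stated,
    e.g. {0.05: 2.0, 0.5: 5.0, 0.95: 12.0}. The loss is the squared error
    between model-implied and stated quantiles on the log scale.
    """
    z = {q: NormalDist().inv_cdf(q) for q in expert_quantiles}
    mu, sigma = 0.0, 1.0  # arbitrary starting hyperparameters
    for _ in range(steps):
        grad_mu = grad_sigma = 0.0
        for q, target in expert_quantiles.items():
            # log of the log-normal quantile at level q is mu + sigma * z_q
            residual = (mu + sigma * z[q]) - math.log(target)
            grad_mu += 2 * residual
            grad_sigma += 2 * residual * z[q]
        mu -= learning_rate * grad_mu
        sigma -= learning_rate * grad_sigma  # no positivity constraint in this sketch
    return mu, sigma


def implied_quantile(mu, sigma, q):
    """Quantile of LogNormal(mu, sigma) at probability level q."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))


if __name__ == "__main__":
    # Hypothetical expert statement: "probably around 5, rarely below 2 or above 12".
    stated = {0.05: 2.0, 0.5: 5.0, 0.95: 12.0}
    mu, sigma = fit_lognormal_prior(stated)
    for q, target in stated.items():
        print(q, target, round(implied_quantile(mu, sigma, q), 2))
```

Three stated quantiles cannot all be matched exactly by a two-parameter family, so the fit is a compromise. That limitation of a fixed family is the motivation the clip gives for swapping in a more flexible prior such as a normalizing flow.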
*How do you forecast an event that has never happened before?*
The recent closure and reopening of the Strait of Hormuz are unique events. For events like these, traditional risk models lose their statistical basis: repetition. Alexander Denev returns to the podcast to show how causal models (Bayesian networks) let us reason about rare events despite this limitation.
In this episode, we cover:
- Why value-at-risk and other correlation-based models break exactly when you need them most
- How a causal structure can "hold in time"
- Building scenarios with LLMs - benefits, drawbacks, and lessons learned
- Historical analogy as a modeling tool: Bosphorus, Hormuz, and more
- A three-way robustness test for any Bayesian network
- How the model's call held up: a ceasefire, a still-closed strait, and lasting infrastructure damage keeping oil elevated
"History doesn't repeat itself, but it rhymes."
------------------------------------------------------------------------------------------------------
Video version available on YouTube: https://youtu.be/FzKy2ws-7qs
Recorded on May 29, 2026 in London, UK.
------------------------------------------------------------------------------------------------------
*About The Guest*
Alexander Denev works at the intersection of quantitative finance, causality, and AI. He's the CEO of Turnleaf Analytics and the author of two books on applying Bayesian networks and probabilistic graphical models to finance and scenario analysis.
Connect with Alexander:
- Alexander on LinkedIn: https://www.linkedin.com/in/alexander-denev-66a25824/
- Alexander's web page: https://turnleafanalytics.com/
*About The Host*
Aleksander (Alex) Molak is an independent machine learning researcher, educator, entrepreneur and a best-selling author in the area of causality (https://amzn.to/3QhsRz4 ).
Connect with Alex:
- Alex on the Internet: https://bit.ly/aleksander-molak
*Links*
Web
- Alexander's LinkedIn post, Bayesian-network scenario for the Strait of Hormuz / Israel-Iran-US conflict: https://www.linkedin.com/posts/alexander-denev-66a25824_when-modelling-the-impact-of-events-that-share-7442892381668048896-JDs5/
- Risk.net article, "Iran confusion makes the case for causal modelling": https://www.risk.net/our-take/7963361/iran-confusion-makes-the-case-for-causal-modelling
Books
- Rebonato, R. & Denev, A. - Portfolio Management under Stress: A Bayesian-Net Approach to Coherent Asset Allocation (https://amzn.to/3vE6Jc1)
- López de Prado, M. - Advances in Financial Machine Learning (https://amzn.to/3PXD8kH)
- Molak, A. - Causal Inference and Discovery in Python (https://amzn.to/3VVK4m3)
- Denev, A. - Probabilistic Graphical Models: A New Way of Thinking in Financial Modelling (https://amzn.to/3VQeLJm)
- Pearl, J. & Mackenzie, D. - The Book of Why (recommended entry point) (https://amzn.to/4e0ATrZ)
- Pearl, J. - Causality: Models, Reasoning and Inference (for advanced readers) (https://amzn.to/49zBKf5)
- Rebonato, R. - Coherent Stress Testing: A Bayesian Approach to the Analysis of Financial Stress (https://amzn.to/3RC411e)
*Perks & resources*
Yes, I mean whale privilege, aka fat women who demand attention. Lots of topics today, deadlifts, bayesian curls, wild meals and questions, solid rants. SUMMER SWOLE SPECIALS: https://summerswole.com
We sit down with Joshua Oommen to get nerdy about clinical reasoning, FDA standards, and why “good evidence” is harder to define than most of us admit. We challenge the reflex to trust p-values and meta-analyses, then test our instincts against real OBGYN examples where the literature has whiplashed practice.
• why the podcast is called Thinking About OBGYN and how clinical reasoning shapes our work
• the NEJM proposal to make one pivotal trial the FDA default and what “confirmatory evidence” might mean
• medical reversal, surrogate endpoints, and how trust erodes when practice changes late
• why Bayesian thinking fits how clinicians interpret tests, trials, and prior beliefs
• how meta-analyses fail through small study effects, publication bias, p-hacking, and heterogeneity
• the amnioinfusion comeback as a case study in applicability and overconfident conclusions
Be sure to check out thinkingaboutobgyn.com for more information and be sure to follow us on Instagram.
0:00 Welcome And Today's Big Question
3:48 Why “Thinking About OBGYN” Exists
11:54 The NEJM Push For One Trial
16:38 Medical Reversal And Trust Problems
24:43 AI Proteins And CRISPR Pressure Tests
32:33 Bayes Thinking Beyond P Values
36:43 Why Meta-Analyses Often Mislead
41:08 Bias And Heterogeneity Red Flags
46:24 Amnioinfusion And A Meta-Analysis Comeback
1:02:29 Final Warnings And How To Learn
Follow us on Instagram @thinkingaboutobgyn.
In this episode of the DASON Digest, DASON Clinical Pharmacist Liaison, Dr. Jeannette Bouchard, highlights key posters and clinical pearls from the MAD-ID and SIDP 2026 Annual Meeting, including oral antibiotic transitions, Bayesian vancomycin dosing, renal dose adjustments in septic shock, personalized antibiograms, and greener antimicrobial stewardship practices. For more information about DASON, please visit: https://dason.medicine.duke.edu/
Takeaways:
Q: Why are prior predictive checks so underused in practice, and how do simulations help?
A: They're underused because researchers don't always think to run them before seeing data -- but also because doing them rigorously (in the style Michael Betancourt advocates, with prior push-forward checks on interpretable summaries) takes effort. Simulations make it cheap to generate thousands of “what-if world” datasets from your model and check whether they look plausible, catching bad priors before you ever touch real data.
Q: How can generative AI help with prior elicitation?
A: Rather than forcing a domain expert to choose a distributional family and parameterize it, you can use a generative model to translate their qualitative knowledge directly into a prior. The expert describes what realistic data should look like; the generative model produces synthetic datasets matching that description; those datasets are used to fit a prior distribution. It removes the assumption that experts can think in terms of parameters and replaces it with the more natural question: does this look like your data?
Q: What would a foundation model for Bayesian inference actually look like?
A: Stefan's bet is that it won't be a fine-tuned general LLM. The right analogy is chess: you don't fine-tune GPT to play chess, you teach it when to call Stockfish. For Bayesian inference, you'd want a semantic layer – an LLM that understands the analysis goal – calling specialized numerical engines (MCMC samplers, amortized inference networks) that do the actual computation. Agent skills are already a step in this direction; the longer-term vision is engines that have been trained from scratch to generalize across large families of models and priors.
Full takeaways here.
Chapters:
00:00 How does amortized inference fit into modern Bayesian workflows?
06:01 What role do simulations play across the full Bayesian workflow?
12:12 How do you elicit priors from a domain expert who doesn't think in distributions?
19:01 What would a foundation model for Bayesian inference actually look like?
35:32 What is self-consistency in amortized inference and why does it matter?
39:22 How does semi-supervised learning improve simulation-based inference?
43:16 Why is sensitivity analysis so important yet so underused in Bayesian practice?
47:40 What is multiverse analysis and how does it change how we report Bayesian results?
51:32 How does amortized inference make sensitivity and multiverse analysis affordable?
01:02:47 How do amortized inference and classical MCMC complement each other?
01:10:08 What are the next major directions for BayesFlow and amortized inference research?
Thank you to my Patrons for making this episode possible!
Links from the show here.
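A prior predictive check of the kind described in the first takeaway can be sketched with the standard library alone. The model here is an assumption chosen for illustration, not one from the episode: a success count out of 20 trials whose log-odds get a Normal(0, sd) prior. The function names, the seed, and the "all-or-nothing" summary are likewise hypothetical. The idea is to simulate many datasets from the prior and look at an interpretable summary before touching real data.

```python
import math
import random


def simulate_prior_predictive(prior_sd, n_trials=20, n_datasets=2000, seed=1):
    """Draw success counts from the prior predictive of a simple logit model.

    Each simulated "what-if world" draws log-odds alpha ~ Normal(0, prior_sd),
    converts it to a probability, then simulates n_trials Bernoulli outcomes.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_datasets):
        alpha = rng.gauss(0.0, prior_sd)
        theta = 1.0 / (1.0 + math.exp(-alpha))  # inverse logit
        counts.append(sum(rng.random() < theta for _ in range(n_trials)))
    return counts


def share_all_or_nothing(counts, n_trials=20):
    """Fraction of simulated datasets with zero successes or all successes."""
    return sum(1 for c in counts if c in (0, n_trials)) / len(counts)


if __name__ == "__main__":
    for prior_sd in (10.0, 1.5):
        counts = simulate_prior_predictive(prior_sd)
        print(f"prior sd {prior_sd}: share of all-or-nothing datasets =",
              round(share_all_or_nothing(counts), 2))
```

A prior that looks vague on the log-odds scale (sd of 10) implies that most simulated datasets are all failures or all successes, which is rarely what an analyst believes. The tighter prior spreads the simulated counts far more evenly. Which summary to check depends on the application; this one is only an example.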
Michael I. Jordan, described by Science magazine as the most influential computer scientist alive, has never thought of himself as an AI researcher. In this conversation he explains why that distinction matters.SPONSOR:---Cyber Fund built the Monastery to help founders ship products that were impossible a year ago. Applications for Batch 1 are now open.Apply now: https://cyber.fund---Jordan trained as a statistician and cognitive scientist, and his career has been spent building machine learning systems that work in the real world: supply chains, commerce, healthcare, and large economic systems. When the field rebranded itself as AI and then AGI, he did not follow. Instead he argues that the framing is wrong. AI is better understood as a collective economic system than as a race to build a disembodied superintelligence.We talk about why AGI is mostly a PR term, what machine learning achieved before the LLM hype cycle, and why the assistant-on-your-shoulder vision may be less compelling than it sounds. 
Jordan explains why explanations need to be actionable, not merely mechanistic; why AlphaFold's missing error bars matter; how prediction-powered inference changes the picture; and why drug discovery is an incentive-design problem rather than a pure pattern-matching problem.ERRATA: Science magazine ranked him the most influential computer scientist, not Nature---TIMESTAMPS:00:00:00 Cold open: A demoralizing message to young builders00:02:04 CyberFund sponsor read00:02:50 From symbolic AI to machine learning systems00:05:42 Why AGI is mostly a PR term00:08:48 A collectivist, economic perspective on AI00:11:33 Why LLMs need system design, not hype00:14:50 Predictability beats faux understanding00:17:55 AlphaFold, bias, and prediction-powered inference00:21:48 Stop anthropomorphizing intelligence00:27:44 Drug discovery as an incentive problem00:32:29 The three-layer data market00:38:07 Social knowledge, markets, and culture00:45:39 Creator economics beyond Spotify00:48:30 How science-fiction AI narratives mislead young builders00:51:45 AI should improve humans, not replace them00:56:42 Safety is a property of the whole system00:58:12 Silicon Valley gurus and the cream off the top01:00:47 Game theory, mechanism design, and contracts01:04:39 Conformal prediction, e-values, and anytime inference01:08:11 A new liberal arts triangle for the AI era01:11:30 The Bayesian duck and markets as uncertainty reductionReScript (transcript, PDF, refs etc) - https://app.rescript.info/public/share/fb68f94af29d3745c6cf6125e01328b5---REFERENCES:person:[00:02:50] Michael I. 
Jordan (homepage)https://people.eecs.berkeley.edu/~jordan/paper:[00:06:01] A Collectivist, Economic Perspective on AIhttps://arxiv.org/abs/2507.06268[00:18:09] AlphaFoldhttps://www.nature.com/articles/s41586-021-03819-2[00:20:36] Prediction-Powered Inferencehttps://arxiv.org/abs/2301.09633[00:33:47] On Three-Layer Data Marketshttps://arxiv.org/abs/2402.09697[01:04:39] Conformal Prediction with Conditional Guaranteeshttps://arxiv.org/abs/2107.07511[01:04:51] A Tutorial on Conformal Predictionhttps://www.jmlr.org/papers/v9/shafer08a.html[01:06:00] E-Values Expand the Scope of Conformal Predictionhttps://arxiv.org/abs/2503.13050[01:08:23] Computational Thinkinghttps://www.cs.cmu.edu/~CompThink/papers/Wing06.pdfother:[00:28:20] How Should the FDA Test?https://rdi.berkeley.edu/events/sbc-assets/pdfs/Summit%20session%20speaker%20slides%20submission%20form-s1-5%20%28File%20responses%29/Slides%20in%20PDF%20%28Please%20name%20the%20submitted%20file%20as%20_firstname_-_lastname_-slides.pdf%29.%20%28File%20responses%29/27-Michael%20Jordan-Session%20V.pdf#page=15[00:28:40] Michael I. Jordan Session V Slides
In this Circles Off Q&A, Rob Pizzola is joined by Plus EV Analytics, Matt Buchalter, for a deep dive into sports betting modeling — how it actually works in practice, what separates good models from bad ones, and how sharp bettors think about building and evaluating their edge. This episode is built around 10 of the toughest modeling questions pulled directly from Circles Off content and community discussion. The conversation covers how to start a model from scratch, when a model is strong enough to bet real money, and how to deal with early season uncertainty like small samples, roster turnover, and regression questions. Rob and Matt also explore what matters more between getting the mean right or the distribution right, how to think about closing line value thresholds, and how to separate variance from a broken edge when results turn against you. They also get into Bayesian vs frequentist thinking, how professional bettors evaluate their models over time, and which inputs are often overrated or underrated when building a betting model. For anyone serious about sports betting models, market pricing, or long-term edge creation, this is a practical, sharp breakdown from two experienced voices in the space. Subscribe to Circles Off for more sharp betting conversations, modeling breakdowns, and market analysis.
For Episode 100 of the MOE Podcast, we're talking about Maine deer survival, winter severity and the state's apparent move toward a newer model for estimating winter impact and deer survival.The big question is simple: how is Maine measuring the real impact of winter on the deer herd, and how clearly is that information being explained to the public?This episode is not a personal attack on anyone at Maine IFW or within state government. It is constructive criticism. When people are in paid public positions and making decisions that affect wildlife management, hunting opportunity, and Maine's outdoor traditions, the public deserves clear explanations, timely communication, and transparency about the data and models being used.From the outside looking in, the state often seems slow or scant in providing information on these issues. If new statistical models, including Bayesian-style approaches, are being used to estimate deer survival or winter impact, then hunters, landowners, and the public should be able to understand the basic assumptions, uncertainty, and management implications.We also use this conversation to connect the topic to broader lessons from 100 episodes: learning from incomplete information, staying humble, updating what we believe, and asking better questions about the Maine outdoors.Here's to #100!#MaineOutdoorEnthusiast #MaineOutdoors #MaineDeer #DeerHunting #WhitetailDeer #MaineHunting #WildlifeManagement #DeerWinteringAreas #BayesianStatistics #WinterSeverity #MaineIFW #outdoorpodcast Check us out on the web at:https://www.maineoutdoorenthusiast.comContact:maineoutdoorenthusiast@gmail.com
Some of the most powerful breakthroughs happen when methods built for one discipline get turned on another. In this episode, Dr Andree Bates interviews Dr Irina Babina, CEO of Concr, on how computational techniques originally developed in astrophysics are being applied to oncology, helping predict how individual cancer patients will respond to treatment.
Irina shares her journey from genetics and targeted cancer therapy into building applied solutions, driven by a frustration many scientists recognise: good science doesn't always reach patients fast enough. A pivotal patient experience reinforced her focus on personalised biology, because behind every dataset is a person, and oncology cannot be solved purely through averages.
Concr's approach is built around Bayesian computation and uncertainty-aware modelling. Instead of assuming clean, complete datasets, the system is designed to work with missingness and fragmentation, updating predictions as new evidence comes in. Irina explains how Concr connects mechanistic biological modelling and preclinical drug perturbation data to patient multi-omics, imaging, treatment response, and outcomes data from both clinical trials and real-world settings.
A key application is Concr's patient-level digital twin ("Farsight Twin"), which simulates an individual's probability of response across therapies, estimates likely benefit, and helps stratify patients earlier in development. Irina shares a use case where Concr supported indication ranking from cell line data, then helped interpret phase 1 signal by estimating which patients benefited from the novel drug versus standard of care, enabling sharper inclusion and expansion planning.
Looking ahead, Irina argues we're moving toward personalised oncology where population-level protocols fade, and decision-making becomes confidence-based, adaptive, and informed by longitudinal monitoring as tumours evolve over time.
Topics Covered
Applying astrophysics-inspired methods to cancer biology
Bayesian computation and modelling uncertainty
Integrating multi-omics, imaging, trials, and real-world evidence
Translational modelling from preclinical to clinical outcomes
Patient-level digital twins and therapy response simulation
Stratification, enrichment, and reducing early-stage uncertainty
Pan-cancer modelling to improve rare cancer prediction
The future of personalised oncology and dynamic monitoring
Eularis helps pharma and biotech leaders turn AI activity into board-defensible strategy and measurable commercial outcomes.
If your organisation has plenty of AI in motion but very little that moves the commercial needle in a way the board can see, start with our 10-Day AI Diagnostic Sprint. It's a focused diagnostic that surfaces what's actually broken and what's blocking results, before you invest in a larger strategy effort.
The Sprint diagnoses the problem. The AI Strategic Blueprint that follows is where we build the board-defensible strategy and plan. Details at eularis.com.
If this episode described your situation, send me a LinkedIn DM starting with 'SENSECHECK' and two things: the question you're trying to answer internally, and what's currently in flight. I'll reply with what I'd need to see to turn that activity into a defensible plan, and the next step.
About the Podcast
AI For Pharma Growth is the podcast from pioneering Pharma Artificial Intelligence entrepreneur Dr Andree Bates, created to help pharma, biotech and healthcare organisations understand how AI-based technologies can save time, grow brands, and improve company results.
This show blends deep sector experience with practical conversations that demystify AI for biopharma leaders, from start-up biotech right through to Big Pharma. Each episode features experts building AI-powered tools that are driving real-world results across discovery, R&D, clinical trials, medical affairs, market access, regulatory, insights, sales, marketing, and more.
If you enjoy this episode, we're sure you will enjoy more content like this on The Occult Rejects. In fact, we have curated playlists on occult topics like grimoires, esoteric concepts and phenomena, occult history, analyzing true crime and cults with an occult lens, Para politics, and occultism in music. Whether you enjoy consuming your content visually or via audio, we've got you covered - and it will always be provided free of charge. So, if you enjoy what we do and want to support our work of providing accessible, free content on various platforms, please consider making a donation to the links provided below. Thank you and enjoy the episode!
Links For The Occult Rejects
https://linktr.ee/theoccultrejects
Occult Research Institute
https://www.occultresearchinstitute.org/
Cash App
https://cash.app/$theoccultrejects
Venmo
@TheOccultRejects
Buy Me A Coffee
buymeacoffee.com/TheOccultRejects
Patreon
https://www.patreon.com/TheOccultRejects
Also want to remind people about the website, if you're into reading we have tons of information by multiple contributors, and we got t-shirts up on the site if you're interested. Fun fact, the art is all based on the eyeball. A
In this video I go over my old Smacc Dublin talk and revisit the Bayesian reasoning that lies at the heart of what we do in the ED. The new Docnomo / Fagan nomogram interactive tool on the website is unveiled and demonstrated with discussions around PE, SBO and paediatric appendicitis.
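A Fagan nomogram is a graphical shortcut for Bayes' rule in odds form: pre-test odds times the likelihood ratio gives post-test odds. The sketch below shows that arithmetic only. The function name and the numbers are hypothetical illustrations, not values from the talk or the Docnomo tool, and not clinical guidance.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    # Probability -> odds
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)
    # Bayes' rule in odds form: multiply by the likelihood ratio
    post_test_odds = pre_test_odds * likelihood_ratio
    # Odds -> probability
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical inputs for illustration only (assumed, not clinical values).
example = post_test_probability(pre_test_prob=0.20, likelihood_ratio=5.0)
print(f"post-test probability: {example:.3f}")
```

The same function covers negative tests: pass a likelihood ratio below 1 and the post-test probability falls below the pre-test value.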
Today's clip is from Episode 157 featuring Stefan Radev. In this conversation, Alex and Stefan dig into one of the hardest open problems in simulation-based inference: hierarchical models.
The core idea: when you move from flat to hierarchical models, you're no longer estimating one set of parameters. You have local parameters that vary by location (or subject, or city) and global parameters that capture what's shared across all of them. And you don't just want each separately; you want the full joint posterior, because that's where the Bayesian magic of shrinkage actually lives.
Stefan builds the problem from the ground up. Start with the simplest hierarchical case: a two-level model. He uses electoral forecasting in France as the example: cities nested inside departments nested inside the whole country.
Now your simulator has to cover all three levels. If that simulator is slow (think: brain emulators, minutes per sample), scaling to hundreds of groups becomes completely intractable. Memory issues, specialized network requirements, the works.
The key insight: this problem has structure you can exploit. The joint posterior factorizes in a particularly nice way: each local parameter depends on its own local data and on the global parameters. That means instead of cramming everything into one giant high-dimensional vector and hoping a neural network figures it out, you can decompose the problem. Estimate local parameters conditioned on local data and the globals. Use composition.
The takeaway: hierarchical models aren't just "harder flat models" - they have a geometry that demands a different architecture. Respecting that structure is what makes amortized inference scale.
Get the full discussion here
Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work
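The factorization described in this clip can be written out for the two-level case. The notation is an assumption for illustration and does not come from the episode: \tau stands for the global parameters, \theta_j for the local parameters of group j, and y_j for that group's data, with the \theta_j conditionally independent given \tau and each y_j depending only on its own \theta_j.

```latex
p(\tau, \theta_1, \dots, \theta_J \mid y_1, \dots, y_J)
  = p(\tau \mid y_1, \dots, y_J) \, \prod_{j=1}^{J} p(\theta_j \mid y_j, \tau)
```

Under those assumptions, one network can target the global posterior p(\tau \mid y_{1:J}) and a second, shared network can target each local factor p(\theta_j \mid y_j, \tau), which is the decomposition the clip refers to as composition.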
In this episode of The Backstory on the Shroud of Turin, Guy Powell interviews Christian apologist and theologian Tom Dallis.
Tom explores why the Shroud of Turin continues to challenge skeptics and researchers alike. The conversation examines Jewish burial customs, Roman crucifixion methods, and scientific mysteries connected to the cloth believed by many to be the burial shroud of Jesus Christ.
Topics include:
• Jewish first-century burial practices
• Joseph of Arimathea and Nicodemus
• The Sudarium of Oviedo
• Blood flow evidence on the Shroud
• Why the image formation remains unexplained
• Problems with the medieval forgery theory
• Carbon dating controversies
• Bayesian probability and forensic evidence
• Why medieval artists could not reproduce these details
• How the Shroud supports discussions about the Resurrection
Tom also explains why the Shroud differs from known historical forgeries. He compares it to fake Dead Sea Scroll fragments and discusses why scientific testing continues to support the Shroud's uniqueness.
The interview connects faith, science, history, and biblical scholarship in a compelling discussion about the Resurrection of Jesus Christ and The Only Witness.
Cortisol after cancer is the conversation nobody on my care team had with me. I was diagnosed with breast cancer in 2021 — invasive ductal carcinoma, stage one, grade two. I went through lumpectomy, radiation, ovarian suppression, and two years on an aromatase inhibitor before I had to come off because my bones were already in osteoporosis. Throughout all of it, my nervous system was screaming. My cortisol was running hot all day long, confirmed by a Dutch test. And not one doctor told me what stress was doing to my body or how to mitigate it. In this solo episode of Not Today Cancer, I'm walking you through the seven activities that lowered my cortisol...broken into the things that don't cost a dime (meditation, breathwork, walking outside, unplugging) and the things that do (acupuncture, energy healing, therapy). I'm also sharing the actual research behind each one, so you know this isn't woo...it's documented science. What you'll learn: • Why cortisol is wrecked after a cancer diagnosis (and why mine was high long before) • The symptoms of high cortisol most breast cancer survivors miss • How mindfulness meditation protected the cortisol rhythm of breast cancer survivors in a randomized controlled trial • Why a single session of slow breathing drops cortisol immediately • The "nature pill" research showing 20–30 minutes outside lowers cortisol 21% per hour • Why the NCCN officially recommends acupuncture for cancer survivors If you're a breast cancer survivor, caregiver, or anyone whose body has been running on fumes...this episode is for you. We don't get the option of not mitigating stress. Pick one thing on this list and start tomorrow. Disclaimer: This episode reflects my personal experience and a summary of public research. It is not medical advice. Always consult your care team.
Why is the Italian public prosecutor's office investigating the crew and captain while the British MAIB, in its interim report, names the stability of the Bayesian as the reason for the sinking? What is behind this? Ted Turner was not only the founder of CNN but also a true legend of sailing. We also talk about the appalling story surrounding the humpback whale. And Carsten tells us about his warm-up regatta for the Contender World Championship.
Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work
Takeaways:
Q: What is simulation-based inference and what does "sim-to-real" mean?
A: Simulation-based inference (SBI) uses a mechanistic simulator as an epistemic tool: you train a neural network on a large number of labeled simulations and then deploy it on real, unlabeled data. The "sim-to-real" framing captures the key asymmetry -- your network never sees real data during training, only simulations, but it generalizes to real observations at inference time. This is the opposite of the more common "synthetic-for-ML" approach, where fake data is used purely to augment real training data.
Q: What is the amortized inference agent skill and what does it do?
A: It's an open-source AI agent skill, co-developed by Stefan and Alexandre, that teaches an AI coding agent to run a complete, state-of-the-art amortized inference workflow. Because amortized inference is recent enough that it's underrepresented in LLM training data, vanilla agents tend to get it wrong. The skill injects the right methodology: it guides the agent to set up the simulator, choose the right network architecture, run a pilot, train with appropriate diagnostics, and produce an actionable report -- without the user needing to know the details.
Q: What is calibration coverage and why should you never skip it?
A: Calibration coverage tells you whether your posterior uncertainty is honest -- whether your credible intervals actually contain the true parameter at the right frequency. A model can show poor parameter recovery yet still be well-calibrated (because it's falling back on the prior), or it can appear to recover parameters while being poorly calibrated. Running calibration diagnostics both in-sample and out-of-sample is especially revealing for hierarchical models, which often appear to underfit in-sample but generalize much better out-of-sample thanks to shrinkage.
Full takeaways here
Chapters:
00:00:00 How does amortized inference fit into the Bayesian workflow?
00:12:03 What does "sim-to-real" mean in simulation-based inference?
00:15:57 Why is amortized inference particularly suited to psychology and neuroscience?
00:21:51 What is the amortized inference agent skill?
00:39:00 What is calibration coverage and how do you interpret it?
00:41:50 How do you decide what to do next after your first training run?
00:44:53 How do actionable insights make Bayesian workflows more usable?
00:49:08 What are the unique challenges of hierarchical models in amortized inference?
01:00:51 What is the current state of BayesFlow's support for hierarchical models?
01:05:00 What are the main failure modes of amortized inference and how do you handle model misspecification?
Thank you to my Patrons for making this episode possible!
Links from the show
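The calibration-coverage idea from the takeaways above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the episode's workflow or BayesFlow's API: an exact conjugate Normal posterior stands in for a trained network, and `posterior_sd_scale` is a hypothetical knob that fakes an overconfident estimator.

```python
import random
from statistics import NormalDist


def coverage(n_sims=2000, n_obs=5, sigma=1.0, level=0.9,
             posterior_sd_scale=1.0, seed=0):
    """Fraction of simulations whose central credible interval contains
    the true parameter. Model (assumed): theta ~ N(0, 1), y_i ~ N(theta, sigma^2)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # half-width in posterior SDs
    hits = 0
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)                       # draw truth from the prior
        ys = [rng.gauss(theta, sigma) for _ in range(n_obs)]
        post_var = 1.0 / (1.0 + n_obs / sigma**2)         # conjugate Normal update
        post_mean = post_var * sum(ys) / sigma**2
        post_sd = posterior_sd_scale * post_var ** 0.5    # scale < 1 = overconfident
        if abs(theta - post_mean) <= z * post_sd:
            hits += 1
    return hits / n_sims


honest = coverage()                                # exact posterior
overconfident = coverage(posterior_sd_scale=0.5)   # intervals half as wide
print(f"nominal 0.90 | honest {honest:.3f} | overconfident {overconfident:.3f}")
```

With the exact posterior, empirical coverage should sit near the nominal 90%. Shrinking the intervals leaves the point estimates untouched but pushes coverage well below nominal, which is the failure that parameter-recovery plots alone would not reveal.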
Today's clip is from Episode 156 featuring Adam Foster. In this conversation, Adam explains Expected Information Gain (EIG), the scoring function at the heart of optimal Bayesian experimental design.
The core idea: when designing an experiment, you need a way to compare possible designs and pick the best one. EIG is that score - it tells you how much information you expect to gain about your model parameters from a given design. The higher the EIG, the better the design.
Adam builds intuition for EIG from two directions that sound completely different but lead to the same place. First, the Bayesian angle: simulate datasets from your prior predictive distribution, run inference on each, measure how much uncertainty dropped, and average across datasets. Second, a classic puzzle - the 12 prisoners balance scale problem - where the best weighing strategy turns out to be the one that makes all three outcomes (tip left, tip right, balance) equally likely. This maximizes outcome entropy, which is exactly what EIG does: it steers you toward designs where every possible result narrows down your hypotheses as fast as possible.
The takeaway: good experimental design isn't about intuition or convention - it's about making your data work as hard as possible, and EIG gives you a rigorous way to do that.
Get the full discussion here
Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work
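The balance-scale intuition can be checked numerically. The sketch below assumes the usual form of the puzzle, which the clip does not spell out: 12 items, exactly one of them odd, equally likely to be any item and equally likely to be heavier or lighter. Because the scale is treated as noise-free, the expected information gain of a first weighing equals the entropy of its outcome. The function name is hypothetical.

```python
import math


def outcome_entropy_bits(k: int, n_items: int = 12) -> float:
    """Entropy in bits of the outcome when weighing k items against k.

    Assumes one odd item, uniform over items and over heavier/lighter,
    and a noise-free scale (so EIG equals outcome entropy).
    """
    n_hyp = 2 * n_items                 # (item, heavier-or-lighter) hypotheses
    # Left pan drops if the odd item is heavy on the left or light on the right.
    p_left = 2 * k / n_hyp
    p_right = 2 * k / n_hyp
    p_balance = (n_hyp - 4 * k) / n_hyp  # odd item is not on the scale
    probs = [p for p in (p_left, p_right, p_balance) if p > 0]
    return -sum(p * math.log2(p) for p in probs)


# Score every feasible first weighing: 1 vs 1 up to 6 vs 6.
scores = {k: outcome_entropy_bits(k) for k in range(1, 7)}
best_k = max(scores, key=scores.get)
for k, bits in scores.items():
    print(f"{k} vs {k}: {bits:.3f} bits")
print("best first weighing:", best_k, "vs", best_k)
```

Weighing 4 against 4 makes the three outcomes equally likely, so it attains the maximum possible entropy for a three-outcome experiment, log2(3) bits.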
Cybersecurity practitioners face a persistent methodological problem: how should we reason about intelligent adversaries who observe our defenses, adapt their tactics, and choose targets based on our vulnerabilities? The field has responded with a fragmented toolkit. Quantitative risk assessment borrowed from safety engineering treats threat, vulnerability, and consequence as independent terms. Threat modeling frameworks such as STRIDE and attack trees emphasize structure but rarely quantify uncertainty. Game-theoretic models assume rationality and common knowledge that real attackers do not exhibit. Qualitative heat maps compress uncertainty into colored cells that cannot support budget optimization.
This talk surveys these approaches critically, examining what each method commits you to and what it quietly sets aside. A common thread emerges: the alternatives can be understood as approximations to a Bayesian decision-theoretic ideal, each relaxing one or more assumptions for tractability. Modeling an adversary requires addressing four dimensions of uncertainty (what they want, what they know, what they can do, and how they decide) and the standard critiques of probabilistic cyber risk analysis (information asymmetry, correlated inputs, adaptation, the absence of objective base rates) turn out to be errors of naive practice rather than indictments of the methodology itself. Threat intelligence feeds, indicator matches, and shifts in attacker tradecraft fit naturally as Bayesian updates rather than as awkward inputs to frequentist frameworks. The survey closes not with a prescription but with a diagnostic question for practitioners and researchers alike: are the assumptions embedded in your chosen method appropriate for the decision you are trying to support?
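One way to see why treating threat and vulnerability as independent terms can mislead is a two-line calculation. The sketch below is an illustration under invented assumptions, not the speaker's model: all probabilities and the loss figure are hypothetical, and the only point is that an attacker who prefers vulnerable targets makes the joint probability larger than the product of the marginals.

```python
# Hypothetical inputs (assumed for illustration only).
P_VULNERABLE = 0.20            # chance the system has an exploitable flaw
P_ATTACK_GIVEN_VULN = 0.50     # adaptive attacker targets flawed systems
P_ATTACK_GIVEN_SAFE = 0.05     # and mostly ignores hardened ones
LOSS = 1_000_000.0             # consequence of a successful attack

# Marginal attack probability, as a naive analyst would estimate it.
p_attack = (P_VULNERABLE * P_ATTACK_GIVEN_VULN
            + (1.0 - P_VULNERABLE) * P_ATTACK_GIVEN_SAFE)

# Independence assumption: risk = P(attack) * P(vulnerable) * loss.
risk_independent = p_attack * P_VULNERABLE * LOSS

# Conditional model: risk = P(vulnerable) * P(attack | vulnerable) * loss.
risk_conditional = P_VULNERABLE * P_ATTACK_GIVEN_VULN * LOSS

print(f"independent terms: {risk_independent:,.0f}")
print(f"conditional model: {risk_conditional:,.0f}")
```

With these made-up numbers the independence formula understates expected loss by a factor of more than three, because it ignores that the attacker's choice depends on the defender's state.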
About the speaker: Pragathi Jha is a doctoral researcher in Industrial Engineering at Purdue University, where her work focuses on optimization, stochastic modeling, and game-theoretic approaches to decision-making under uncertainty. Her research lies at the intersection of operations research, applied probability, and strategic interaction, with an emphasis on developing rigorous mathematical frameworks for complex, adversarial systems.
Her academic interests include multi-stage stochastic optimization, game theory, and the modeling of strategic behavior in dynamic environments. In the context of cybersecurity, she is particularly interested in adversarial decision-making, risk-aware resource allocation, and the design of resilient systems that account for uncertainty and strategic threats. Her work aims to bridge theoretical advances in optimization and game theory with practical applications in security, infrastructure protection, and data-driven decision support.
Pragathi brings a strong foundation in quantitative methods and is committed to advancing research that is both mathematically rigorous and operationally impactful. Through her work, she seeks to contribute to the development of robust, scalable frameworks for analyzing and mitigating risks in complex, high-stakes environments.
Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work
Takeaways
Q: What is Bayesian experimental design and what problem does it solve?
A: It's the practice of using a Bayesian model to decide how to collect data before you collect it. Most statistical thinking starts with a fixed dataset. Bayesian experimental design sits upstream -- you have control over experimental parameters (which questions to ask, which reagents to mix, which conditions to test) and you want to choose them optimally. The Bayesian angle is to ask: what new data would most reduce my current uncertainty?
Q: When should you actually use Bayesian experimental design?
A: When two conditions hold: you have active control over how data is collected (not just passive observation), and you have a Bayesian model whose prior predictive distribution gives a reasonable picture of what typical data might look like. It's especially valuable when data collection is expensive or irreversible -- when the "committal step" of running an experiment has real cost, it's worth doing the analysis first.
Q: What is expected information gain (EIG) and why is it central to Bayesian experimental design?
A: EIG is the score you assign to a candidate experimental design -- the amount of information you expect to gain about your model parameters by running an experiment with that design. You compute it by simulating datasets from your prior predictive, doing Bayesian inference on each, and averaging how much the uncertainty decreased. What's remarkable is that you can derive the same quantity from two completely different starting points -- reducing parameter uncertainty, or maximizing outcome uncertainty while correcting for noise - and arrive at the same formula. That convergence is why EIG keeps being re-discovered independently across fields.
Full takeaways here
Chapters:
00:00 What is Bayesian experimental design and why does it matter?
00:06:02 What problem does Bayesian experimental design actually solve?
00:08:54 When should practitioners use Bayesian experimental design?
00:12:00 Is Bayesian experimental design changing how scientists work in practice?
00:15:04 What are the limitations of Bayesian experimental design?
00:17:55 What is expected information gain (EIG) and how does it work?
00:21:05 How do you compute expected information gain in practice?
00:23:48 What is active learning and how does it connect to Bayesian experimental design?
00:41:02 What is active learning by disagreement?
00:48:57 What is deep adaptive design and when should you use it?
00:56:02 How is Bayesian experimental design applied in protein dynamics and quantum chemistry?
01:01:58 What does a practical Bayesian experimental design workflow look like?
Thank you to my Patrons for making this episode possible!
Links from the show
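The two starting points mentioned in the EIG takeaway above can be written side by side. The notation is assumed for illustration and is not taken from the episode: d is the design, \theta the model parameters, y the outcome, H[\cdot] the Shannon entropy, and the prior p(\theta) is taken not to depend on d.

```latex
\mathrm{EIG}(d)
  = \mathbb{E}_{p(y \mid d)}\!\left[ H[p(\theta)] - H[p(\theta \mid y, d)] \right]
  = H[p(y \mid d)] - \mathbb{E}_{p(\theta)}\!\left[ H[p(y \mid \theta, d)] \right]
```

The first form is the expected reduction in parameter uncertainty. The second is outcome entropy minus the expected entropy due to observation noise. Both equal the mutual information between \theta and y under design d, which is why the two routes land on the same formula.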
Early bird discounts for the San Francisco World's Fair, the biggest AIE gathering of the year, end today - prices will go up by ~$500 tonight so do please lock in ASAP!
From near-universal AI tool adoption inside Shopify to internal systems for ML experimentation, auto-research, customer simulation, and ultra-low-latency search, Mikhail Parakhin joins us for a deep dive into what it actually looks like when a 20-year-old, $200B software company goes all-in on AI. We cover why Shopify has become much more vocal about its internal stack, what changed after the December model-quality inflection, and why the real bottleneck in AI coding is no longer generation, but review, CI/CD, and deployment stability.
We also go inside Tangle, Tangent, SimGym, which are three major AI initiatives that Shopify is doing to make experimentation reproducible, optimization automatic, customer behavior simulatable, and search and catalog intelligence faster and cheaper at scale. Along the way, Mikhail explains UCP, Liquid AI, and why token budgets are directionally right but often measured badly, why AI-written code can still increase bugs in production, what makes Shopify's customer simulation defensible, and what he learned from the Sydney era at Bing.
We discuss:
* Mikhail's path from running a major Microsoft business unit spanning Windows, Edge, Bing, and ads to becoming CTO of Shopify
* Why Shopify is talking more publicly about AI now, and why staying at the frontier has become necessary for the company
* Shopify's internal AI adoption curve, the December inflection, and why CLI-style tools are rising faster than traditional IDE-based tools
* Why Jensen Huang is directionally right on token budgets, but raw token count is still the wrong way to evaluate engineering output
* Why the real unlock is not more agents in parallel, but better critique loops, stronger models, and spending more on review than generation
* Why AI coding can still lead to more bugs in production even if models write cleaner code on average than humans
* Why Shopify built its own PR review flow, and why Mikhail thinks most off-the-shelf review tools miss the point
* How PR volume, test failures, and deployment rollback are becoming the real bottlenecks in the agent era
* Why Git, pull requests, and CI/CD may need a new metaphor once code is written at machine speed
* What Tangle is, and how Shopify uses it to make ML and data workflows reproducible, collaborative, and production-ready from the start
* Why Tangle is different from Airflow, and why content-addressed caching creates network effects across teams
* What Tangent is, and how Shopify is using auto-research loops to optimize search, themes, prompt compression, storage, and more
* Why Tangent is becoming a democratizing tool for PMs and domain experts, not just ML engineers
* Why AutoML finally feels real in the LLM era, and where auto-research still falls short today
* Why Tangle, Tangent, and SimGym become much more powerful when combined into one system
* What SimGym is, why simulated customers only work if you have real historical behavior, and why Shopify's data gives it a moat
* How SimGym evolved from comparing A/B variants to telling merchants what to change on a single live storefront to raise conversions
* Why customer simulation is so expensive, from multimodal models to browser farms to serving and distillation costs
* How Shopify models merchant and buyer trajectories, runs counterfactuals, and thinks about interventions like discounts, campaigns, and notifications
* Why category-level behavior is so different across commerce, and why ideas like Chinese Restaurant Processes are showing up again in practice
* Shopify's new UCP and catalog work, including runtime product search, bulk lookups, and identity linking
* Why Shopify is using Liquid AI, and why Mikhail sees it as the first genuinely competitive non-transformer architecture he has used in practice
* Where Liquid already works inside Shopify today, from low-latency
query understanding to large-scale catalog and Sidekick Pulse workloads
* Whether Liquid could become frontier-scale with enough compute, and why Shopify remains pragmatic and merit-based about model choice
* Who Shopify is hiring right now across ML, data science, and distributed databases
* The Sydney story at Bing, why its personality was not an accident, and what Mikhail learned from deliberately shaping AI character early on
Mikhail Parakhin
* LinkedIn: https://www.linkedin.com/in/mikhail-parakhin/
* X: https://x.com/MParakhin
Timestamps
00:00:00 Introduction: Mikhail Parakhin, Microsoft, and Shopify
00:01:16 Why Shopify Is Talking More About AI
00:02:29 Internal AI Adoption at Shopify and the December Inflection
00:06:54 Token Budgets, Jensen Huang, and Why Usage Metrics Can Mislead
00:10:55 Why Shopify Built Its Own AI PR Review System
00:12:38 AI Coding, More Bugs, and the Real Deployment Bottleneck
00:14:11 Why Git, PRs, and CI/CD May Need to Change for Agents
00:18:24 Tangle: Shopify's Reproducible ML and Data Workflow Engine
00:21:19 Why Tangle Is Different from Airflow
00:26:14 Tangent: Auto Research for Optimization and Experimentation
00:30:07 How Tangent Democratizes Experimentation Beyond ML Engineers
00:33:06 The Limits of Auto Research
00:36:36 Why Tangle, Tangent, and SimGym Compound Together
00:37:20 SimGym: Simulating Customers with Shopify's Historical Data
00:42:47 The Infra Behind SimGym
00:46:00 Why SimGym Gets Better with Real Customer History
00:47:30 Counterfactuals, HSTU, and Modeling Merchant Trajectories
00:51:55 CRPs, Clustering, and Category-Level Customer Behavior
00:53:30 UCP, Shopify Catalog, and Identity Linking
00:55:07 Liquid AI: Why Shopify Uses Non-Transformer Models
00:59:13 Real Shopify Use Cases for Liquid
01:03:00 Can Liquid Scale into a Frontier Model?
01:09:49 Hiring at Shopify: ML, Data Science, and Databases
01:10:43 Sydney at Bing: Personality Shaping and AI Character
01:13:32 Closing Thoughts
Transcript
[00:00:00] swyx: Okay.
We're here in the studio, a remote studio, with Mikhail Parakhin, CTO of Shopify. Welcome.[00:00:08] Mikhail Parakhin: Thank you. Welcome.[00:00:10] swyx: I don't even know if I should introduce you as CTO of Shopify. I feel like you have many identities. Uh, you led sort of the, the Bing ML team, I guess, uh, uh, or ads team. I, I don't know, I don't know, uh, you know, it's, uh, people va-variously refer you as like CEO or, or, uh, I don't know what that, that, that said previous role at Microsoft was.[00:00:29] Mikhail Parakhin: Uh, that was... Yeah, my previous role w- at Microsoft was the-- I actually was the CEO of one of Microsoft's business units, which included, as I, you know, as we discussed, all the things that people like to laugh about, uh, including Windows and Edge and Bing and ads and everything.[00:00:47] swyx: Yeah, yeah. What a, what a, what a wild time.You've obviously, uh, done a lot since you landed at Shopify. Uh, one of the reasons I reached out was because you started promoting more sort of internal tooling, uh, primarily Tangle, but also a lot of people have seen and adopted Tobi's QMD, uh, and obviously, I think, uh, Shopify has always been sort of leading in terms of, uh, engineering.I think more-- it's just more recent that you guys have been more vocal about your sort of AI adoption. Is that, is that true?[00:01:16] Mikhail Parakhin: Well, I think AI tools in general are fairly recent development, uh, and we've-- Shopify, you know, at this stage of its development, we're developing AI in-in-house and other, uh, building tools that use AI and, you know, interfacing with the wider AI community, uh, you know, are on the sort of the, uh, runaway trajectory.So it just did by sort of natural byproduct. We, we talk about it more also. 
We just, uh, just even yesterday, Andrej Karpathy was famously tweeting about, oh, are there some, uh, ways that you can organize your agents to store the data and then, uh, look up the data so that you don't have to redo the research or lose context every time. And a little bit tongue in cheek, I tweeted that, “Hey, we've done it much earlier, and we even have different approaches, Tobi and I.” Tobi, of course, is a big fan of QMD, and I'm more of a SQL, SQLite fan. But, uh, yeah, very similar things that we've already done here. The point is, yeah, we're a very dynamic, you know, explosively growing company, and we have to be at the forefront of AI adoption, obviously.

[00:02:29] swyx: Yeah. Yeah. Um, your team kindly prepared some slides, actually, that we were gonna bring up on the screen. I think I can screen share, and then we can kind of go through some of the shocking stats that maybe put some numbers to what exactly is going on. So here we have, uh, an internal AI tool adoption chart. What are we looking at here?

[00:02:54] Mikhail Parakhin: Yeah, these are very interesting statistics. Uh, this is the number of daily active workers, you know, think of, uh, DAU, basically the active users of-

[00:03:05] swyx: Yeah ...

[00:03:05] Mikhail Parakhin: AI tools as a percentage of all the people in the company, right? And then different AI tools. And, uh, you could see two things here. One is the green is total. Uh, green is just total. So you could see that it approaches really % by now. It's hard to do your job now without interacting deeply with at least one tool.
Another interesting thing you could see is, just as many people commented, December was the phase transition, when suddenly models got good enough that everything took off and started growing. Uh, many people noticed that small improvements accumulated into this big change in, roughly, the December timeframe.

[00:03:52] swyx: Yeah.

[00:03:52] Mikhail Parakhin: The other thing I would claim you could see is that, uh, CLI-based tools and tools that don't require you to look at the code are becoming more popular, and you could see, yeah, various versions of, uh, Claude Code and Codex and Pi and internal development tools taking off. Uh, exactly, yeah, and blue is our River, just an internal agent for coding, whereas tools that require IDEs, such as GitHub Copilot or Cursor, they're not exactly shrinking, but they're not growing as fast. Like, uh, the red line is the IDE kind of tools. So you could see that they're not experiencing as fast a growth.

[00:04:37] swyx: As I understand it, basically, every employee has their choice, right? Choose whatever tool you use, and then you're just kind of doing a daily survey or something.

[00:04:47] Mikhail Parakhin: Exactly. And, uh, the push is to get your job done. You can use any tool, and we effectively fund unlimited tokens for everybody. Uh, we do try to control the models that people use, but from the bottom, not from the top. Like, we basically say, “Hey, please don't use anything less than Opus four point six.”

[00:05:09] swyx: Oh.

[00:05:10] Mikhail Parakhin: Some people end up using GPT five point four extra high. Some people use Opus four point six. Um, you know, there are pluses and minuses in going for the full one million context window versus not. But, uh, we try to discourage people from using anything less than that.

[00:05:28] swyx: Yeah, yeah. Got it, got it.
Uh, I mean, uh, that's, you know... The, the next chart here, it really kind of shows the expansion and the sort of December twenty twenty-five inflection, right? That, uh, people are using a lot of tokens. I think it's also really interesting that no one was kind of abusing it in twenty twenty-five.Like it was- Had comparatively, uh, to this year, there was almost no growth. I mean, it's still like, you know, probably, probably gave fifty percent.[00:05:56] Mikhail Parakhin: Yeah. This is just a different scale. It's still exponential- Yeah, yeah ...growth at just a different- ...rate of expansion. Uh, there was inflection point, and Sean, I would claim the, the super interesting part here is that you could see that the distribution becoming more and more skewed.Yes. The top percentiles grow faster. So that means- Yeah ...the people in the top ten percentile, they, their consumption grows faster than seventy-five and so forth. So, uh, the distribution skews more and more towards the highest users, which is... I don't know what it tells me. It's like it feels not ideal, to be honest.Or maybe it's okay. We'll see.[00:06:36] swyx: Why does it feel not ideal? Is, is it because of, um, quantity over quality, or what's the concern?[00:06:42] Mikhail Parakhin: Because take it to the limit. That means, you know, if, if this rate of separation continued- Ah, yes ...a year, there will be one person consuming all the tokens. So it's just, it's kinda strange.[00:06:54] swyx: Yeah, I mean, um, uh, I, I think internal like teaching and all that, uh, will, will help sort of distribute things more widely. But in, in the early days, of course, the people who are sort of more AI-pilled will obviously find more ways to use it than the people who are less AI-pilled. Maybe let's, let's call it that.I'll just, I'll just kinda quickly, uh, pause from the, the... 
You know, we will go back to the rest of the slides, but I just wanna, um, review, you know, there are a lot of CTOs of, of large companies like yourself where they're all considering some kind of token budget, right? Like I think it's something, something that Jensen Huang has been talking about, where like if your 200K engineer is not using 100K of tokens every year, like they're, they're underutilizing coding agents.Of course, Jensen Huang would say that, but like it seems a very quantity over quality approach and like some, some people are basically saying like, well, is this comparable to judging engineer quality by lines of code, right? Which we also know is like kind of flawed, but better than nothing. So I, I don't know if you have like a sort of management take here on, on how to view this kind of, uh, metrics.[00:08:02] Mikhail Parakhin: Well, I mean, you're, you're baiting me. I, I like... This is my favorite topic. Uh, if you let me, I'll probably talk for two hours on just this. I have a lot of things to say. Like I do think Jensen gotten a lot of bad press saying, “Oh, of course you're, you know, this, uh, the- ...the cake seller says you don't need enough cakes.”You know? Like, of course. Uh, but, uh, I actually, uh, think that's undeserved. I think he, he's actually right. Uh, I do think- He,[00:08:33] swyx: he's directionally correct.[00:08:35] Mikhail Parakhin: Yeah. Yeah. He's directionally correct for sure. Uh-[00:08:37] swyx: Who knows what the right number is? Yeah.[00:08:39] Mikhail Parakhin: The thing that I do Uh, want to say, and this is something that we learned through trial and error and very important is like two things.One is that it's not about just consuming tokens. Uh, you can consume tokens and, and in fact, the anti-pattern is running multiple agents, too many agents in parallel that don't communicate with each other. That's almost useless, uh, compared to just fewer agents and burns tokens very efficiently. 
Uh, setting up the right critique loop, especially with the high quality models, where one agent does something, the other one, ideally with a different model, critiques it and suggests ways to improve it, and the agent redoes it with this critique, takes much longer. So people don't like it, because latency goes up. You know, they have to wait while this debate is happening. But, uh, the quality of the code is much higher. And another thing, since you mentioned that the overall token budget is just like, uh, lines of code: lines of code are exploding for everybody right now, partially because AI is really more verbose, but partially just because AI can write a lot more code, you know, it doesn't get tired. And so you have to have a very strong narrow waist during PR review. Otherwise, the number of bugs will just go through the roof. It's, uh, this unexpected consequence of just volume trumping everything. I would claim by now a good model writes code with fewer bugs, on average, than the average human. But since they write so much more of it, more of it will make it into production. So you have to-

[00:10:26] swyx: You still have more bugs.

[00:10:26] Mikhail Parakhin: Yeah. You have to have very rigorous PR reviews, also automated, of course. But, uh, yeah, you have to spend a lot of budget there. For me, actually, the important metric is the ratio of budget spent during code generation versus budget spent on, uh, expensive tokens, like GPT five point four Pro or, uh, Deep Think from Gemini, you know, checking during PR reviews.

[00:10:55] swyx: Yeah, totally. Uh, I noticed in your chart you didn't have any review tools. Do you just use, let's say, Claude Code to review? Or do you have another set of review tools, like the Greptiles, the CodeRabbits, uh, Devin has a review tool.
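The critique loop Mikhail describes, where one agent drafts, a second one critiques, and the first redoes the work, can be sketched roughly as follows. This is a minimal illustration, not Shopify's implementation: `critique_loop`, `toy_generate`, and `toy_critique` are hypothetical names, and the two callables stand in for calls to two different models.

```python
from typing import Callable, List, Tuple


def critique_loop(
    task: str,
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Alternate generator and critic turns until the critic has no issues.

    Returns the final draft and the number of critique rounds used.
    The turns are sequential, so latency grows with every round.
    """
    draft = generate(task, [])
    for round_no in range(1, max_rounds + 1):
        issues = critique(task, draft)
        if not issues:
            return draft, round_no
        # Redo the work with the critic's feedback attached.
        draft = generate(task, issues)
    # Budget exhausted: the last draft was regenerated but not re-critiqued.
    return draft, max_rounds


# Hypothetical stand-ins for two different models.
def toy_generate(task: str, feedback: List[str]) -> str:
    if any("docstring" in item for item in feedback):
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b'
    return "def add(a, b): return a + b"


def toy_critique(task: str, draft: str) -> List[str]:
    return [] if '"""' in draft else ["missing docstring"]


final, rounds = critique_loop("write add()", toy_generate, toy_critique)
print(rounds)
print(final)
```

The design trades latency for quality: few tokens are generated overall, but each turn waits on the previous one.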
I don't know if you've used those specialist review tools.

[00:11:13] Mikhail Parakhin: You are a little bit jumping on my sore toe right now, because in the graphs I was only showing public tools. Uh, I haven't found a good PR review tool that does what I think should be done. And, uh, partially, my thinking is that it just goes against both what people feel like they emotionally prefer and, uh, frankly, even some of the business models that the companies run. At PR review time, you want to run the largest models. That means, I don't know, Codex or Claude Code is not gonna cut it. You need to have pro-level models if you really want to, uh, stem the tide of bugs going into production. And you need to spend a lot of time, with the models taking turns, but you don't want, like, a big swarm of agents. So in fact, you end up in a different, dualistic world, where you generate not that many tokens. You, in fact, generate few tokens, but it takes a long time, because these are expensive models taking turns rather than many, many agents trying to do many things in parallel. So that's why I feel like I haven't found good tools, so we are using our own for PR review for now.

[00:12:33] swyx: Yeah. Yeah. I mean, uh, I think a lot of companies are building their own, uh, especially for their needs, right?

[00:12:38] Mikhail Parakhin: Mm-hmm.

[00:12:38] swyx: Um, you also have a chart here, going back to the slides, on, uh, PR merge growth, where we're now at thirty percent month on month rather than ten percent. Uh, and also the estimated complexity is going up. You know, this is productivity, right? ‘Cause presumably there's more stuff going into the code base and more features getting worked on. I'm curious about the backlog, right?
Like the, the, the-- I actually don't mind a pro-level model taking an hour or two hours to review my PR, because I've dealt with humans who take a week to review my PR, right?And I keep pinging them on Slack, “Hey, hey, review my PR.” So, you know, I think there's some trade-off here where, like, it still doesn't make sense.[00:13:18] Mikhail Parakhin: Exactly. That, that's exactly m-my point. Uh, that on one hand, you can tolerate longer latencies at, uh, PR. On the other hand, like right now, the real problem is not in spending time waiting for PR.It's real problem is since there's so much more code than- Yeah ... uh, probability of at least some tests failing going up, and then you, like, keep de-failing, then you have to find the offending PR, evict it, retest it without that PR, and so deployment cycle becomes much longer. Uh, so it actually, in terms of the overall time to deploy, it's total time savings if you spend more time on a longer model, like thinking for an hour, because then, then you, you don't have to spend all that time during testing and rolling, you know, rolling back the deployment.[00:14:03] swyx: Yeah, totally. That's still worth it. You know, you don't look at the individual, look at the aggregate, and look at the, the, the change in the aggregate system.[00:14:11] Mikhail Parakhin: Exactly.[00:14:11] swyx: I'm kind of curious if, like, there's this PR mentality and, like, c-- the, the, the CICD paradigm will be changed eventually. Some people are like, obviously a lot of people want new GitHub, but I even wonder if, like, Git is the problem, right?Like, is that the bottleneck? Is the concept of a PR a bottleneck? Do you guys use stack diffs? I don't know if, uh, that's a, like, a merge queue stack diff type of thing.[00:14:34] Mikhail Parakhin: We, we use, we use Stacks, we u- we use Graphite. We worked with, uh, Graphite a lot. Uh, so we use Stack, uh, PRs. 
I think, uh, like, the overall CI/CD in general, and the interaction with the code repository, is clearly the main issue and the bottleneck for us right now, uh, and highest top of mind. I would say we probably need a different metaphor, or a whole different design, of how to process it in the new agentic world. I haven't seen anything dramatically better yet. I think everybody right now is just trying to keep their head above the water, ‘cause there are so many PRs, and then everybody's CI/CD pipelines start creaking, the times are increasing, the number of bugs slipping by is increasing, and you have to clamp down. And so we are a little bit in this situation where we need to first stabilize that story and then start thinking, hey, what could a completely different and new world be? I know some people are working on it. I haven't seen anything super compelling yet, but clearly the old things that were designed for humans will need to be morphed into something new.

[00:15:53] swyx: One of the things that I think about is that the merge conflict is basically a global mutex on the whole system, right? And in human organizations, we do have something like that. It's the company standup. But, like, other than that, it's actually fitting for us to be somewhat decentralized, somewhat plugged into one stream of information, but somewhat lossy. Like, it's okay, you know, that not every delivery has, like, atomic consistency. Like, we're not dealing with a database sometimes.

[00:16:27] Mikhail Parakhin: This is a very good point, uh, because since humans don't write code too fast, you know, that global mutex is not too bad. Once you-

[00:16:36] swyx: Yes ...

[00:16:37] Mikhail Parakhin: start writing code at the speed of machine, it becomes, you know, the bottleneck. Then what do you do?
Maybe, and I can't believe I'm saying this, because I'm a lifelong opponent of, uh, microservices, and I always thought that was, like, a really bad idea. And now that you're saying it, like, maybe in a new guise microservices will make a comeback, you know, because then you can ship things independently in tiny pieces, and managing all that complexity automatically will be much easier. I don't know. Like, we'll have to see.

[00:17:10] swyx: Yeah. I mean, I don't know what the Microsoft or Shopify thing is, but I read this paper from Google where they have a monorepo that deploys into microservices, right? And then, uh, the other concept that I think about a lot is the Chaos Monkey concept from Netflix. Being able to create, like, this robust system where, um, you know, you have the service discovery, you have the independent microservices and, uh, you know, there's probably going to be a fair amount of duplication. That's how an organic system sort of scales, uh, that you have that... I don't know what you call it. Slack? Robustness? Uh, duplication. These are not exactly the terms I'm looking for, but I can't really think of the words. Okay. I was gonna go into Tangent and Tangle. Uh, so we sort of discussed the overall stats that Shopify has. Uh, but, you know, I think some pretty cool stuff that you guys are working on is your ML experimentation, uh, and your sort of auto research training pipeline. Presumably you're much closer to this one, because it's a sort of personal hobby of yours. How would you explain them together? I thought we had a slide that, like, has the system diagram.

[00:18:24] Mikhail Parakhin: Yeah. Tangle first, and then Tangent as a-

[00:18:27] swyx: Yeah ...

[00:18:28] Mikhail Parakhin: as a thing on top of Tangle.
And, uh, Tangle is the third generation, I claim, of, uh, systems for running any data processing, with a bit of a skew toward ML experiments, but not necessarily. Any sort of data processing task where you need to iterate and share, and you have scale, so you want maximum efficiency. You know how you would normally work? Imagine you're a data scientist or an ML practitioner. You would get Jupyter notebooks, or maybe you would get, uh, you know, your Python scripts, and you would manage the data, and you'd produce those TSV files, and you'd put them in some JFS or something. Then you would notice that, oh, it has these, uh, weird missing values. You go and write another script that, uh, goes and replaces them with, uh-

[00:19:20] swyx: Ah ...

[00:19:21] Mikhail Parakhin: dashes. And then you're like, “Oh, I need to filter bots.” And so you run some LightGBM model that, uh, removes the bots. And then you kind of, like, get it into shape, and then you start experimenting, and you run multiple experiments, and then you're like, “Oh my God, this experiment is worse.” You undo, and you cannot get to the previous result. And you're like, “Ah, what did I do?” Then you finally, like, get everything working. Then you, like, start throwing it over the fence to production. You replicate it, those things don't work, and then sometimes you, like, don't notice that you forgot some feature naming and the features don't match. But then, like, imagine you did everything, and then six months later you have to repeat it, because now there's more data, or you wanted to do another pass, and you're like, “What did I do?” Or, like, “This script crashes now,” or, “The path has changed.” And then you spend another month just doing digital archeology on your own, you know, history, right? Now multiply that by many, many teams.
Now imagine you got an intern that you wanna ramp up. Now you have to show that intern, “Oh, you know, look, here's the folder, there are the scripts, you know, ask your Claude agent to, uh, figure it out.” And then the Claude agent does something, and then you're like, “Ah, yeah, right, right, it was the wrong folder. I forgot to tell you, I actually have this other thing I forgot myself.” And that's, like, the daily life. We all know it, uh, if you're a data scientist, machine learning practitioner or, uh, even, like, any data managing, uh, person.

[00:21:00] swyx: Yeah. So I used to do this, uh, on the quant finance side, uh, in my hedge fund. So we did this before Airflow, and then, uh, obviously Airflow came along, and then more recently Dagster, uh, I would say, is, in my mind, what I would use for that shape of problem, uh, where you had to materialize assets and create a pipeline.

[00:21:19] Mikhail Parakhin: And that's a very good segue, because... So Airflow is great, but Airflow is more about: you have something and you wanna repeatedly run it in production on a schedule. It's less about you as a team developing things and being able to share, and you grabbing the standard pipeline and saying, “Hey, I wanna change this tiny little component in the huge sea of data processing, and I wanna run ten experiments on this, and I wanna do hyperparameter optimization.” All that is very hard to do with Airflow. It's very easy to do with Tangle. Tangle is everything about a group of people running experiments, and it might be agents too nowadays. Uh, running experiments cheaply, collaborating, sharing results. Uh, you don't need to understand it fully. You clone somebody else's experiment or somebody else's pipeline, uh, change a small piece, run it, get it to production state, and then ship in one click. So then the...
You don't have to port it into any other system to, to run in production. You can just run the same experiment. It's, it's fully production ready. And, and it's, uh, it has lots of... Again, as I said, it's third generation system. The original one was, I would claim there was Ether and then, uh, at least in my career, Ether was the first, first, uh, that pioneered this type of approach.And then there was, uh, Nirvana, which, uh, uh, at Yandex, which did kind of sec-second take on this. And now this one aggregates the, the learnings from all of those and, and Airflow as well to, to get to the state where you try it, it, it feels kind of magical. Uh, ‘cause now everything is based on content, uh, hashes.So even if the version changed, but if the output didn't change, nothing is being rerun. It's very efficient. If you... Multiple people start experiment that needs the same sort of data preprocessing, it's not repeated multiple times. It's automatically done only once. If you start ten experiments that all require, you know, some, some data preparation first as the first step, and you don't have to coordinate for that.Like, you don't have to know that other people are starting it. You now, it's very easy compos-, uh, composability, any language you can u- uh, you wanna use, and it's very visual. So you can see immediately, you can edit it easily, you can assemble small things with just even mouse clicks if you want to, and, uh, share, clone.And everybody knows also it's fully kind of static in the sense that we rerun it second time, it will exactly have the same results. Like, you will never have to do digital archeology. So full versioning and everything is also there.[00:24:06] swyx: Uh, so, so people can, uh... It's open source. Go to the GitHub repo and, and, uh, check it out.Uh, and it is also a really good, uh, blog post about it. I think all these is, like, really appealing. 
The thing that I think sells me the most about it is that, um, sort of development to production transition, right? Which I think, um, a lot of people haven't really solved, uh, strictly, right? Like, we develop really, really well in Python notebooks, but then, you know, that's obviously not a sort of production ready process. I think that any way in which that is solved is very appealing. Then the other thing that you mentioned, which also raised my eyebrows, was content-based caching, which you mentioned is, um, you know, very much, uh, a sort of efficiency measure, uh, you know, just, like, recalculation only on sort of content addressing, which I think makes sense. Uh, it surprised me that the savings could be this much, but maybe I just haven't worked at your scale, where there's so much duplication, uh, that people just rerun because they change a single ID upstream.

[00:25:10] Mikhail Parakhin: It does, yeah. But it's not only that you rerun. The main savings are coming from the fact that you ran it, you got your job done, and you moved on. Then somebody else, in some department you didn't know existed, runs the same task, but on a newer version.

[00:25:27] swyx: Yeah.

[00:25:27] Mikhail Parakhin: Like, right now, in most organizations, you can't even find out about it, so you can't even measure that you're spending that time twice, right? Here, if everybody's on Tangle, that's detected automatically, and it's detected that the output is the same. And then for that person, all it looks like is the experiment just suddenly jumped forward, right? Uh, so that's because there's a network effect of multiple people helping each other.

[00:25:51] swyx: Yeah.
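The content-hash caching discussed above, where a downstream step is skipped whenever its input bytes are unchanged even if an upstream version changed, can be sketched like this. It is a toy illustration and not Tangle's actual API: `ContentCache`, `clean_v1`, `clean_v2`, and `count_rows` are hypothetical names invented for the example.

```python
import hashlib
from typing import Callable, Dict


class ContentCache:
    """Toy cache keyed by (step name, hash of the input bytes)."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}
        self.runs = 0  # how many times a step actually executed

    def run(self, step: str, fn: Callable[[bytes], bytes], data: bytes) -> bytes:
        key = f"{step}:{hashlib.sha256(data).hexdigest()}"
        if key not in self.store:
            self.runs += 1
            self.store[key] = fn(data)
        return self.store[key]


def clean_v1(raw: bytes) -> bytes:
    return raw.replace(b"N/A", b"0")


def clean_v2(raw: bytes) -> bytes:
    # A newer implementation that happens to produce identical output.
    return b"\n".join(line.replace(b"N/A", b"0") for line in raw.split(b"\n"))


def count_rows(cleaned: bytes) -> bytes:
    return str(len(cleaned.splitlines())).encode()


cache = ContentCache()
raw = b"a,1\nb,N/A\n"

# Two people run the same downstream step on top of different upstream versions.
out_a = cache.run("count_rows", count_rows, cache.run("clean_v1", clean_v1, raw))
out_b = cache.run("count_rows", count_rows, cache.run("clean_v2", clean_v2, raw))
print(cache.runs)
```

Both cleaning versions execute once, but `count_rows` executes only once in total, because the second call sees the same input content. With a shared cache, the second person's experiment appears to jump forward.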
This is one of those things where it's designed to be a platform from the beginning rather than an individual developer's tool from the beginning, right? And everything streams down from there. That is the sort of Tangle, uh, orchestrator, and it manages jobs. We've seen a few versions of this, and these are obviously, uh, the sort of, uh, unique approaches that you guys have, uh, figured out. And then there's Tangent.

[00:26:14] Mikhail Parakhin: Yeah. And Tangent is basically an automatic auto research loop that can help and kind of do your work for you. Uh, you know, effectively, Andrej Karpathy recently popularized it with auto research. Remember, he said he was, uh, speed running this, uh... Yeah, you know the story. Here, we're basically bringing the same capability into Tangle so that, uh, Tangent can analyze it. It's just an agent that can run multiple experiments, figure out what can be changed, and keep on rerunning it, keep on modifying until it maximizes some goal, some loss function, whatever you need to achieve. And in general, I would say if you're not using an auto research-like approach in whatever you do, like, literally whatever you do, then you're missing out. We saw that at Shopify taking off like wildfire. Anything where you can put measurements can be done dramatically better. Our-

[00:27:19] swyx: Mm-hmm ...

[00:27:20] Mikhail Parakhin: uh, speed of, uh, HTML templatization, uh, completely new UX templatization, uh, reducing latency for Liquid themes. Uh, our search, uh, recently we moved, it's hard to even, uh, quote, from eight hundred QPS to forty-two hundred QPS with the same quality, just by pure optimizations from an auto research loop that kept running and changing code in our index serve, on the same number of machines, just increasing the throughput. We managed to improve the quality of gisting and machine learning process.
Uh, you know, gisting is the prompt compression technique that[00:27:59] swyx: allows for[00:28:00] Mikhail Parakhin: lower latency and, and lower and, uh, actually higher quality slightly. So like literally whatever different walks of life, and it doesn't have to be AI related.Uh, we, we had a reduction in, uh, storage because the agents would go and find data sets that clearly are derivative, uh, and then you don't need to store things twice. You know, we, we, we found somewhat embarrassingly that it was one of the largest tables was hashing random IDs into another random ID, and we literally- Oofput only one. So it was translating, yeah, two random IDs hashed[00:28:36] swyx: into[00:28:37] Mikhail Parakhin: each. So, so[00:28:37] swyx: it has access to the code as well, so it can, it can check the, like what, what the hell is it doing?[00:28:42] Mikhail Parakhin: So there, there cou- it could be run in two levels. You, uh, you know, at the superficial level, it could just use ex-existing components and, uh, reshuffle them.Uh, you know, like you can grab- Yeah ... uh, XGBoost, and you can grab some, some Py- PyTorch module, and then can grab some, you know, grab another tools and, and combine them. At a deeper level, since Tangle is all sort of CLI based underneath you, every, every component is a wrapped really CLI, uh, call and a YAML file, it can analyze code and create new components and, and, uh, keep on iterating as well.So, so you can, you can both have quick modifications of existing t- uh, pipelines with the, with components that are already there pre-baked, or you can create new components, uh, and-[00:29:29] swyx: Yeah ...[00:29:29] Mikhail Parakhin: keep iterating on those. So auto research is, again, this is probably the, the thing I was excited the most in the last two months happening, and we see it taking like, like totally like a wildfire.Just, uh, everybody, every day, every... 
well, every day, every minute, I would, uh, have somebody Slack message saying, “Oh, look how much better I made it.” And, uh, it's all through auto research.

[00:29:53] swyx: Is this democratized in some way, in the sense that, like, is it your ML, uh, engineers and researchers doing this, or do your regular PMs and software engineers also have the ability to use Tangent?

[00:30:07] Mikhail Parakhin: This is an awesome question. Like, Tangle in general and Tangent in particular are extremely democratizing. Like, they are the main tools for-

[00:30:15] swyx: ‘Cause I don't need the details.

[00:30:16] Mikhail Parakhin: Yeah. Exactly. Initially used by ML and AI engineers, but then literally, as you said, PMs. Like, the highest user right now is one of the PMs in our org, uh, Sartak, and he was number one by usage of this, ‘cause they're just, uh, energetic and knowledgeable, and now it unlocks a lot of capability where you don't have to change code manually.

[00:30:39] swyx: I mean, because it kind of cuts out the ML engineer from the process, because the PMs have the domain knowledge and the ability to think, uh, from first principles about, okay, what results do I want? And they even have the access to the data that needs to go in. So in some ways, like, this is the magic black box that we've always wanted for training and for, uh, I guess, uh, hill climbing, whatever.

[00:31:04] Mikhail Parakhin: It's basically Claude Code for your AI development, uh, situation, right? Like, now you don't have to know exactly how the algorithms work. You can just, uh, bring your domain knowledge and expertise and product knowledge and iterate within Tangent until you've gotten the results that you need.

[00:31:21] swyx: In my previous roles, every time that someone has pitched AutoML, you know, I've always been like, “Uh, this is not gonna work.
It's, you know, it's, it's always gonna be a flop.” Somehow it's working now. I mean, presumably the answer is now we have LLMs and it's good enough, right? It's, it's an emergent property that we can do auto research, but like, it doesn't feel that satisfying that, how come we didn't do this before, right? Like we just did like parameter search and like, I don't know. That's maybe that's it. [00:31:48] Mikhail Parakhin: Yeah. Bayesian optimization and hyperparameter optimization was, was the one that, or facet of AutoML that was used very actively, which incidentally is also built into, uh, Tangle. But, you know, I know Patrice Simard very well, and, uh, he was such a, uh, such a proponent of AutoML, and he put, like literally spent careers trying to democratize it. Without LLMs, it just turned out to be very hard. Like it, you, you would have flexibility within a certain narrow domain, but it was hard to scale wider, and now with LLMs suddenly it's like a magic wand, and so suddenly everybody- ... is an AutoML expert. [00:32:28] swyx: Yeah, I, I think it's multiple things, right? Like I'm, I'm just gonna bring up the, the, the chart again, right? Like LLMs can do the monitoring very well. That is the very potentially unbounded, super unstructured. It can do the analysis very well, it can do the... Uh, and basically it is much more intelligence poured into every single step. Uh, there's maybe nothing structurally changed about AutoML, but this is just m-more intelligent and more unstructured. [00:32:53] Mikhail Parakhin: Exactly. [00:32:54] swyx: Any flaws that you've run into? Like everyone is like drinking the Kool-Aid, oh my God, time savings, uh, you know, performance improvements. Like what, what, uh, issues have you had, uh, come up? [00:33:06] Mikhail Parakhin: This is really cool. It's not a solution to all the world's problems for sure. 
The limitations are usually the ones I-- And this is where we get into a bit of a subjective territory. Uh, I can only share what I've, I've seen so far, and I'm sure the situation, uh, is changing, and, you know, maybe after I say it, like many people will reach out and say, “Hey, what about this?” And you don't know that, and then, then they'll probably be right. But what I've seen is auto research is very good at doing kind of obvious things that you don't have bandwidth to do or you didn't notice or maybe you're not aware of like the-- some standard practices. It is not good at doing something completely out of distribution, something that, you know, you have to think for, for multiple days, uh, and, and do something like none of this. So, so it's, uh, I, uh, set an experiment once, uh, on, on my sort of, uh, hobby thing, and I let it run for, uh, ended up, uh, several weeks run, uh, you know, it's like full production kind of scale, so it, you know, slow runs and, and it ex-- it performed in the end, uh, over four hundred experiments, and only one was successful. I'm like, “Okay, that's, that's good.” But- [00:34:18] swyx: But it saved time. [00:34:19] Mikhail Parakhin: Yeah, I saved time. Like it, it was the, that thing. Yeah, if I, if I were doing four hundred experiments myself, my batting average, as I said, would have been much higher, I'm sure. But also, first of all, it would take me like three years to do four hundred experiments. And, uh, I didn't have to do them. Like the machines were just, uh, the price of electricity did that. So, and I got one improvement, uh, that in, uh, my, my-- Honestly, when I was starting that experiment, my thinking was to go and show that, “Hey, Andrej, maybe you just don't know how to optimize.” And I was super smart because in, in my pro-problem, it was optimized for many years, and it was like fully improved. Uh, and I didn't expect it, you know, auto research to find anything at all. Yet it did. 
So instead of making fun of Andrej, I ended up, uh, a big, big supporter. Yeah, that's exactly the tweet. Yes. [00:35:10] swyx: You and Toby really, really go back and forth on-online a lot, which is really funny. Uh, think of it as, as an eval for the optimalness of the code it's running on. Uh, it's almost like it reminds me of like a Kolmogorov complexity thing, but, uh, I guess it's-- there's some optimal thing that you're trying to sort of reduce down to, I guess. Um, and so, so you, you, you know, you should congratulate yourself that you had, uh, you know, uh, ninety-nine percent, uh, optimality. [00:35:36] Mikhail Parakhin: Exactly, yeah. I think Andrej really deserves a lot of credit for popularizing this approach. This is, uh, this is incredibly, I think, powerful and cool and, you know, the, uh, even him, him just mentioning it led to a lot of gains in a lot of places in the industry, so we should be thankful. [00:35:56] swyx: Yeah. I think he also has a just... I don't know what it is. Like, um, you know, it, it is a simple self-contained project that people can take and apply to other things, which is, is, is one thing, but also just the name. Just like somehow no one, no one managed to call their thing auto research. It's just naming things is very important. I think that that is mostly, uh, our coverage of Tangle and, and, uh, Tangent. I think obviously, you know, there's a lot of, uh, ML infra at, at Shopify that people can, uh, dive into. We're about to go into SimGym, but before I do that, any, any other sort of broader comments around this whole effort? Like where is it, where is it leading to? [00:36:36] Mikhail Parakhin: As a segue to SimGym, like all those things start composing strongly. And, uh, you could see a huge unlock when you can look at each one of the tools and, and you see, oh, they're extremely useful. Uh, Tangle is useful by itself. Auto Research is useful by itself. SimGym is useful by itself. 
If you combine all three, you create like synergetic effect. I think that's why we wanted to even, uh, cover them today is because this is something that if you go back even, you know, five years ago, would've been unthinkable.Uh, replicating that, uh, would, would be either incredibly costly or impossible, right? With probably thousands of people are required.[00:37:20] swyx: Well, we have serverless human, uh, serverless intelligence, right? Like, uh, so yes, you do have thousands of hu-- of, of intelligences, not just, not humans. And that's, that's close enough, right?Even if they're not AGI, they're, they're close enough to do the, the task that you need them to do. And, and, you know, that's, there's plenty for, for a lot of routine work, knowledge work. Okay, let's get into SimGym. Um, this is one of those things I, I was surprised to see actually it's apparently your, uh, one of your most popular launches, and I think something that, uh, I think Sim AI, I think Yunjun Park, who did the Smallville thing, there's a very small cottage industry of people trying to do like the simulate customer thing.I think a lot of people maybe don't super trust this yet because they're like, well, obviously they would just do what you prompt them to do, right? But maybe just think, uh, tell us about the sort of inspiration or origin story.[00:38:10] Mikhail Parakhin: That's exactly actually the thing I wanted to cover, because if you don't have the historical data, all you can do is prompt a-agents in a vacuum, and they will do exactly what you prompt them to do.In fact, when I first proposed it, and this is a bit of, um, my brainchild initially, if I, I can boast, even Toby said like, “But wouldn't they, they just repeat what, what you tell them?” And, uh, but I'm like, “Yes, except Shopify has decades of history of how people made changes and what there is, uh, there, what it resulted in terms of sales.”So now what we can do is we can-- we have this... 
It's not, it's a noisy data. There's a small, usually websites, uh, you know, like things, things are never in isolation. It's almost never AB experiment. It's always AA experiment when there's has two meanings, but basically, you know, in different time you run two different things.But if you aggregate in general, uh, like everything together, and you apply, uh, denoising and collaborative filtering like approach, you can extract a very clear signal. And then you can optimize your agents. And that's why it took so long. It took almost a year of that optimization of just us sitting and fiddling, and, and we had this internal goals of correlation of hitting-- internal goal was to hit zero point seven correlation with, uh, add to cart events, for example.Like that, that if we run real AB test experiment, that it should, it should go and, and rep-uh, replicate, uh, same sort of success that, that humans had or lack thereof. And it, it took forever, and I don't think that's easily replicatable because, uh, like who else would have that data? You have to have this historic, you know, decades, uh, worth of data.And now, now the, like the other thing you need is in-infrastructure and the scale, right? Because, uh, w- again, what we found, uh, stat sig results, you need to run a lot of simulations, a lot of agents, and, and it's-- Those are expensive things. Like you're, you're making actions in the browser because you want a real friction.You want to, to be able to get the image like of what humans will see because you wanna, uh, detect effects like, “Hey, if I make my images larger, will I have more sales or l- uh, fewer sales?” And like usually people's intuition here, by the way, is that I increase my images, I will have more because they look nicer.You know, designers all look sparse and big images. Like usually your sales tank, right? But, but, uh, you know, from HTML, all the characters look the same only the, the size tag looks different, right? So it's very hard. 
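The 0.7 correlation goal mentioned above can be pictured as comparing the lift a simulation predicts against the lift real A/B tests measured. The sketch below is only an illustration under stated assumptions: the numbers are made up, the `pearson` helper is hypothetical, and nothing here is Shopify's actual evaluation code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up data: change in add-to-cart rate per experiment,
# as predicted by simulated shoppers vs. as measured with real traffic.
simulated_lift = [0.02, -0.01, 0.05, 0.00, -0.03, 0.04]
observed_lift = [0.03, -0.02, 0.04, 0.01, -0.01, 0.02]

r = pearson(simulated_lift, observed_lift)
print(f"correlation = {r:.2f}; meets 0.7 target: {r >= 0.7}")
```

A single coefficient like this hides which experiments the simulation gets wrong, so in practice one would probably also look at the per-experiment residuals.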
So you have to take visual information, you have to run this in simulated browser environment on the big farm and, and of course, you have to have, uh, like very, very expensive model, good model with multimodal model. So all this it's-- is what's taken so long and, uh, to share my personal fail a little bit there, Shawn, is like, you know, we always had this bias to-- for like large company bias. You know, we always, uh, whenever you-- we do, we're like, “Hey, we'll run an experiment,” right? We make, make a change, and we will run an experiment and then, uh, see, uh, see which one's better or like, “No, this is worse,” and most of them are worse, so you discard it and keep iterating, hill climbing. And we're like, “Oh, like smaller merchants, they cannot get stat sig results. They cannot really run experiments simply because, you know, in a week there would be not enough data for them.” So we thought from this perspective. What we didn't realize is that most people don't have A and B, they just have one thing, and they need suggestions of what A and B should be. So, uh, we first built this, hey, we run simulation on two separate themes and, and, uh, say, “Hey, which one is better?” We then morphed it into, and very recently just released it, when you have just your site, your theme, we run over it and we say, “Hey, here's what predicted values of, of, uh, uh, conversions are, and here's how we think you should modify it to increase your conversions.” And then circling back to what you started with, the proof is in the pudding. Like, if we are not correlating with reality, like, people will not be using it. And, uh, thankfully, we see literally every day more users than the previous day. So, so right now, uh, right now- It's working. Yeah. 
I'm-- Right now my problem is how to pay for it all because the-- so our major thing is how to optimize the LLMs, do distillation, how to run the headless browsers, uh, and headful browsers, uh, uh, cheaper so that we can accommodate the increase in traffic. [00:42:47] swyx: Yeah. I, I understand that you, uh, you published a lot of technical detail at GTC, so I was just gonna bring it up a little bit. I think s- was this in, in con-conjunction with some kind of GTC presentation? Or something like that, right? [00:42:59] Mikhail Parakhin: Well, we, yeah, we, we did it in several places, but yeah, we had the engineering- Yeah ... blog, uh, as well. Yeah. [00:43:05] swyx: Yeah. So you're running, uh, GPT OSS. Uh, [00:43:08] Mikhail Parakhin: the, this is an older version. You know, now we run multimodal model. But yeah- Yeah ... GPT OSS, we still run GPT OSS as well for [00:43:15] swyx: And then you have the VMs, and you also have browser-based. I really like this one where you said, “It violates almost every assumption that standard LLM serving is designed for.” And then you had like, basically orders of magnitude differences between everything. [00:43:29] Mikhail Parakhin: Exactly. Which is, which, uh, which was, you know, a bit of a challenge to implement, like when, like even simple things. Uh, be- since it violates all the assumptions, for example, multi-instance GPUs, like MIGs don't work as well. But we needed, uh, to get MIG to work because, ‘cause otherwise it's way too expensive. And so we had to deal with the, yeah, with, uh, lots of infrastructure and, and, uh, work with, uh, uh, Fireworks and CentML, uh, you know, to help with optimizations and browser-based, as you mentioned. Yeah, like, it takes a village. [00:44:04] swyx: Okay. So there's a lot of like, I guess, experimentation in the infrastructure so far, and you've published more or less what you have here. I guess I'm, I'm less familiar with CentML. 
I, I don't do, uh, that much work in this, this part of the stack. But why was it the sort of preferred instance platform? [00:44:22] Mikhail Parakhin: There are really three probably top companies. There used to be, uh, uh- Three top companies, uh, at least I was aware of that did, uh, LLM optimization. You know, Together, Fireworks, and CentML, not necessarily in that order. CentML recently got acquired by NVIDIA. Uh, what they did is if you have a model and you want to optimize it to a specific prof-- uh, profile of usage, uh, they would go and do it. And, uh, we work with, with those companies, uh, this was work particularly with CentML and NVIDIA to get them the best possible results out of it. And, and sometimes you, you have to retune depending on, like sometimes you want the maximum throughput, sometimes you want minimal latency, sometimes you want like the cheapest, right? And, yeah, or some combination. And so yeah, these are people who would come and help you. [00:45:14] swyx: I see. I see. Yeah, yeah. I'm familiar with these people for the LLM, you know, autoregressive stack. But the other interesting category of these optimizers is also the diffusion people, whereas like fal and, you know, uh, Pruna recently has come up a lot as well, which I think is like really underappreciated, uh, at least by myself, because I, I thought, oh, all the workload would be LLMs, but actually there's a lot of diffusion as well. [00:45:38] Mikhail Parakhin: Exactly. [00:45:38] swyx: There's a lot here, so I, I, I... it's, it's, uh, it's, it's, it's hard to cover. But I, I do think like people underappreciate the importance of customer simulation, basically. I think this is something that I'm candidly still getting to terms with. 
Uh, you know, uh, you also-- your team also like prepared this, like, really nice diagram. Uh, I, I assume this is AI generated. [00:46:00] Mikhail Parakhin: Yeah, it looks- [00:46:01] swyx: Maybe it's not. [00:46:01] Mikhail Parakhin: Yeah, it looks, uh, Gemini-ish. Yeah, but, uh, uh, honestly, I, I don't know where, where the hell they generated it. It looks, look, uh, looks like it's, uh, Google. But the interesting part, Shawn, that, that, uh, we haven't covered, but I, I wanted to mention is if your store had previous customers, rather than it's a new store, you're like new merchant just launching things, it helps tremendously in just correlation and forecast. Yeah, we take your previous, uh, customer's behavior, and we create agents that replicate that specific distribution of, of customers that you get, and then we a- we apply those to your changes, and then that, that raised raw, you know, the re-- uh, just correlation with the add to cart events or to-- with conversion or whatever it, it, it may be, uh, quite dramatically. So, uh, replicating humans in general seems like an interesting, cool challenge. [00:46:58] swyx: As a shareholder, I think this is the-- like if people are Shopify shareholders, they should really deeply understand this because this is basically the moat. The, the more you use Shopify, the more it will just automatically improve, right? Like you're, you're doing the job for them. [00:47:13] Mikhail Parakhin: Yeah, that's what we started with. Like, uh- ... uh, otherwise, if you're just a startup, I wouldn't do it if, uh, you know, if it was my startup because without the data, it, yeah, as, as you said, it's, it's exactly the case that, uh, whatever you say in the prompt, that's, that's what the agents will be doing. [00:47:30] swyx: The statistician in me wants to like really satisfy the sort of, um, statistical intuition, I guess. Um, to me it's kind of, uh, the, the word that comes to mind is, um, ergodicity. 
Uh, so let's say a, a customer takes this path, customer takes this path, customer takes this path, right? Um, the... In my mind, the way I explain it is like, okay, here, here's the ninety-five percentile, here's the five percentile, and here's the median, right?Um, but to me, what SimGym is potentially doing is that it can, uh, modify... It can sort of model the sort of in-between sort of journeys as well, that, that maybe are dependent on the previous states. This may be like a very RL-type conclusion where like basically the summary statistics, if you only did naive AB testing, you only have the, the statistics at, at, at a certain point, and you only judge based on the sort of overall summary statistics.But here you can actually model trajectories. Does that make sense? Or-[00:48:31] Mikhail Parakhin: That makes total sense because like, well, that, that makes even more sense that maybe even you realize bec- because-[00:48:38] swyx: Okay. Please,[00:48:38] Mikhail Parakhin: please. Yes ... we do-- Yeah. The, so internally, uh, we have this system, we talked about it briefly once at NeurIPS.We have a huge HSTU-based system that models the whole companies, uh, and their possible paths. And like- Yeah ... what you are, what you are showing, like actually at any point of time, you can either model the user's behavior or you mo- can also think about, uh, the whole merchant as a company, as the entity that acts in the world.You can model that as well. And then you can do, can do counterfactuals. In your graph, like in your blue graph, uh, if you're... Imagine in the center there, uh, somewhere in the middle, you would have an intervention. I give that person a coupon, or I don't know, I send a personal thank you card, or give a discount in some- somewhere.And then you can, uh, then you can do forward rollouts from that counterfactual. So what would have happened with that intervention or without the intervention? 
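The counterfactual rollouts described above can be illustrated with a toy simulation: the same simulated shopper is replayed with and without an intervention, and the intervention can be moved to different points in the journey. Everything below is an assumption made for illustration (the probabilities, the coupon effect, and the function names); it is a hand-written Markov-style toy, not the HSTU-based system mentioned in the conversation.

```python
import random

BUY_P = 0.05         # assumed chance to buy at each step without a coupon
BUY_P_COUPON = 0.10  # assumed chance to buy at each step once a coupon was given
LEAVE_P = 0.15       # assumed chance to abandon the session at each step

def rollout(shopper_seed, steps=10, coupon_step=None):
    """Simulate one shopper journey; return True if it ends in a purchase."""
    # Seeding per shopper replays the same random draws in every scenario,
    # so the only difference between rollouts is the intervention itself.
    rng = random.Random(shopper_seed)
    for t in range(steps):
        has_coupon = coupon_step is not None and t >= coupon_step
        buy_p = BUY_P_COUPON if has_coupon else BUY_P
        u = rng.random()
        if u < buy_p:
            return True       # purchase
        if u >= 1.0 - LEAVE_P:
            return False      # shopper leaves
    return False              # journey ran out of steps without a purchase

def conversion_rate(n_shoppers, coupon_step=None, steps=10):
    """Share of simulated shoppers who buy under one scenario."""
    bought = sum(rollout(seed, steps, coupon_step) for seed in range(n_shoppers))
    return bought / n_shoppers

if __name__ == "__main__":
    scenarios = [("no coupon", None), ("coupon at step 0", 0), ("coupon at step 5", 5)]
    for label, step in scenarios:
        print(f"{label}: {conversion_rate(5000, step):.3f}")
```

Reusing the per-shopper seed across scenarios is a common-random-numbers design choice: it keeps the comparison from being swamped by sampling noise, which matters when the effect of an intervention is small.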
And you can even ch- change where that intervention, uh, in time can happen, right? Like some- where, where in this journey. So we, we do this at the Shopify scale for our merchants, and then if we notice that something that they can be fixing, like there's a strong counterfactual, like we have Shopify policy, they basically get a notification like, “Hey, we think your... something is wrong with your-” I don't know, Canadian sales. Like, uh, it looks like it's misconfigured. Here's what you need to do. Or do you think like, uh, you have to set up this campaign with these parameters? And we do that at the buyer level to literally offer discounts or cashback or, or things to buyers. So this is-- I'm getting very excited. Like this is my sort of area of, uh, interest, I guess, and, and hobby. But being able to m-model something complex as human beings or companies and model counterfactuals on it, where you can have interventions in the future and optimize when to make intervention, what kind inter-- uh, what kind of intervention to make. It's such an unlock that previously was completely impossible. Like the-- it was, it was always dreamed of, but never... Like how would you even simulate it without LLMs or HSTUs? I think very, very exciting times. [00:50:59] swyx: I just wanted to, uh, to maybe illustrate this. I, I'm not the best illustrator, but I, I am a conceptual statistics guy. And y-you know, you cannot just do this. Like this is a dimensionality an AB test doesn't do, right? Like, uh, because it doesn't have the, the, the change over time, uh, stochastic nature, uh, and it doesn't have the sort of contextual like... Here's all the context to this point. Um, okay, cool. Um, that's SimGym. You're, you're gonna burn a lot of tokens on this thing. But you're, you're one of the, the only scale platforms in the world that can, uh, that can do this across a huge variety of workloads, right? 
I'm even curious on a sort of human, uh, research level of like, well, do, does retail behave d-differently from like clothing sales? D-does that behave differently from electronic sales? I, I don't know. I don't know what else you guys... The Kardashian shoppers, do they differ from like people who buy, uh, I don't know, cars and, uh, whatever. [00:51:55] Mikhail Parakhin: Well, very different, and different sensitivities and different modes of, uh, shopping and, and different levels of what's important. Now, to-totally, you can do aggregations at, uh, at a store level. You can do aggregations at a different, uh, category level. I don't know if, uh, you know, for our statisticians among us, I couldn't believe, but we-- recently we're looking at it, and we had to bring back, uh, CRPs, you know, Chinese restaurant process. It's a, like, way of aggregating and, like, naturally growing clusters. So across... Specifically to answer questions that, uh, like you were just posing on how, how, if, if buyers behave differently across categories. And I'm like, “I haven't seen CRP since two thousand and one.” It's [00:52:37] swyx: so What? It's so- What is... No, I haven't, I haven't seen this. No. This is not in my training. Uh, [00:52:44] Mikhail Parakhin: but, but yeah, it, uh, uh, it actually, like the, the-- there was a very popular kind of theory, popular in NeurIPS and ICML circles in the early two thousands, uh, kind of nice. And now, now it has practical applications, uh- Yeah ... that we were resurrecting. [00:53:03] swyx: Yeah, amazing. Uh, I, I can see, I can see how this is like a, uh, a fun job for you where you get to apply all these things. Um, yeah, yeah, so super cool. Super cool. So, okay, so, so anyone who, who knows what CRPs are and has always wanted to use them at work, uh, they should, they should definitely join Shopify. Okay, so w-we have a lot and but I, I'm, I'm being mindful of the time. 
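For listeners who have not met the Chinese restaurant process mentioned above: it assigns items to clusters one at a time, with popular clusters attracting more items and a fixed chance of opening a new cluster, so the number of clusters grows with the data. The sketch below shows the textbook sampling rule only; the function name and the `alpha` value are illustrative assumptions, and how Shopify applies CRPs to buyer categories is not described in the episode.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table (cluster) assignments from a CRP with concentration alpha.

    The i-th arriving customer (counting from 0) joins existing table k with
    probability size_k / (i + alpha) and opens a new table with probability
    alpha / (i + alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = customers currently seated at table k
    assignments = []   # assignments[i] = table index chosen by customer i
    for i in range(n_customers):
        # Draw a point in [0, i + alpha); the first i units belong to existing
        # tables in proportion to their size, the last alpha units to a new table.
        u = rng.random() * (i + alpha)
        choice = len(table_sizes)  # default: open a new table
        cumulative = 0.0
        for k, size in enumerate(table_sizes):
            cumulative += size
            if u < cumulative:
                choice = k
                break
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
        assignments.append(choice)
    return assignments, table_sizes

assignments, table_sizes = chinese_restaurant_process(200, alpha=1.5)
print(f"{len(table_sizes)} clusters for 200 customers; sizes: {sorted(table_sizes, reverse=True)}")
```

A larger `alpha` makes new clusters more likely; either way the number of clusters does not have to be fixed in advance, which is the "naturally growing" property referred to in the conversation.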
I, I do wanted to, to sort of cover some other things.Um, I-I'll give you a choice, UCP or Liquid?[00:53:30] Mikhail Parakhin: Liquid. I think, I think on UCP, you know, like UCP is very important for us and, and it just we are-- UCP, we have a structured, uh, discussions, and you can read about them, and we have, uh, blog posts, and we have a big release this week, in fact, like with our catalog.Oh,[00:53:46] swyx: okay.[00:53:46] Mikhail Parakhin: Uh, yeah,[00:53:46] swyx: but- Le-I mean, we, we can, we can discuss the, the, the release briefly because we'll release this after the-- after it's already announced so whatever. There's a catalog that you guys are doing?[00:53:55] Mikhail Parakhin: Yeah. So we are, we are- Okay ... we are bringing in capabilities of a whole, uh, Shopify catalog.Basically, you now you can search for products, you can do lookups by specific ID, you can do bulk lookups when you need to bring m-multiple products. You don't need to know in ad-in advance what you're trying to show or to sell or check out. Like, you can now, you can now have this decided at, at runtime, and this big area for investment for us for both non-personalized and personalized searches, trying to provide basically a win-window into whole universe of products that are being sold everywhere in the world.And Shopify is really not exactly, but almost like a super set of any-anything being sold. Now we are bringing it into UCP and, uh, and, uh, identity linking is another big thing for us, uh, so that you, you can use, uh, like Google or whatever, whatever identity you have, uh, they're minimizing friction.[00:54:56] swyx: Yeah. So[00:54:57] Mikhail Parakhin: yeah, big release for us.But Liquid AI of course we never talk about, and the problem might be more, more aligned with what we d-discussed previously on this chat.[00:55:07] swyx: Sure. The main thing that everyone understands about Liquid is that it is inspired by Worm, and I still don't know why. 
I'm curious on your explanation. I think you, you, uh, you can make things very approachable. And also I think like what is the potential of like the, the level of efficiency that you get out of Liquid? [00:55:23] Mikhail Parakhin: You- we are all familiar with transformer architectures. And, uh, for the longest time, there was a competing architecture, it's called the state space models. So, so SSMs, uh, you know, Chris, Chris Ré, one of the pioneers, and, and lots of startups, uh, trying to make those a reality. They have, uh, significant benefits, the main being, uh, being much faster and, uh, lower footprint and not quadratic in length, you know, sort of, uh, linear in, in, uh, in your context length. But with state space models- They never quite made it. Like they're used-- They have, uh, certain niches when they thrive, their hybrid architectures are useful, but they never quite made it. And liquid neural networks are, you can think of them as a next step, like, uh, sort of, uh, state-space model squared. It's non-transformer architecture that's more complicated than sta-state space and really difficult to code if you-- if I'm being honest. But it's, um, very efficient. It's, uh, subline-- sub, uh, quadratic in, in length of your context. Uh, it's very compact way to represent things, and that's the Liquid AI company. They... Their goal is to productize it, and very often you have this need, uh, when you need to have long context and small model, and you want to have low latency. Like in general, it's basically on par with transformers, and if you do hybrids with transformers, it's, it's even better. That's why we at Shopify, when we tried multiple and we constantly try multiple models, multiple companies, we found that for small, particularly with low latency applications, when you have low latency and/or if you need longer context lengths, Liquid was the best. 
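The linear-versus-quadratic point above can be made concrete with a toy comparison. The sketch below runs a plain scalar state-space recurrence, which touches each token once, and counts the query-key pairs that causal self-attention would score for the same sequence length. It is an illustration under stated assumptions only: the parameters `a`, `b`, `c` and the function names are made up, and Liquid's actual architecture is not described in the episode and is not what this code implements.

```python
def ssm_scan(xs, a=0.9, b=0.1, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t, y_t = c * h_t over a sequence in one pass."""
    h = 0.0
    ys = []
    for x in xs:          # one constant-size update per token: linear in length
        h = a * h + b * x
        ys.append(c * h)
    return ys

def recurrence_steps(length):
    """State updates a recurrence needs for a sequence of this length."""
    return length

def causal_attention_pairs(length):
    """(query, key) pairs causal self-attention scores for a sequence of this length."""
    return length * (length + 1) // 2

# The state carries a decaying summary of everything seen so far.
print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0))

for n in (1_000, 10_000, 100_000):
    print(f"length {n}: {recurrence_steps(n)} recurrence steps vs "
          f"{causal_attention_pairs(n)} attention pairs")
```

The fixed-size state is also why such models have a small memory footprint at long context: nothing grows with the sequence except the number of steps.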
And so we still use the whole zoo and always like obviously test and use everything, uh, every open source model and, you know, it feels l
Reformed Brotherhood | Sound Doctrine, Systematic Theology, and Brotherly Love
In this powerful episode of The Reformed Brotherhood, Tony and Jesse return to their parable series with an in-depth examination of the Laborers in the Vineyard from Matthew 20:1-16. This often-misunderstood parable confronts our natural inclination toward merit-based thinking and exposes the scandal of God's grace. The hosts unpack the covenantal language embedded in the text, particularly the workers' "grumbling"—a loaded term echoing Israel's wilderness rebellion. Through careful exegesis and theological reflection, they demonstrate how this parable dismantles religious entitlement while celebrating God's sovereign freedom to bestow mercy according to His purposes, not our calculations. The discussion offers fresh insights into grace, election, and the radical generosity that defines God's kingdom economy. Key Takeaways The parable operates on covenant logic, not economic fairness: The landowner's dealings with his workers reflect covenantal promise-keeping rather than marketplace transactions, establishing that God's relationship with His people is fundamentally gracious. "Grumbling" carries profound theological weight: The Greek word used for the workers' complaint is the same term in the Septuagint for Israel's wilderness rebellion—not mere dissatisfaction, but a covenantal accusation against God's faithfulness. Two types of workers represent two approaches to God: The first-hired workers who contracted for specific wages represent those relating to God through legal obligation and merit, while later workers who trusted the owner's promise represent faith-based relationship. The reversal of payment order is narratively essential: By paying the last workers first, the landowner deliberately exposes the merit-based assumptions of the first workers, forcing them to confront their entitlement. 
Grace doesn't negate justice—it transcends it: The landowner fulfills every contractual obligation while simultaneously exercising sovereign generosity beyond what is owed, demonstrating that mercy and justice coexist in God's character. The parable addresses the present kingdom, not just heaven: Because it includes grumbling and complaint, this parable describes life in God's kingdom now—the "already but not yet"—rather than the consummated state. Divine sovereignty in salvation is the theological climax: The landowner's declaration "Am I not allowed to do what I choose with what belongs to me?" directly addresses God's freedom in election and the scandal of unmerited grace. Key Ideas The Covenantal Nature of the Landowner's Dealings The parable's opening establishes a formal agreement between the landowner and the first workers: one denarius for a day's labor. This contractual arrangement is crucial for understanding what follows. Unlike marketplace haggling, this represents a covenant—the landowner binds himself to provide what he has promised. Tony emphasizes that even this initial contract is an act of condescension and grace, as the master had no obligation to employ anyone at all. As the day progresses, subsequent workers are hired with increasingly less formal agreements. By the third hour, the landowner promises only "whatever is right," and by the eleventh hour, no wage is even mentioned. These later workers enter the vineyard based entirely on the landowner's character and trustworthiness. This progression mirrors the movement from law to gospel—from contractual obligation to trusting promise. The theological implication is profound: those who relate to God based on His gracious word rather than calculated merit are actually in a more secure position than those who attempt to earn their standing through works. 
The Wilderness Echo: Grumbling as Covenant Violation The hosts make a critical exegetical observation about the Greek word for "grumbling" (γογγύζω) used in verse 11. This is not casual complaining but the identical term used throughout the Septuagint to describe Israel's covenant rebellion in the wilderness. When the workers grumble "upon receiving" their wages, they're not merely expressing disappointment about pay inequality—they're filing a covenant lawsuit against the master, accusing him of unfaithfulness. This connection to Numbers 16 and Exodus 16-17 is devastating. The Israelites' wilderness grumbling wasn't about logistics or comfort; it was fundamentally about doubting God's covenant fidelity. By employing this loaded terminology, Matthew signals that the first workers' complaint is nothing less than accusing God of covenant violation. The landowner's response ("Friend, I am doing you no wrong. Did you not agree with me for a denarius?") is a covenant defense—he has fulfilled his obligations precisely. The workers' real offense is not miscalculation but begrudging God's freedom to show mercy beyond what is contractually required. The "Evil Eye" and Begrudging God's Grace The final rhetorical question—"Or do you begrudge my generosity?"—contains another Jewish idiom often lost in translation. The Greek literally reads, "Is your eye evil because I am good?" This "evil eye" imagery appears throughout Scripture as a metaphor for envy, stinginess, and resentment toward another's blessing. The landowner's question cuts to the heart: are you cursing me for being generous? This directly parallels Jonah's response to Nineveh's salvation. Jonah had just experienced miraculous deliverance through the great fish, yet when God showed identical mercy to the Ninevites, Jonah's response was essentially, "I knew you were gracious—that's why I ran!" 
The parable exposes the same perverse logic: those who have received covenant mercy begrudging that same mercy extended to others. For the Pharisees listening to Jesus, this was an indictment of their resentment toward tax collectors and sinners receiving the kingdom. For Christians today, it challenges any sense of spiritual superiority based on how long we've been in the kingdom or how much we've sacrificed. Memorable Quotes Am I not allowed to do what I choose with what belongs to me? Or do you begrudge my generosity? That 'or' is a logical connector: either I'm not allowed to do what I want with my belongings, which is ridiculous, or if I am allowed, then you must be mad at me for being generous. Those are the only options. - Tony Arsenal The grumbling in the Old Testament in this context is a covenantal accusation. These workers aren't just complaining about not getting what they thought they would; they're questioning the veracity of the covenant that was made. - Tony Arsenal Most of us are this eleventh-hour call. It's much better to be in the place of that younger brother who comes in and repents than to be the older brother who is stubborn and finds some reason to come before God with self-righteous grievances. - Jesse Schwamb Full Episode Transcript [00:01:05] Jesse Schwamb: Welcome to episode 488 of the Reformed Brotherhood. I'm Jesse [00:01:13] Tony Arsenal: and I am still Tony, and this is the podcast where Tony comes back. Hey brother. [00:01:19] Jesse Schwamb: Hey brother. The band is back together again, man. It's reunited and boy, do you feel it? It feels good, doesn't [00:01:26] Tony Arsenal: it? I do, I do. I'm excited to come back. It was nice to take a break. [00:01:29] Jesse Schwamb: Good. [00:01:29] Tony Arsenal: I, uh, I've been, you know, texting with you a couple times. 
Just it was, I did my best to sort of not think about the podcast because that's sort of defeats the purpose of taking a break from something if you spend a lot of time thinking about it. Um, so I'm back. I'm refreshed. I'm ready to go. [00:01:44] Break and Work Chaos [00:01:44] Tony Arsenal: I appreciate the listeners' patience. Uh, it's been sort of a weird, crazy busy time at work. Uh, there's a lot going on. I, I lost like. 60% of my staff in the course of like three weeks. And, um, I'm still kind of in the thick of it, but we're coming out of it. So took a little bit of time to just make sure that I was having a, an appropriate space to de-stress from that and take care of my family and attend to worship. And, um, it was really a, a blessing to have that. Uh, sort of sabbatical. Ironically, the sabbatical wars were going on at the same time on Twitter, and Jesse is blissfully unaware of that 'cause he's not involved in in the Twitter. That's true. Um, but yeah, just took a little break and it's kinda like overblown it, to call it a sabbatical. Like this is a podcast, it's a hobby, but, but it was nice to have, uh, a little bit of extra time, you know, couple hours extra week, uh, uh, each week of extra time to just decompress and, uh, play with the kids and spend time with my wife and clean the house a little bit, which was good. [00:02:36] Jesse Schwamb: Yeah, it is always good to have a clean house. You look great. You seem refreshed. The voice sounds good, and I'm like, I don't know, in year seven or eight of my Twitter sabbatical, it's going great so far. I feel like I haven't missed a whole lot. The world still seems wild and I'm sure, or X, right? We gotta go X on this. It's [00:02:53] Tony Arsenal: always Twitter. It's always gonna be Twitter. I don't care what Elon Musk says. [00:02:56] Jesse Schwamb: Yeah, I'm listen. I'm totally fine with that. 
[00:02:58] Back to Parables [00:02:58] Jesse Schwamb: And I teased this in the last episode, but we can't be stopped. I mean, people should know this by now, we have an inexorable march through the parables of Jesus, it's true. That will not be stopped. We're always gonna come back until there are no more. And on this episode, we're gonna be hanging out in Matthew 20, talking about laborers in the Kingdom of Heaven. [00:03:17] Tony Arsenal: Yeah. Yeah. I'm stoked. I'm, I'm, I'm excited to get back into it. I'm excited to get back into the word together with everybody. I'm excited to clear whatever that was in my throat out [00:03:27] Jesse Schwamb: emotion, [00:03:27] Tony Arsenal: live on the air. Uh, but yeah, it'll be good. I'm, I'm stoked. I mean, I love this stuff and it's good to be back. [00:03:32] Jesse Schwamb: Listen, you had the rest. Now let's talk about labor. So speaking of labor, it's, it's time for you to work up here, Tony. Are you affirming with or denying against on this episode? [00:03:42] Tony Arsenal: Uh, I'm affirming something and I'm hopeful, uh, that, just a little behind the scenes activity here. Jesse recorded episode 487, like an hour and a half ago. I have not yet listened to it, so I don't know if you did an affirmation. If you did, I hope it's not the same one. [00:03:58] Jesse Schwamb: I did not. You're [00:03:59] Tony Arsenal: safe. Uh, good. So I'm safe. [00:04:01] Artemis II Hype [00:04:01] Tony Arsenal: So, um, I'm affirming the Artemis II mission. Um, oh, nice. Have you been, I mean, I know you're not on Twitter, but I'm sure there's news elsewhere. Uh, this amazing mission around the moon, um, four astronauts, I think, um, the furthest manned space travel, um, since the Apollo program. Um. Pretty intense, pretty amazing pictures, right? The camera technology's amazing, increased exponentially, uh, since we were there last. 
Um, this is ostensibly in preparation for an actual moon landing, which who knows when that will be? Um, but as far as I've seen, the mission was a resounding success. There was no right. I think they had, they ran into a few little hiccups early on with some technical things, but nothing crazy. I have not heard. Um, I know they did touch down and they did reentry. Um, I've not heard anything one way or another, but I'm assuming since I have not heard terrible, tragic news that they made it through, did they do the reentry? I'm really, apparently I'm not actually paying as much attention to this as I thought I was. I saw a lot of information about reentry, but I guess, I don't know for sure when that happened or is happening. [00:05:05] Jesse Schwamb: I mean, by this point, when people listen to it, it'll be old news anyway, right? So [00:05:09] Tony Arsenal: For sure. Yeah. And either, either it went terribly wrong and I'm gonna feel awful, or it went fine and I'm gonna feel a little silly for. Throwing a caveat that it went terribly wrong out there. But, um, it's cool. It's, it's amazing. I mean, I, I commented to my wife the other day and she's kinda like, yeah, maybe we should like, spend that money on people who are on the planet. I was like, okay, I can, I can buy that wisdom. But, um, there's something very cool and very Genesis, uh, one, ask Genesis one and two, ask about flying out into space and taking dominion over Yeah, for sure. Over a, a little ball of rock, uh, you know, uh, 25,000 miles away or whatever it is. Um. And, you know, I'm like an engineering nerd. I, I don't know anything about engineering, but I love watching YouTube videos that explain stuff like this. 
And [00:05:52] Jesse Schwamb: Me too. [00:05:52] Tony Arsenal: All of the videos that have cropped up now about free return and how, like they're able to basically like do minimal burn on the thrusters to get into the right trajectory and then just like meet the moon in the place it's gonna be. And then the, you know, the moon's gravity captures it and whips it back around and then shoots it back towards Earth. And for the most part, they're able to do all of that with relatively minor, um, relatively minor energy output because they're just utilizing physics and gravity and math, um, to fly to the moon and come back. Yes. It's pretty crazy amazing. So, yeah. Amazing. And the photos of like the, the sort of like new versions of the Earthrise photos are really, really phenomenal. Um, they're crisp, they're clean, they're obviously like the best, the best actual photographic images we've had of the lunar surface. Um. And the, the far side of the lunar surface, which, we get all sorts of like telescopic photos and things of this side of the lunar surface because it's tidally locked and is facing us at all times. We don't get a ton of really great photography of the far side of the moon, which is a big part of what this mission was, so, [00:06:56] Jesse Schwamb: right. [00:06:56] Tony Arsenal: Yeah. If you haven't seen the photos, I mean, they're out there, they're amazing. There will be even more available once we get back. You know, they, they're transmitting only the most stellar, amazing ones. Um, and, but they're taking, I'm sure thousands and thousands of photos and, um, so yeah, it's pretty cool. I'm affirming the Artemis II mission. Um. It's just amazing what, what people can do with common grace, you know? That's right. In insight into nature. Um, I don't know anything about the astronauts. I don't know anything about their religious faith or their spiritual life or anything like that. 
But, um, the people who design this, the people who fly it, they're just tapping into the truth that's present in God's creation. So good on them. Uh, either I'm glad they got home, wish they have a safe home coming, or something along those lines, I guess. I don't know. [00:07:40] Jesse Schwamb: Yeah, you'll be happy to know that NASA is reporting that the four astronauts are an excellent condition after they landed in the Pacific Ocean. So [00:07:47] Tony Arsenal: good. [00:07:47] Jesse Schwamb: All, all appears to be well. And it says they have a giant SD card of pictures that's they've been taking. Yeah. And saving. I'm sure. They were just, they were just too big to send to over wifi. [00:07:58] Tony Arsenal: Yeah. Like massive wideness. Yeah. I mean, I'm sure they have a ton that they didn't send because you know Right. Data rates to the moon are pretty high. Yeah. [00:08:05] Jesse Schwamb: Ex. Yeah. [00:08:05] Tony Arsenal: This economy is crazy. So [00:08:07] Jesse Schwamb: Exactly. In this economy. Really In this economy. Yeah, exactly. [00:08:11] Cosmic Worship Reflections [00:08:11] Jesse Schwamb: I think you're right. This is good. I haven't talked about this at all. It's hard not to get just stoked, even in the amateur way about the science, the technology, the physics of all this stuff, and then even the astronauts just being overwhelmed by what they're seeing. [00:08:24] Tony Arsenal: Mm-hmm. [00:08:25] Jesse Schwamb: It's hard not to get pulled into that and think about the universe that God has created and find that there is something transcendent just, uh, by observing all of these things. Yeah. Like even casually, which I think shows, again, this is literally the, the heavens and the earth crying out for God, showing his immeasurable power and, you know, immortal nature. It's incredible that we can even see and be a part of some of these things. Just wild. [00:08:49] Tony Arsenal: Yeah. 
Yeah, and I think it's crazy that they can get signals to the moon. I mean, I drive home from Dartmouth College and I go through half of the spot there, and I don't have a cell signal, but we can get images from the moon. Um, so yeah, it's great. It's great. Check it out if you haven't seen it. If you haven't heard about it, I don't know what you're doing. Uh, this is probably the largest major scientific advancement in our generation. Um, in terms of like big scale scientific enterprise projects. There's been a lot of really amazing technology that's been developed. But this is like the first big. Almost like risky kind of scientific, [00:09:30] Jesse Schwamb: right? [00:09:30] Tony Arsenal: I dunno. Gambit or I dunno, gamble that we've done in a long time. Big deal. I mean, big a lot. Deal of things. Deal. Nothing went wrong. Nothing ma major went wrong. Praise God that they all got back to the planet safely. Right. But, um, a lot of things could have gone wrong, uh, and they didn't. So check out the photos, check out the scientific data they're gonna get. I mean, I'm sure they've got all sorts of information about the way the, the, the space ship moved, all of that stuff. It's gonna be really interesting to see kind of how this all comes about. [00:09:56] Jesse Schwamb: Get some worship on, right? Yeah. I mean this is what a one, a thing to be reminded about how big and how glorious God is. [00:10:01] Tony Arsenal: Yeah. [00:10:01] Jesse Schwamb: And, and to realize, like you said, the risks of this exploration. And this is God again, creating all of this outta nothing. Why? Yeah. Just absolutely wild. Incredible. [00:10:12] Tony Arsenal: Yeah. Yeah, for [00:10:12] Jesse Schwamb: sure. Blown away. [00:10:13] Tony Arsenal: Yeah. What about you, Jesse? What do you have for us? [00:10:15] Bayes and Predictability [00:10:15] Jesse Schwamb: I got affirmation. It's equally nerdy, and actually this is as is always the case. 
This is one of many reasons I miss you: it, it dovetails so nicely. So I'm affirming with a book. It's called Everything Is Predictable: How Bayesian Statistics Explain Our World. It's by a guy named Tom Chivers. I know this sounds super nerdy, but hear me out on this because Thomas Bayes, if you don't know this guy, is first kind of like a wild and interesting guy, but this whole theory he put forward is super interesting. And this book is not like a mathematics book. It reads almost like a statistical thriller, which, as it came outta my mouth, I realized was maybe not very ingratiating. I could have chosen better words than statistical thriller. But Thomas Bayes was alive in the 1700s. And what's interesting to me at least about him, is he was an English statistician, who was a Presbyterian minister actually. He was a non-conformist and his, this whole theorem that he developed was actually published after his death. And the non-conformist part is super interesting. It's all in this book, even some of his different theological ideas. But because he was non-conformist, it basically meant like he couldn't learn. He was kicked out of all the English universities. He had to go to Scotland. Even all of that shaped how he came up with this particular theorem. But the gist of it is: rather than treating probability, as we usually think about it, as this fixed frequency, you know, how many times does this thing occur, he argued and realized that it should represent a degree of belief, and then you would update that belief rationally as new evidence comes in. And I know that sounds super quaint, but this is like what machine learning is based on, medical diagnosis. A lot of like space travel is based on this in terms of understanding uncertainty in systems, spam, all of that stuff. Here's an example, I think, Tony, because we are, we have to carry forward with the top 50 medical podcast thing we've got going on here, right? 
Lemme just give everybody an example of why you need this and why you automatically think this way. So. Statistics is really important, especially in medical testing. This was really prevalent during COVID. So there's two ways that you can describe how a medical test performs. You know this already, Tony, you're an expert. So one would be like sensitivity. So like how [00:12:19] Tony Arsenal: not an expert. [00:12:20] Jesse Schwamb: Oh, you're definitely an expert in testing. Here we go. So one would be like sensitivity. How good is the test at catching people who are sick? So if you're sick, you, you want the test to identify that, that you're sick. That's sensitivity. So a test with a 99% sensitivity is gonna correctly identify 99 out of a hundred people who are truly sick. It's always gonna miss one person. That's a false negative. The other half of that coin is something called specificity. So if sensitivity is all about catching the people who are sick, specificity is gonna say, how good is the test at clearing people who are not sick? And so a test with 99% specificity, you might have correctly guessed, is gonna identify or clear 99 out of a hundred healthy people. Now if you have a test that's both of those, 99% sensitive and 99% specific, you might be thinking, that is the dream. That's exactly what I want. That, that test is gonna be so precise and accurate. How could my intuition fail me? But this is the thing. It actually fails all the time, and here's why. Let's say that you go out and you screen a group of people, a general population, for a rare disease that affects one in a thousand people. One in a thousand people, rare disease. So if you screen 10,000 people from the general population, that means that truly only 10 of them are going to have the actual disease. I'm not gonna do all the math 'cause it'll, oh, this is already making for amazing podcasting. But here's the bottom line. 
That test, which sounds so good on the face, is going to identify 109 people as truly sick or truly having the disease. But the problem is that only 10 of them actually have it. That means it only has a success rate of 9%. There's only a 9% chance you actually have the disease; the rest are falsely identified. The short end of this is Bayes corrects that problem. He fixes it with his theorem so that we get to the right number of people. That's what's called like the base rate fallacy. It's not taking into account that really only 10 people should have this particular disease or this sickness. So I know that sounds super nerdy, but so much of our lives are based on this. We have a prior belief or a prior set of things that we understand about the world. And then as evidence comes in, we refine that. That sounds so normal and normative, but it's revolutionary. In this book, actually, Bayes versus what's called like frequentist, um, probability is like hotly debated. People actually throw down over this theorem. So it's a really fun read. Go check out Everything Is Predictable: How Bayesian Statistics Explain Our World. It really is for everybody. And then you can impress your friends with all the statistical prowess you're gonna have when you're done reading it. [00:14:56] Tony Arsenal: Like the medical administrator hat that I can't always take off is like, why would we screen 10,000 people? Are, are they all symptomatic? Are none of them symptomatic? But suppose it doesn't really [00:15:08] Jesse Schwamb: matter for the example. That's a great, so generally what happens here is, let's say it's like some kind of rare form of cancer, unless you use Bayesian statistics, what you'll find is you'll get these false positive rates. So these tests do use Bayesian statistics. It corrects, in other words, for this problem. 
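The arithmetic behind the screening example can be checked with a short sketch. It assumes only the figures given in the conversation (a disease affecting 1 in 1,000 people, a test that is 99% sensitive and 99% specific, and 10,000 people screened). The function and variable names are illustrative and do not come from the book or the episode.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected counts when screening a population (illustrative names)."""
    sick = population * prevalence                  # people who truly have the disease
    healthy = population - sick                     # people who do not
    true_positives = sick * sensitivity             # sick people the test catches
    false_positives = healthy * (1 - specificity)   # healthy people flagged anyway
    total_positives = true_positives + false_positives
    # Bayes' theorem in count form: P(sick | positive test)
    chance_sick_if_positive = true_positives / total_positives
    return true_positives, false_positives, total_positives, chance_sick_if_positive


# Figures from the conversation: 10,000 screened, 1-in-1,000 prevalence, 99%/99% test.
tp, fp, total, chance = screening_outcomes(10_000, 1 / 1000, 0.99, 0.99)
print(f"true positives:  {tp:.1f}")
print(f"false positives: {fp:.1f}")
print(f"all positives:   {total:.1f}")
print(f"chance of disease given a positive test: {chance:.1%}")
```

With these inputs the sketch gives roughly 10 true positives against roughly 100 false positives, so only about 9% of positive results belong to people who actually have the disease, which matches the figure quoted in the conversation.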
So there might be a lot of people that are gonna screen for this because if you, you wanna know if you have it, but you don't wanna get it wrong and say that you do. So his approach ensures that you get it right. It's wild. Fascinating stuff. [00:15:34] Tony Arsenal: Yeah, and I would think actually, you know, there's probably, there's other mechanisms as well where they would, where they would sort of screen out people that shouldn't be tested or help identify false negatives, false positives. Um, but yeah, that's, that's interesting. I probably won't read that book, but it sounds like an interesting read. I just don't have a lot of room on my TBR shelf. [00:15:55] Jesse Schwamb: Yeah, listen. That, that's fair. [00:15:57] Goodreads DNF Update [00:15:57] Jesse Schwamb: By the way, here's like a, a side affirmation. I think you and I both share, speaking of books, cataloging books. If you use Goodreads, Goodreads, right, is finally adding a Did Not Finish list for books. That's fantastic. This, this might be an example for some people, so pick it up and even if you don't have a place for it, guess where you can put it? On the Did Not Finish list. Yeah. Goodreads. [00:16:16] Tony Arsenal: That's finally, that's one of those like, like why didn't they add that 15 years ago kind of updates, and you get the email and they're like, we're so excited to introduce the Did Not Finish thing. And we're like, yeah. Like of course. Like, duh. It's like, we're proud to introduce that your keypad now has a zero on it. [00:16:36] Jesse Schwamb: Right. So [00:16:37] Tony Arsenal: yeah. I'm, I'm excited about the DNR, um, the DNF, um, I'm so excited. I can't even remember what it's called. Yeah. The shelf. But, uh, very, very useful. The DNR list [00:16:47] Jesse Schwamb: is a diff, it is a different list. Speaking of medical things, it's a different [00:16:50] Tony Arsenal: list. Yeah. Yeah, that's definitely a different thing. 
Usually it's not a list. It's a list of one in most cases. [00:16:56] Jesse Schwamb: Exactly, [00:16:57] Tony Arsenal: yeah. You can't put other people on your [00:17:00] Jesse Schwamb: DNR [00:17:00] Tony Arsenal: This, [00:17:00] Jesse Schwamb: I suppose. Yeah, I should clarify that. You can really, you can only really put yourself, or I suppose somebody for whom you have that kind of authority over on that list, but I was thinking that more from like a medical perspective, that somewhere there would be a database in which there might be a list of DNR. I don't know. [00:17:15] Tony Arsenal: Yeah, maybe. I don't know. I'm not sure. Probably there was at some point, but I think with medical chart technology now, that's probably like a. A moot point. Yeah. They don't need to be able to like cross reference a master list anymore. They just look in the patient's electronic record. We're really like in the weeds here. You can tell it's been a while since I've, I've podcasted. I don't really remember how to do this. [00:17:35] Jesse Schwamb: This is great. [00:17:36] Segue to Matthew 20 [00:17:36] Jesse Schwamb: I think at this point we try to make some kind of awkward segue that is mildly successful. Again, probably has statistically like a 20 to 27% chance of being successful and really hitting the mark. Yeah. So do you have anything that's gonna move us into this? [00:17:49] Tony Arsenal: Yeah, I mean, I feel like you've been podcasting for the last several weeks without me and I've been working hard and now I'm kind of coming in as Johnny come lately and we're gonna get paid the same amount so. Even though you've worked harder for longer and I'm coming in late to the game here. [00:18:03] Jesse Schwamb: Oh man. Ple loved ones. Please tell me you got that. Please tell me you got all of that. That's, that's what you show up for here. Yeah, that was [00:18:10] Tony Arsenal: a deep cut. [00:18:11] Jesse Schwamb: That, that was beautiful. 
And I think leads us right into Matthew 20. So I think we've got at least 16 verses to get through here. Maybe again, if we're gonna keep a statistical theme here, something about engineering and math, all that stuff, we'll let everybody else pick the over under and whether or not we're gonna get through this and how many verses that's going to be. But at this point, we might as well begin. [00:18:32] Tony Arsenal: Yes. Yeah. [00:18:33] Read the Parable [00:18:33] Tony Arsenal: I'll start by reading. Uh, we're here in Matthew chapter 20, the first 16 versus this is the parable of the laborers in the vineyard and it reads. For the Kingdom of Heaven is like a master of a house who went out early in the morning to hire laborer laborers for his vineyard. After agreeing with the laborers for a denarius a day, he sent them into the vineyard and going out about the third hour, he saw others standing idle in the marketplace. He said to them, you go into the vineyard too, and whatever is right, I will give you. So they went, going out again about the sixth hour and the ninth hour, he did the same. And about the 11th hour, he went out and found others standing. And he said to them, why do you stand here idle all day? They said to him, because no one has hired us. And he said to them, you go into the vineyard too. And when the evening came, the owner of the vineyard said to his foreman, call the laborers and pay them with their wages, beginning with the last up to the first. And when those hired about the 11th hour came, each of them received a denarius. Now, when those hired first came, they thought they would receive more, but each of them also received a denarius. And on receiving it, they grumbled at the master of the house saying, these last worked only one hour and you have made them equal to us who have borne the burden of the day and the scorching heat. And he replied to one of them, friend, I'm doing you no wrong. Did you not agree with me? 
For a denarius, take what belongs to you and go, I choose to give the last worker as I give to you. Am I not allowed to do what I choose with what belongs to me? Or do you beg, do you begrudge my generosity? So the last will be first and the first will be last. Now I just wanna head this off. I did bite my tongue earlier and I probably am lisping and this is like a running gag. We thought that we'd resolved it. Uh, so if you hear me stumble over my words a little bit, it's just, it's just the struggle bus today. [00:20:24] Jesse Schwamb: Listen, this is the, these are like the real things we have to deal with when the podcasting, like the real threats, the real injuries. I appreciate you like working through it. Like you just get back up and you walk it off with your tongue. [00:20:35] Tony Arsenal: Yeah, my, my, uh, my podcasting hiatus was actually just a recovery of the last time I bit my tongue. I just needed a couple weeks to, no, I'm just kidding. [00:20:43] Jesse Schwamb: Yeah, we didn't wanna say. [00:20:44] Tony Arsenal: Yeah. [00:20:44] Kingdom Fairness and Grumbling [00:20:44] Tony Arsenal: So, Jesse, this is a, this is a parable that follows right on the heels, um, of kind of everything we've been talking about. And I think as we go through these parables and we look at them and we, we sort of pick them up and we look at the different facets of them, we sort of compare them to each other. We kind of, we kind of place them in their context really. They all have basically the same theme, right? Like they're all kind of circulating around these same topics. In this parable, it's circulating around this idea that, um, the, the owner of the vineyard, the master of the vineyard, is allowed to pay the people he employs whatever he wants. And as long as the payment that is due to an individual is received by that individual, then what other people receive and how they receive it and how hard they've worked and how hard they didn't work. 
That's really not germane to whether or not the, the laborer received a fair wage, uh, in the first place. Right. So we're, we're circling around themes of kind of fairness of, uh, of sort of resentment, I think for resentment at the master's generosity, which has been a big theme in previous ones. So this will be good for us to expand on. There's always little nuggets and kernels of things that are different from other parables, and then it's interesting to always see the ways that they kind of line up and, and tell us similar things. [00:21:57] Jesse Schwamb: And this parable is unique to Matthew. Yeah. And it does function as this exposition or expansion of what Jesus says in chapter 19 where it says, but many who are first will be last. And the last first, which is repeated with this lovely like inverted emphasis in, at the end of this as you just read. So it belongs to this like interesting cluster of teacher teachings on discipleship and reward nature of the kingdom of God. And we've, we've spoken a lot about that. I think I was just reminded of this as you were, you were. Reading this, I feel like I remember this from some teaching, like this parable is kind of like a unique chiasm that's anchored on the landowner, sovereign generosity, which you brought up. And then there's the complaints of the first hired, which is mirrored by the late comers vulnerability. And then the landowners, two speeches which divide everything, kind of provide sandwich and the like, the theological climax. It does start in that really familiar way, which we've gotten accustomed to thinking about that introductory formula of the kingdom of heaven is like, and it signals of course that what follows is not gonna be a lesson in economics, but it's gonna use all this economic language as theological disclosure for how God's kingdom operates. And it starts again, like you said, with this master of the house, which to me seems. Pretty clearly like a, a God figure himself. Yeah. 
It's, that's kind of like a recurring Matthean image, I think. So we've got this vineyard, which of course has all this symbolism, deeply rooted in Israel's covenant imagination, and evokes God's people and his redemptive labor among them. So, man, now that I'm saying this out loud, this thing is like super pregnant with all kinds of like imagery and meaning. [00:23:27] Tony Arsenal: Yeah. Yeah. And you know, it's, it's always good to remember, although parables have kind of, some parables, most parables have sort of distinct, discrete, symbolic elements where like, this represents that, this represents that, almost in an allegorical form. And, and in some cases, like purely in allegorical form, where it's like Pilgrim's Progress where each, each individual, each entity, each location each represents some sort of symbolic value. But we have to remember that when, when it says the parable of the kingdom of heaven is like the master of the house, it's not just like the master of the house. Yes. Right. It's like this whole scenario. Yes. It's, it's like. Blah, blah, blah, blah, blah. It's like everything that follows, it's like the entire, um, the entire pericope here. That's what the Kingdom of Heaven is like. And one of the things that I think is striking about this is the kingdom of heaven is like some people complaining, like the people complaining about, some people are getting the same wage for less work. Um, that is part of what the Kingdom of Heaven is like. So I think we sometimes think of, of the kingdom of heaven in, um, in the parables, we think of it as though God is just saying, this is what heaven is like. Right? Jesus is just saying like, this is what heaven is like, but the kingdom of heaven, that language is broader than what we normally would say, uh, is. We're thinking of heaven, like in the, the spiritual abode where God lives and the angels live. 
Um, where, where the departed saints are waiting for the resurrection, the kingdom of heaven is, is also inclusive of the, the sort of like time now between the victory of Christ on the cross and the consummation of the kingdom and the last day, the kingdom of heaven is inclusive of that time period too. And so this parable sort of situates us. I think it situates us in that pre-consummated state where we're talking about what it's like to be a part of the kingdom of heaven here and now in our fallen state, but still solidly in the kingdom of heaven. 'Cause there's not gonna be any complaining or grumbling about God's justice and God's fairness once we're in the final resurrected state. Right? Sure. Nobody's gonna be looking back and be like, yeah, you were way too gracious for that guy. Nobody's gonna be playing the Jonah part when we're all resurrected and we're worshiping for, for all time going forward. So this parable, because there are elements of dissatisfaction or elements of grumbling or complaining similar to like the, the parable of the prodigal son. There's this son figure, the, the older son figure who like is just a bonehead and doesn't get it. Well, that can't be talking about the people who are in the resurrection kingdom, in the final kingdom. It's gotta be talking about people who are still awaiting the resurrection of the body and who are still not yet. Uh, and even in, in that parable, the, the older son doesn't even seem to be a figure who's, who's regenerate. Maybe he does become regenerate at some point in the future, but he doesn't seem to be. In, even in God's kingdom, he doesn't seem to be, even among God's people, he's consistently placed outside of the field. You don't even know he exists until like halfway through the parable. This is similar in that there are these workers, they're receiving their wages and some of them are, are outwardly dissatisfied and grumbling against the master of the house. 
Um, so I think if we think about parables as describing heaven rather than the kingdom of heaven, we can lose sight of, of what's actually being said in a lot of them. [00:26:50] Contracts Versus Grace [00:26:50] Jesse Schwamb: Yeah, that's really good stuff, because it strikes me that there are like, strangely, two groups here mentioned. I, I find this really kind of fascinating. I think we should talk about this. Like, the first group has like the most formal agreement, it's almost a legal contract, right? A denarius was like a standard day laborer's wage, sufficient mostly for subsistence. And so that detail seems theologically loaded to me. These workers relate to the landowner on the basis of a contract and what is owed. And so their claim at the end of the day will be exactly that. They're owed something and they know it, and that sets up then this contrast with a second group, which is mostly all about grace, because by the time we get to that third hour, like approximately like 9:00 AM, then we're beginning this pattern repeated at the sixth and the ninth hours. And crucially, for those workers who go out, go out and get recruited, there's no wage that's specified for them. Only the promise of like whatever is right. And so they enter the vineyard, not on the basis of a contract, but on the basis of like the owner's word and character. And that seems to be like more of a picture of trust and not, not calculation. Yeah. Separate from like the first group. And that marketplace idleness, as I read this, doesn't imply like laziness, because verse seven clarifies like they just had not been hired. Right? They were overlooked, they were unemployed, they were marginalized. So it does set up, like you said, everything you just talked about, about the kind of this, I like that. Like the Jonah, the Jonah whiners or whatever, like yeah, they want to complain about this, right? 
There are, and there are two, two separate groups that have kind of been brought into the fold, not under different terms or pretenses, but differently. [00:28:17] Tony Arsenal: Yeah. And I think too, it bears saying, um, although there are elements of parables that are very, very directly applicable, mm, we shouldn't read this as though every, every specific thing in the parable is not a parable. Right. Right. I think we can look at this and we can go, you know, you can read this in a way where, oh yeah, there's some people who actually earn their, earn their wage, they earn their denarius. Right. It's a fair contract. And they work all day and he says, well, I'm gonna give you what's right, what you, what I owe you. [00:28:45] God Owes Nothing [00:28:45] Tony Arsenal: The reality is God doesn't owe any of us anything. Right? Right. He owes us wrath and judgment and destruction. And so even, even the people who are the hard workers in the kingdom of God don't merit and never could merit, um, to, in a certain sense, in a strict sense, and stick with me before you send your, your angry emails, in a real strict sense, even Adam couldn't merit what was, well, what was guaranteed to him according to the covenant of works. God had to condescend to make the covenant of works in order for Adam to have any sort of fruition of his blessedness. So there, there's no natural obligation, strict obligation that God has to reward the work of his creatures, because nothing they could do could ever be sufficient enough to obligate him. So the, the obligation of himself, and that's, this is where I do think this is strong, the fact that he obligates himself to these workers to give them their denarius after a hard day's work [00:29:37] Jesse Schwamb: Exactly. [00:29:37] Tony Arsenal: is itself a covenantal, um, contractual, yes, but I actually read this as sort of a covenantal thing. And the, the strange part is that the people don't recognize the sort of semi-gracious covenantal nature of this. Yes. 
[00:29:50] Grace In The Hiring [00:29:50] Tony Arsenal: I think, um, you know, there have been times when I, where I've been unemployed, um, not for very long. Now, I know some people face unemployment for a lot longer than I ever have, but I know there was times where I was, I was looking for work and someone would say to me like, Hey, you know, my, my, my lawn needs to be mowed. Could you come over and I'll, I'll give you 25 bucks to mow my lawn. It's a small lawn. Um. That's a gracious act in most cases. Right, right. Um, yes, I'm performing a task. Yes, they're paying me, but they didn't have to offer me that work. They didn't have to offer me that job, especially when it's something that like they could have accomplished themselves. They could have just done it themselves. Um, so I think there's an element of that here, that there's, there's a condescension of the master to these workers, to these laborers who are not part of his household. These are not, they're not slaves. These are not people who are part of his household, who are regular employees. These are people that he goes out into the market to, to find and to hire. And as we see some of, some of these mark, like the difference between the ones that are hired and the ones that are not hired until later in the day, the parable's not super clear about what it is. Just that they're not hired, it doesn't say the lazy ones were left there. The ones were exactly, that were ugly or had like limp legs or like just couldn't cut it. It just says like there was some that didn't get hired. Um, so there's a gracious element of this, and that makes the recognition at the end or the lack of recognition at the end by these full day laborers, the, the sort of like recognition, this, this entitled ness, um, that actually makes it all the worst. It's like the people who are outwardly attached to the covenant of grace. 
Um, I know all the Baptists in our, our group, their heads just exploded, but like are outwardly attached to the covenant of grace, um, who wanna somehow complain about like the graciousness of the covenant of grace that they're outwardly attached to. It's just sort of like a form of, of theological and temporary insanity, I think. And that's what we see on full display here. [00:31:40] Jesse Schwamb: It's definitely all grace. You're right that nobody's gonna get injustice, right, in this parable. And I think that's definitely exemplified the further out you go in this hiring order. [00:31:49] Eleventh Hour Mercy [00:31:49] Jesse Schwamb: So by the time you get to 5:00 PM, which is pretty extraordinary, right? Only really like one hour remains before sunset, right? It's the end of the working day. [00:31:56] Tony Arsenal: Yeah. [00:31:56] Jesse Schwamb: You can imagine like these guys who are being hired at the eleventh hour probably can contribute very little in the last hour of the day, right? But this owner goes out and hires them and no agreement is stated whatsoever. It's just pure grace. The landowner's question, why do you stand here idle all day? I think, to your point, underlines their vulnerability. They were not idle by choice, presumably. And so I think we rightly hear in this, like, a foreshadowing of those who are called late in redemptive history, Gentile sinners, the seemingly least qualified for kingdom membership. All of that I think is at play and it's all, it's getting this lovely setup of all these groups to help us understand what that kingdom is actually like. [00:32:33] Tony Arsenal: Yeah. Yeah. [00:32:35] Reverse Payroll Setup [00:32:35] Tony Arsenal: And then we have this, um, this is where the sort of dramatic tension turns, right? The end of the day comes and, uh, the master calls the, the people that he brought last, right? 
He calls the people who'd only been there for an hour and he starts to go down the list of the people who, the people who were last, and the people who came in next. And the people who came in next, right? And the workers who had contracted at the beginning of the day. Um, they're watching this happen and they're kind of going, oh, this is gonna be good. Like, that guy's only been here for an hour and he got a denarius. You know, the logic is probably like, I'm gonna get 12 denarius, like I'm gonna go 12 days worth of work. Um, because I think there's an assumption on their part, um, that the master's fair that he is, he's providing an equitable wage. Um, of course the master is fair, but he's providing an equitable wage that's commensurate with the work delivered. A delivered, delivered, right? And that, that's the key to this parable. [00:33:26] Merit Mindset Exposed [00:33:26] Tony Arsenal: I think the expectation that God. Helps those who help themselves. Right? God rewards those who put in the hard work. God. God provides blessing or salvation according to the merit provided by the one who's being saved. That perspective is what's on full display here. Yes. By the people who are, uh, the ones who contracted for the full day. They're not thinking about the covenant that they have with this person or the contract they have with this person. They're not thinking about the fact that they agreed to work for the day in order to earn a day's wage. They're thinking about how this actually is gonna work out great in their favor. They're looking at this as a strictly merit-based kind of a, a thing. And you would think that like when the, the one hour people come in, they get a denarius, and then the three hour people come in and they get a denarius. You'd think they would pick up on it at some point, but then in the course of the payroll, it doesn't seem that they do. 
They still get to the bottom of the list and think they're gonna get more compared to the other people who all got the same. [00:34:22] Jesse Schwamb: Yeah, that display piece is critical to this. It is like a complete setup. Like you can imagine he, the landowner, calling everybody together at the end of the day and they're all standing around. Some of them are exhausted because they've again borne all their work in the heat of the day on their backs. They're tired, they're dirty, maybe they're exhausted. And he starts in this reverse order. And by the way, we should note that there is something here that's beautiful in that the, the landowner is law-abiding, because, right, evening payment is mandated in the Torah. So we see all this taking place as to fulfill the law in some ways. But the reversal of the order, that last-to-first, is like such deliberate and good narrative storytelling and staging, isn't it? 'Cause it ensures that the first hired workers are going to witness the payment of those who worked the least. And without that order, if you just did it the other way around, the moral crisis of the parable just, like, completely goes away. [00:35:10] Tony Arsenal: Yeah. [00:35:10] Jesse Schwamb: So this execution of the payment at the owner's will, it just shows that he has, he's completely independent. The sovereignty belongs to the master alone. And so these 11th hour workers receiving a full day's wage for one hour of work, that's like an act of sheer generosity. It's not proportional justice. And I think as Reformed people, maybe all of us at some point have had this conversation about predestination and justice and mercy. And again, really I think putting a crowbar between this idea that nobody is receiving injustice, but some are receiving mercy and grace. And here these first hired workers, seeing this, form, like you said, this expectation that they're gonna receive more, like you said, where that came from. 
Yeah, it's just them, right? It's purely manufactured in their own reasoning. It's not anchored in the covenantal promise and certainly not witnessed in the grace that they should be, like, perceiving as the payments get doled out, like sequentially moving in their reverse order toward those who have worked the longest. But their expectation reveals that they have fundamentally misread, like, the landowner's character. They're still operating in the register of a contract and not grace. [00:36:16] Tony Arsenal: Yeah. And you know, I think to sort of lock this covenantal frame and sort of like lack of recognition of the covenant into place too, when you look at the language of this parable, um, and especially kind of what it's following up on, it's coming on the heels of this interaction with this rich, rich young ruler who comes in and he thinks that he's gonna earn eternal life by keeping the commandments. Um, and, and he, he has this outward sense or this outward display of piety. He's calling Jesus good. He's saying he, you know, he keeps the commandments. Jesus doesn't even disagree with him, actually, that he has kept them. Yes. You know, I think it's implied that, well, of course you haven't, but he, he still is graciously trying to, like, convince this guy, no, you actually need to abandon your self-righteousness and, and pursue and follow me. Um. But this is a parable where like other people are listening, right? There's other witnesses. This isn't like the rich young ruler came to him in the middle of the night, like Nicodemus. This is something that's happened out in the public. So we can anticipate that the Pharisees and the Sadducees and the scribes and the lawyers were all aware of this. They may have been there, but they were at least aware of this happening. And I think there's some language in here that is actually directed at those people. 
[00:37:30] Grumbling As Accusation [00:37:30] Tony Arsenal: And, and here's where it comes in, is you get to verse, um, we'll start reading again at verse nine. It says, when those hired about the 11th hour came, each of them received a denarius. Now, when those hired first came, so we're referring to the people who are hired at the beginning of the day. Now, when those who were hired first came, they thought they would receive more, but each of them also received a denarius, and on receiving it, right? So this is as, this is, um, uh, just unbelievable, as they're receiving the denarius, on receiving it, they grumbled at the master of the house. Now, just the way that I read that and said the word grumbled tells you that that word is really important here. Yes. If you look at this Greek word and you compare it to the, the word, the usage of this word in the, the, um, Septuagint, yes, which of course is the Greek translation of the Old Testament, this word most commonly appears in the wilderness wandering accounts. [00:38:22] Jesse Schwamb: Yes. [00:38:23] Tony Arsenal: Right. And the, the primary sin of the Israelites during the wilderness wandering was grumbling against the Lord. And this grumbling against the Lord in that context is not just a general complaining, right. It's not just like a, a sort of like a, a general dissatisfaction or like murmuring. This isn't like water cooler frustration about your boss. The grumbling in the Old Testament in this context is a covenantal accusation, right. So this is tied to the, the accounts where Moses first is told to strike the rock, and he does so and the water comes out, and then second is told to speak to the rock, but he strikes it. I won't go into all the details, but the scene that's being, being displayed there is the people come, they accuse the Lord of abandoning them into the wilderness. And this scene where Moses is set up on the rock and he strikes the rock, that scene is a judicial scene. 
The people have filed a covenant accusation against the Lord, and in reality, it's the people who have been unfaithful. But the Lord standing in the place of the rock is the one who is struck, right? Jesus was the rock in the wilderness from which the water came. Paul says that in First Corinthians, right? So this language of grumbling in this is not just, they're not just complaining about the fact that they didn't get what they thought they were going to, they're questioning the veracity of the covenant that was made. So they're, they're still locked into this merit-based. This merit-based idea even more than it seemed at first, right? There's a logic to the idea that like, oh, if the, the master is actually paying a wage of one denarius for per hour, like there's a logic to that. But it's not just that they're saying, and this is, this explains the response of the master. It's not just that they're saying like, Hey, wait a second, like the wage rate that you're paying is not right. They're saying you have violated the terms of our covenant in the way that you have paid us. 'cause it's upon receiving it that they complain or they grumble and the master says more or less like, Hey. You agreed with me for one Denarius, I'm giving you what you've earned. I'm giving you what you agreed on. Why don't you take it and go. So the answer is not to try to justify why he is free to pay these other people more, or why he's free to pay these people a perceived less. The answer is, again, they're complaining against the covenant. He is bringing it back to the covenant saying, well, here's what the covenant relationship was. You work for the day. I give you Denarius. We're square here, we're on the same page. We've fulfilled our covenant obligations, and you've received your reward for that. So I, I think that's another thing we have to lock in here is this is not just a general idea of like unfairness that's being presented. 
This is not just a general idea that people are saying the master of the house is unfair. They're saying he's covenantally unfaithful. Right? That's a pretty big accusation. [00:41:09] Jesse Schwamb: Yeah, that is, thank you by the way, for completely stealing the whole Septuagint thing from me. Like I was just going hot to the Septuagint to find that reference. And now all I can do is add to it. So that is from at least one of those occasions, Exodus 16, and I just wanna read the verse. This is 16:6. So Moses and Aaron said to all the sons of Israel, at evening you will know that Yahweh has brought you out of the land of Egypt. And in the morning you will see the glory of Yahweh, for he hears your grumblings against Yahweh. And what are we, that you grumble against us? So I'm totally with you. This is not subtle. The workers' first complaint here, the first workers' complaint, is like theologically serious. Uh, I think that's what you're hitting us on. Like it charges the owner with injustice. Right. And as I read it, the grievance has like two layers or two parts, I would say. One is this comparative part, which is basically saying, you made us equal to them. Right? And the second would be like a meritorious part: they have worked harder and in worse conditions. And that's why they say things like, it's, it's all inflammatory language, isn't it? Like the scorching heat emphasizes like the real bodily cost. And their complaint, I think if we're honest, it's not irrational, but it's spiritually revealing at least, because, right, they believe their greater effort merits greater reward and they resent the grace shown to others. So like you said, they're bringing forward a very serious grievance and it's, it's not just like, hey, we think maybe could you give us a bonus? Right. But that is a matter of faithfulness. And in fact, like as I'm looking at this Septuagint here, shout out to Logos Bible Software. 
And I'm saying that that verb that we're talking about in Exodus 16 is in the imperfect tense. So this is, they kept on grumbling and it is like an an echo of Israel's murmuring in the wilderness, which I presume like Matthew certainly had intentionally used there or had that view in part casting these workers as the same types of those who relate to God through entitlement rather than gratitude. So it's like insults upon insult here, but it is to emphasize this fact that it's no small accusation, it's not subtle, it's meant to be in your face. They're coming in hot with this and they're making a big deal about it. [00:43:16] Tony Arsenal: Yeah, and again, I think like underscoring the covenantal nature of this is so key. And I think, you know, when we look at this, we really have to land that this is not just saying. Your wage structure is not right. 'cause and, and we gotta remember, they weren't there when the master went and made this bargain, or, you know, brought these other workers into the vineyard. They weren't there to hear what covenant or contract he did or didn't make. And as we've commented, they didn't, he didn't even make a covenant with them. He basically just said, I'm gonna put you to work and I'll pay you what's fair. I'll pay you what's right. Um, and they went, okay, you need the work and thank you. Like, I think, I think that's kind of like the, the scene here is they're standing there. They recognize they're not gonna get a wage for the day, especially these ones that he's coming in at the 11th hour, they're not gonna get a wage for the day. And as you said, these are subsistence workers. Right. These are people that if you don't get a wage, and this is the, the grounding of the Old Testament, um, the Old Testament command of, of paying at the end of the day is that if they don't get their wage, they're not gonna eat. They're not gonna have food, they're not gonna have the money they need to survive. 
Um, so he comes in and he basically says like. You don't have a job that's not gonna be good for you. I'll take care of you. I'll, I'll give you a job and I'll take care of you. And the ones who are complaining and grumbling, they have no line of sight to that process. That, that's right. They make a lot of assumptions about the, and this is, goes back to, um. The parable of the talents, which we haven't really talked about yet. The, the, there's a lot of assumptions about the nature of this master that the, the contracted or covenanted day laborers are making that don't turn out to be accurate. Right. They, they assume that he's working, as you've said, that he's working on this one-to-one, you know, quid pro quo. You do this, I do that kind of a, a methodology and he's actually operating on a basis of a much more. Basic, uh, grace principle. Uh, and again, even, even the principle of hiring these original workers and covenanting with them is gracious in the sense that he didn't have to hire them. Right. So, so all along the way they're, they're, it's like the epitome of looking a gift horse in the mouth. [00:45:24] Jesse Schwamb: Yes. [00:45:24] Tony Arsenal: They've been hired, and so yes, it is right for them to expect their, um, to expect their wage, whatever that wage might be. But they, they are misinterpreting the idea of what the wages are and how the wages are to be delivered. They're, they're applying, this is actually a lot like job's, friends, right? Their, their logic is not actually all that bad, but they have, they have missing parts of the picture that makes the logic. Apply differently in this particular situation. They think that this, this master works on a strict merit-based. You do X amount of work, you receive X amount of money. And this master is actually more functioning on this covenantal principle of, I'm gonna pay you what's right, regardless of what, what work you've done, which, what work is actually owed to you. 
And the master makes these, this agreement with these other workers to just say, go into the vineyard and then when the evening comes, I'll pay you. Right. Well, he intended to pay them what they needed to survive, regardless of how much work they provided. Right? So they're all, even though there's a formal contract to say these, this group works for the whole day and this group, you know, and, and they receive one day's wage, at the end of the day, he's graciously providing another day of survival for all of these people, for the work that they're, they're putting forward, regardless of how much they actually contribute to his bottom line. [00:46:41] Owner Defends The Covenant [00:46:41] Jesse Schwamb: And we see that in verse 13, where the landowner gives his defense, you know, it says, and he replied, friend, I'm doing you no wrong. Did you not agree with me for a denarius? Now the address, because now I'm deep in the Greek, Tony. Here we go. So the address I'm seeing in, uh, again, shout out to Logos Bible Software, it, this use of friend is not like the warm fellows, but like a more formal or distant term of address. It's used elsewhere in Matthew. But I think the point here is that the owner's first line of defense is this contractual point, which you're saying. I have not wronged you. He's kept his agreement precisely. No injustice has been done. And that's crucial. The owner doesn't repudiate justice. He actually fulfills it. He obligates himself and he fulfills that obligation. And what the worker receives is exactly what was promised and exactly what is due. And so by the time he gets to verse 14 where he says, take what belongs to you and go, I choose to give to this last worker as I give to you, here I think this is like the theological beating heart of this whole bad boy. Yeah. [00:47:37] Jesse Schwamb: The landowner explicitly invokes his will, his sovereign freedom to do and to give as he pleases, which is exactly how God behaves. 
It's not a negation of justice, but this declaration of something beyond justice, it is grace. He exercises his freedom and generosity to those who had no claim, and the command, take what belongs to you and go is, is kind of like a world dismissal, like, like you were saying. Yeah. We're in the courtroom. He's like, I, I've ruled on this already. Like, bring Brian, bring your grievance. Here's my ruling. Take what you have and go. Their grumbling has revealed that they're not celebrating the kingdom. They're actually grieving it. So yeah, you know, I think original invocation of like Jonah is right on the money. It's basically like, are are you mad enough? Yeah, I'm mad enough to die. Like, how dare you give me, give me this great shade and then take it away from me. Yeah. And in some ways this is even worse because what they have been given has been that were promised to them, was given to them, and they get to retain and God says, go, or the landowner as God says, go now and take what is yours. Take what I've given to you graciously. But your point that like what supersedes that, the antecedent to all of that is still God's covenant keeping, covenant making promise, making, right? That sets the whole thing up. But I love this idea that, you know, I will choose, it's my desire, it's language of divine volition. And of course the reform theology, this single verb resonates with the entire doctrine of election. It's God's free, sovereign, and gracious will to bestow blessing without reference to merit, like praise his name. [00:49:00] Tony Arsenal: Yeah. Yeah. And then we come to kind of the close of this parable, right? And this is, this reall
Today's clip is from Episode 152 of the podcast, featuring Daniel Saunders. In this conversation, Daniel explores how Bayesian decision theory handles real-world risk aversion beyond the textbook maximum expected utility framework.
The key insight: classical Bayesian decision theory assumes risk neutrality, but in practice, people and businesses are risk-averse. Using a pricing optimization example, Daniel shows how uncertainty varies dramatically across price points: lower prices have predictable demand, while higher prices create wide uncertainty in profits. This asymmetry matters when you want safer decisions.
Daniel introduces exponential utility functions, a technique from economics that models diminishing returns on money. By adjusting a risk-aversion parameter, you can see how increasing risk aversion shifts optimal decisions away from high-uncertainty, high-profit scenarios toward more predictable outcomes.
The broader lesson: optimal decision-making requires separating the modeling process from the decision process, allowing you to build in constraints and risk adjustments that pure expected utility maximization would miss.
Get the full discussion here
Support & Resources
→ Support the show on Patreon: https://www.patreon.com/c/learnbayesstats
→ Bayesian Modeling Course (first 2 lessons free): https://topmate.io/alex_andorra/1011122
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
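To make the risk-aversion idea from the Episode 152 clip concrete, here is a minimal sketch, not Daniel's actual model. It assumes the common exponential (constant absolute risk aversion) form U(x) = (1 - exp(-a·x)) / a, and the price points, the profit numbers in `PROFIT_DRAWS`, and the function names are all hypothetical, invented only for illustration. In a real workflow the profit lists would be posterior predictive draws from a fitted demand model.

```python
import math

# Hypothetical, equally likely profit outcomes for each candidate price.
# Higher prices have a higher average profit but a much wider spread.
PROFIT_DRAWS = {
    5: [9.0, 10.0, 11.0, 10.0],      # mean 10, very predictable
    10: [5.0, 15.0, 25.0, 15.0],     # mean 15, moderate spread
    15: [-20.0, 10.0, 40.0, 50.0],   # mean 20, wide spread
}


def exponential_utility(profit, risk_aversion):
    """Exponential utility; risk_aversion == 0 is treated as risk neutral."""
    if risk_aversion == 0:
        return profit
    return (1.0 - math.exp(-risk_aversion * profit)) / risk_aversion


def expected_utility(draws, risk_aversion):
    """Average utility over equally weighted profit draws."""
    return sum(exponential_utility(d, risk_aversion) for d in draws) / len(draws)


def best_price(profit_draws, risk_aversion):
    """Return the price whose profit draws have the highest expected utility."""
    return max(
        profit_draws,
        key=lambda price: expected_utility(profit_draws[price], risk_aversion),
    )


if __name__ == "__main__":
    for a in (0.0, 0.05, 0.5):
        print(f"risk_aversion={a}: choose price {best_price(PROFIT_DRAWS, a)}")
```

With these made-up numbers, the risk-neutral choice (a = 0) is the highest price, a = 0.05 moves to the middle price, and a = 0.5 settles on the lowest, most predictable price. The profit draws stay fixed while only the utility function changes, which is one way to read the clip's point about separating the modeling step from the decision step.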
Have you ever parked your car, walked away, and completely blanked on where you left it? And yet… somehow you walked straight to it anyway? When that happens, it might not be luck. Rather, it could be your brain using what scientists now call “fuzzy memory.” And this might be the most important discovery about short-term memory in the last 20 years. To learn more, I sat down with Dr. Paul M. Garrett, currently doing postdoctoral work at the University of Melbourne. In addition to studying how your brain makes decisions when it’s uncertain, his recent article on The Conversation raised old and new questions related to how I think about forgetting, remembering, and every decision I make in between. Here’s one aspect of Paul’s research that blew my mind: The old theory said your brain has a fixed number of memory “slots.” If something made it into a slot, you remembered it. If it didn’t, it was gone. But that theory is apparently wrong. Rather, Paul’s research demonstrates that even memories you’d swear are completely gone still leave a faint signal in your brain, precise enough to push you toward the right answer without you even knowing why. That “gut feeling” you get sometimes? That might literally be a fuzzy memory talking. In this conversation we go deep on why your brain caps out at 3 to 4 items in working memory and what that actually means. https://www.youtube.com/watch?v=wm6C08m3WUI We also discuss why you remember better from a physical book than a screen (and the possible spatial memory scientific explanations behind it). 
Next, we discuss:
How trained motor skills get so deep into procedural memory that even ten years away can’t break their strength
How marketers and salespeople exploit your decision boundaries using time pressure
Why a tiny dose of Bayesian reasoning would make almost everyone a sharper thinker
What new EEG research on voluntary decisions reveals about whether or not free will is real
How to make memory science more accessible by finding good science popularizers
Paul and I had such a deep conversation that we kept going well past the formal interview. That bonus discussion covers experimental design for mnemonic research, Giordano Bruno’s 16th century memory seals, the neuroscience of pitch detection and white matter volume, music therapy for Alzheimer’s patients, and a lot more on the topic of how memory works. You can access the full bonus conversation in the Magnetic Memory Method Masterclass. You’ll find it on the Bonus page.
More About Paul Garrett & His Research
To follow more from Paul, check out his:
LinkedIn Profile
Google Scholar page
Profile on The Conversation
If Fuzzy Memory Fascinated You, Go Deeper With These Science-Related Episodes
Ready for more? Check out my conversations with:
Dr. David Reser and Tyson Yunkaporta about Aboriginal Memory Techniques
Dr. Gary Small on science-backed ways to keep memory strong
Dr. Christine Till on research into brain training with apps
Final Thought
Here’s the final thing (for now) that comes to mind about fuzzy memory: If your brain is already doing this much work behind the scenes with zero memory improvement training, just imagine what it can do when you actually give it the right tools. That’s what the Magnetic Memory Method is built for, and that’s why it’s based around the Memory Palace technique. After all, many of us have been using locations we barely remember to memorize tons of information. 
So if you’re not familiar with this approach, or you’re worried that you can’t remember places well enough to use the method of loci, read this Complete Guide to the Memory Palace technique. You might be pleasantly surprised by just how much your fuzzy memories help you remember more than you ever imagined possible!
Become a Science of Sport Supporter by making a small monthly pledge. You'll show your support, help us stay "athletic-greens-free", and get access to our world-class discussion forums.

In this Spotlight, we start on the cobbled roads of Belgium to explore why riding on cobbles is so hard, and how not-so-good vibrations compromise mechanical power, cost more energy and require more exertion to produce the same power output. It's Pogacar vs van der Poel, Round 3 this week on the cobbles of Roubaix, and we wonder whether smart tactics will be enough to overcome the Slovenian's firepower, and whether van der Poel's larger size may tilt the balance in his favour.

We discuss Jimmy Gressier's return, in Decathlon's own version of a super-shoe, as he runs an exceptional 5k road time. Speaking of Decathlon, it was a good week for the brand, with Paul Seixas continuing his rise, this time with dominance in the Tour of the Basque Country, and hope for a challenger to Pogacar.

A new research paper suggests a doping prevalence among university students of 13.7%, but it uses novel statistical methods to get there, after only 3.4% of the athletes admitted to PED use. We discuss that study, and what it means for anti-doping knowledge. Less covert (but only a little) about doping are the athletes of the upcoming Enhanced Games, recently valued at $1.2 billion, but now being transparently spoken about as a 'product launch' for longevity and performance enhancement drugs. The recently disclosed peptide stack of one competitor, world's strongest man Mitchell Hooper, is the basis for a chat about the grift of those Games.

Finally, our teen phenom watch list has two more names, 14-year-old girls who broke 23s last week. 
Ross and Gareth wonder if the gap between adults and children is narrowing, or whether we're just caught in a cycle of noticing more and more such performances.

Links
Study on the effect of vibrations on physiology during cycling
Another study simulating vibrations, this time showing how much oxygen cost goes up
Article on Gressier, including his struggles with chocolate after his World title last year
World Athletics concept on the Marathon as a standalone event
The Performance Enhancing drug survey that inspired our Bayesian stats discussion
Zero positives in the 2026 Olympics - the clean games?
Mitchell Hooper's peptide stack
Forbes article on The Enhanced Games
WADA's prohibited list

Hosted on Acast. See acast.com/privacy for more information.
Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free):
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work

Takeaways:
Q: Why is bridging deep learning and probabilistic programming so important?
A: Deep learning is extraordinarily good at fitting complex functions, but it throws away uncertainty. Probabilistic programming keeps uncertainty explicit throughout. Combining the two – as in inference compilation – lets you get the expressiveness of neural networks while still doing proper Bayesian inference.
Q: What is inference compilation and how does it relate to amortized inference?
A: Amortized inference is the general idea of training a model upfront so you don't have to run expensive inference from scratch every single time. Inference compilation is a specific form of amortized inference where a neural network is trained to propose good posterior samples for a given probabilistic program – essentially learning to do inference rather than computing it fresh each query.
Q: What is PyProb and what problems does it solve?
A: PyProb is a probabilistic programming library designed specifically to support amortized inference workflows. It lets you write probabilistic models in Python and then train inference networks on top of them, making methods like inference compilation practical for real-world simulators and scientific models.
Q: What are probabilistic surrogate networks and why do they matter?
A: A probabilistic surrogate network is a learned approximation of a complex, expensive simulator that preserves uncertainty. 
Instead of running a costly simulation thousands of times, you train a surrogate that can answer probabilistic queries much faster – crucial for applications like risk modeling where speed and uncertainty quantification both matter.

Chapters:
00:00:00 Introduction to Bayesian Inference and Its Barriers
00:03:51 Andreas Munch's Journey into Statistics
00:10:09 Bridging the Gap: Bayesian Inference in Real-World Applications
00:15:56 Deep Learning Meets Probabilistic Programming
00:22:05 Understanding Inference Compilation and Amortized Inference
00:28:14 Exploring PyProb: A Tool for Amortized Inference
00:33:55 Probabilistic Surrogate Networks and Their Applications
00:38:10 Building Surrogate Models for Probabilistic Programming
00:45:44 The Challenge of Bayesian Inference in Enterprises
00:52:57 Communicating Uncertainty to Stakeholders
01:01:09 Democratizing Bayesian Inference with Evara
01:06:27 Insurance Pricing and Latent Variables
01:16:41 Modeling Uncertainty in Predictions
01:20:29 Dynamic Inference and Decision-Making
01:23:17 Updating Models with Actual Data
01:26:11 The Future of Bayesian Sampling in Excel
01:31:54 Navigating Business Challenges and Growth
01:36:40 Exploring Language Models and Their Applications
01:38:35 The Quest for Better Inference Algorithms
01:41:01 Dinner with Great Minds: A Thought Experiment

Thank you to my Patrons for making this episode possible!
In this episode, Thomas Plümper and Eric Neumayer explore the hidden challenges in modern science, from outright fraud to the subtler practice of “tweaking” data that distorts results. They examine why the self-correcting nature of science often falls short, how incentives and academic pressure drive misconduct, and the double-edged role of AI in both enabling and detecting fraud. The conversation also tackles debates around p-values and statistical reasoning, shares cautionary case studies, and proposes solutions like greater data transparency and stronger verification standards.

Chapters
00:00 Introduction to Fraud in Research
06:21 The Nature of Fraud Detection
08:56 Incentives and Motivations for Fraud
10:43 Self-Correction in Science
12:13 Understanding Statistical Significance
13:04 The Role of Replication in Research
14:32 Bayesian vs Frequentist Approaches
23:09 Understanding Bayesian Statistics and Its Implications
26:24 The Humility of Empirical Science
27:16 Concrete Examples of Scientific Fraud
32:52 Proposed Solutions to Scientific Fraud
34:50 The Reality of Scientific Fraud and Human Nature

Guest Links
You can purchase their book here (https://amzn.to/3Ole3lY)
Follow Eric Neumayer on LinkedIn - (https://linkedin.com/in/ericneumayer)
Follow Breaking Math on Substack (https://breakingmath.substack.com/) Twitter (https://x.com/breakingmathpod) Instagram (https://www.instagram.com/breakingmathmedia/) Bluesky (https://bsky.app/profile/breakingmath.bsky.social) Website (https://www.breakingmath.io/) YouTube (https://www.youtube.com/@BreakingMathPod)
Follow Noah on Instagram (https://www.instagram.com/profnoahgian/) Twitter (https://x.com/ProfNoahGian) Bluesky (https://bsky.app/profile/profnoahgian.bsky.social)
Follow Autumn on Twitter (https://x.com/1autumn_leaf) Bluesky (https://bsky.app/profile/1autumnleaf.bsky.social) Instagram (https://www.instagram.com/1autumnleaf/) Substack (https://substack.com/@1autumnleaf)
email: breakingmathpodcast@gmail.com
Today's clip is from Episode 154 of the podcast, with Thomas Pinder.

In this conversation, Thomas Pinder explains how Bayesian methods naturally lend themselves to causal modeling, and why that matters for real-world business decisions. The key insight is that causal questions in industry are rarely black and white: instead of a single treatment effect, you get a full posterior distribution, credible intervals, and the ability to communicate the probability that an effect is positive, which is far more useful to stakeholders than a p-value.

Thomas then dives into Bayesian Synthetic Control, a reframing of the classic synthetic control method from a constrained optimization problem into a Bayesian regression problem. Rather than optimizing weights on a simplex, you place a Dirichlet prior on the regression coefficients, which turns out to be not just mathematically elegant but practically richer: you can express prior beliefs about how many control units are informative, set the concentration parameter accordingly, or let a gamma hyperprior on that parameter let the data decide. The result is a more flexible, less fragile counterfactual, implemented cleanly in PyMC or NumPyro.

Get the full discussion here

Support & Resources
→ Support the show on Patreon: https://www.patreon.com/c/learnbayesstats
→ Bayesian Modeling Course (first 2 lessons free): https://topmate.io/alex_andorra/1011122
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
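As a rough illustration of the Dirichlet-prior idea described in the clip, the sketch below draws synthetic-control weights from a symmetric Dirichlet prior using only the Python standard library. It is not Thomas Pinder's implementation, and it does no posterior inference (that is the job PyMC or NumPyro would do); the function names, the concentration values, and the toy control series are all assumptions made for illustration.

```python
import random


def dirichlet_weights(rng, concentration, n_controls):
    """Draw one weight vector from a symmetric Dirichlet prior.

    A Dirichlet draw is a set of independent Gamma draws normalised to sum
    to 1, so the weights are non-negative and lie on the simplex.
    """
    gammas = [rng.gammavariate(concentration, 1.0) for _ in range(n_controls)]
    total = sum(gammas)
    return [g / total for g in gammas]


def synthetic_control(weights, control_series):
    """Weighted combination of the control units at each time step."""
    n_steps = len(control_series[0])
    return [
        sum(w * series[t] for w, series in zip(weights, control_series))
        for t in range(n_steps)
    ]


def mean_largest_weight(concentration, n_controls=5, n_draws=500, seed=0):
    """Average size of the biggest weight: a crude measure of prior sparsity."""
    rng = random.Random(seed)
    draws = [dirichlet_weights(rng, concentration, n_controls) for _ in range(n_draws)]
    return sum(max(w) for w in draws) / n_draws


# Hypothetical pre-treatment outcomes for three control units (made-up numbers).
controls = [[10.0, 11.0, 12.0], [20.0, 19.0, 21.0], [5.0, 6.0, 5.5]]
weights = dirichlet_weights(random.Random(1), concentration=1.0, n_controls=3)
counterfactual = synthetic_control(weights, controls)

# Small concentration: prior belief that few control units matter.
sparse = mean_largest_weight(concentration=0.1)
# Large concentration: prior belief that weight is spread across many units.
dense = mean_largest_weight(concentration=10.0)
print(weights, counterfactual, sparse, dense)
```

A small concentration puts most of the prior mass on a handful of control units, while a large one spreads the weight evenly. A gamma hyperprior on the concentration, as mentioned in the clip, would let the data choose between those regimes.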
Louis Papot is back for a special episode recorded from Bangkok. A former rugby player and now a performance and physical coach based in Bali, Louis has a solid international track record: London, Sydney, Bali. We talked about all of that, and also about how to train for longevity.

On the agenda:
- Bali vs Bangkok: two visions of expat life in Asia, the real advantages, the limits, visas, reality vs fantasy
- Training method: rugby, Bayesian, Marchon, his current typical week, machines vs free weights, hypertrophy + performance + conditioning
- Mobility: his critical stance on CARs and FRC, how he works on range of motion in his strength exercises, progression on dips and the squat
- Recovery: sauna, cold bath, and the peptides TB500 / BPC-157

A dense, laid-back episode between two friends who speak frankly.

Find Louis on Instagram: @coach_louis_pap
• Support & get perks!
• Bayesian Modeling course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Takeaways:
Q: Why was GPJax created and how does it benefit researchers?
A: GPJax was developed to provide a high-performance, flexible framework for Gaussian processes (GPs) within the JAX ecosystem. It allows researchers to move beyond black-box implementations and easily experiment with custom kernels and model structures while leveraging JAX's automatic differentiation and GPU acceleration.
Q: What are the primary advantages of using Gaussian processes for data modeling?
A: Gaussian processes are highly effective at modeling complex, nonlinear relationships in data. Unlike many machine learning methods that only provide a point estimate, GPs offer built-in uncertainty quantification, which is essential for understanding the reliability of predictions in research and industry.
Q: How does the GPJax and NumPyro integration enhance probabilistic modeling?
A: The integration allows users to treat GPJax models as components within a larger NumPyro probabilistic program. This combination enables the use of advanced sampling techniques like NUTS (No-U-Turn Sampler), making it easier to build and fit complex hierarchical models that include Gaussian processes.
Q: What are the main challenges when applying Gaussian processes to high-dimensional data?
A: High-dimensional data significantly complicates GP modeling due to the curse of dimensionality and the cubic scaling of computational costs. 
In high dimensions, defining meaningful distance metrics for kernels becomes harder, often requiring specialized techniques like sparse GPs or dimensionality reduction to remain tractable.
Full Takeaways at: COMING UP SOON

Chapters:
11:40 What is GPJax and how does it simplify Gaussian Process modeling?
15:48 How are Bayesian methods used for experimentation and causal inference in industry?
18:40 How do you implement Bayesian Synthetic Control?
32:17 What is Bayesian Synthetic Difference-in-Differences?
39:44 What are the research applications and supported methods for the GPJax library?
45:47 What are the primary software and computational bottlenecks when scaling Gaussian Processes?
49:02 What are the real-world industrial applications of Gaussian Process models?
54:36 How is Bayesian modeling applied to soccer and sports analytics?
58:43 What is the future development roadmap for the GPJax ecosystem?
01:05:37 What is Impulso and how does it integrate into a Bayesian modeling workflow?
01:13:42 How do you balance Bayesian computational overhead with industrial latency requirements?
01:20:26 Why is there optimism that scalable Bayesian methods for causal inference are now within reach?

Thank you to my Patrons for making this episode possible!
Links from the show at: COMING UP SOON
Guidance Recap Podcast | Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products
Decision making sounds like a slightly academic, niche topic… but in reality, it sits underneath every single thing we do in emergency and pre-hospital care. Every patient contact, every test we order, every treatment we start and every one we choose not to – is a decision made in an environment that is time critical, information-light and full of uncertainty. In this episode we take a step back and look at how we actually make decisions at the front door and on the roadside. We talk about why the importance of the decision really matters, not just whether a diagnosis is possible, but how severe it is, how common it is, and whether finding it will genuinely change what we do for the patient. We explore pre-test probability and prevalence, and why knowing how often a condition really occurs in the group of patients in front of you is one of the most powerful tools in emergency medicine. We then move into testing. What actually counts as a test? It's not just bloods, scans and ECGs. It's how someone looks, how they move, what hurts when you examine them and how the story fits together. From there, we build into likelihood ratios and Bayesian thinking; how a piece of information should genuinely shift your estimate of risk, rather than just making you feel more or less comfortable. We also tackle test and treatment thresholds; the idea that there are times when we should stop chasing a diagnosis, and times when the probability is high enough that we should treat without waiting for more tests. Finally, we bring all of this back to real life, with human factors, competing priorities and the reality that sometimes the technically "correct" decision isn't the best decision in that moment. This one is all about becoming more comfortable with uncertainty and making better decisions because of it. Once again we'd love to hear any thoughts or feedback either on the website or via X @TheResusRoom! Simon, Rob & James
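The step from pre-test probability through a likelihood ratio to a post-test probability can be written down in a few lines. The sketch below is a minimal illustration of that standard odds-form Bayes update, not a clinical tool: the function names and the test and treatment threshold values are assumptions chosen for the example, not figures from the episode.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio (odds form of Bayes)."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


def next_step(probability, test_threshold, treatment_threshold):
    """Compare a probability with (hypothetical) test and treatment thresholds."""
    if probability < test_threshold:
        return "stop testing"
    if probability >= treatment_threshold:
        return "treat"
    return "test further"


# A 10% pre-test probability and a finding with a likelihood ratio of 9.
updated = post_test_probability(0.10, 9.0)
print(updated, next_step(updated, test_threshold=0.02, treatment_threshold=0.60))
```

Working in odds is what makes the update a single multiplication: a likelihood ratio of 1 leaves the estimate unchanged, values above 1 raise it, and values below 1 lower it.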
• Support & get perks!
• Bayesian Modeling course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Takeaways:
Q: Is generosity a natural human trait?
A: Yes, generosity is hardwired in our brains and is essential for social interaction.
Q: Why do people say they care about causes but not act on it?
A: There is often a disconnect between stated care for causes and actual action. Understanding the conditions under which generosity aligns with a person's identity is crucial for bridging this gap.
Q: How should fundraising efforts be approached?
A: Fundraising should primarily focus on belief updating rather than mere persuasion.
Q: What are the benefits of being generous?
A: Generosity has significant mental and physical health benefits, as the brain's reward systems activate when we give, making us feel good.
Q: How do our beliefs relate to our actions?
A: Our beliefs about ourselves strongly influence our actions and decisions, including our decision to be generous.
Q: Can generosity impact a community?
A: Yes, generosity can be a powerful tool for improving community dynamics.
Q: How can technology like AI assist institutions with donors?
A: AI could help institutions remember donors better, improving the donor-institution relationship.

Chapters:
00:00 What's the role of Behavioral Science in Philanthropy?
19:57 What is The Neuroscience of Generosity?
24:40 How can we best understand Donor Decision-Making?
32:14 How can we reframe Beliefs and Actions?
35:39 What is the role of Identity in Habit Formation?
38:06 What is the Generosity Gap in Philanthropy?
45:06 How can we reduce Friction in Donation Processes?
48:27 What is the role of AI and Trust in Nonprofits?
52:11 How can we build Predictive Models for Donor Behavior?
55:41 What is the role of Empathy in Sales and Stakeholder Engagement?
01:00:46 How can we best align ideas with Stakeholder Beliefs?
01:02:06 How can we explore Generosity and Memory?

Thank you to my Patrons for making this episode possible!

Links from the show:
Come meet Alex at the Field of Play Conference in Manchester, UK, March 27, 2026! https://www.fieldofplay.co.uk/
Bayesian workflow agent skill
Neurogiving, The Science of Donor Decision-Making
Cherian's website
Cherian's press kit
LBS #89 Unlocking the Science of Exercise, Nutrition & Weight Management, with Eric Trexler
American universities stopped optimizing for students a long time ago. The University of Austin was built as a direct counter to that failure. Carlos Carvalho, its president, brings a statistician's precision to the diagnosis, tracing the causal chain from dropped standards to credential collapse while building an institution with no tuition and no government money, staking its survival entirely on student outcomes 20 years out. The conversation moves from the financial architecture of a university, through a curriculum that starts with Plato before it touches Python, to the deeper question of what a university owes a civilization in the age of AI and whether Austin is the right place to answer it.

Agenda
0:00 Intro + Three Years In
9:42 The $300M Bet
15:42 The Conglomerate Problem
21:42 Western Canon First
28:42 What AI Changes About Teaching
34:42 The Bastrop Lab
41:42 UATX in the Austin Ecosystem
48:42 Atoms vs Bits in Texas
53:42 American Exceptionalism as Mission
59:42 The Hit Pieces
1:06:42 The UCSD Math Collapse
1:11:42 Grade Inflation as Decay
1:14:42 AI and the Soul Problem

Guest Bio
Carlos Carvalho is the President of the University of Austin. Prior to taking on this role, he spent 15 years as a professor at the University of Texas at Austin's McCombs School of Business, where he held the La Quinta Centennial Professorship and founded the Salem Center for Policy. A native of Brazil, Dr. Carvalho earned his doctorate in statistics from Duke University and has also taught at the University of Chicago Booth School of Business. His research focuses on Bayesian statistics in complex, high-dimensional problems with applications ranging from economics to genetics to public policy. 
At UATX, he is leading a bold effort to build a new university that stands for American principles and academic excellence.

Guest Links
University of Austin: Website, Substack, Instagram, X, LinkedIn
-------------------
Austin Next Links: Website, X/Twitter, YouTube, LinkedIn
Ecosystem Metacognition Substack
Today's clip is from Episode 152 of the podcast, with Daniel Saunders.

In this conversation, Daniel Saunders explains how to incorporate risk aversion into Bayesian price optimization. The key insight is that uncertainty around expected profit is asymmetric across price points: low prices yield more predictable (if modest) returns, while high prices introduce much wider uncertainty. Rather than simply maximizing expected profit, you can pass profit through an exponential utility function that models diminishing returns, a well-established idea from economics. This adds an adjustable risk aversion parameter to the optimization: as risk aversion increases, the model shifts toward more conservative price recommendations, trading off potentially large but uncertain gains for outcomes with tighter, more reliable distributions.

Get the full discussion here
• Join this channel to get access to perks: https://www.patreon.com/c/learnbayesstats
• Intro to Bayes Course (first 2 lessons free): https://topmate.io/alex_andorra/503302
• Advanced Regression Course (first 2 lessons free): https://topmate.io/alex_andorra/1011122
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !
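To make the risk-aversion idea concrete, here is a minimal sketch of choosing a price by expected exponential utility instead of expected profit. It is not Daniel Saunders's workflow: in a real analysis the profit samples would come from the posterior of a demand model, whereas the prices, sample values, and function names below are made up for illustration. The utility form used, U(x) = (1 - exp(-a x)) / a, is one common parameterisation and is an assumption here.

```python
import math


def exponential_utility(profit, risk_aversion):
    """Concave utility of profit; risk_aversion = 0 recovers plain profit."""
    if risk_aversion == 0:
        return profit
    return (1.0 - math.exp(-risk_aversion * profit)) / risk_aversion


def expected_utility(profit_samples, risk_aversion):
    """Average utility over samples of profit at one price point."""
    utilities = [exponential_utility(p, risk_aversion) for p in profit_samples]
    return sum(utilities) / len(utilities)


def best_price(profit_samples_by_price, risk_aversion):
    """Price whose profit samples give the highest expected utility."""
    return max(
        profit_samples_by_price,
        key=lambda price: expected_utility(profit_samples_by_price[price], risk_aversion),
    )


# Hypothetical profit samples: the low price is predictable, the high price
# has a higher mean but much wider spread.
profit_samples_by_price = {
    10: [95.0, 100.0, 105.0],
    15: [40.0, 120.0, 200.0],
}

risk_neutral_price = best_price(profit_samples_by_price, risk_aversion=0.0)
risk_averse_price = best_price(profit_samples_by_price, risk_aversion=0.05)
print(risk_neutral_price, risk_averse_price)
```

With these made-up numbers the risk-neutral choice is the higher price, because its mean profit is larger, while the risk-averse setting prefers the lower price, because the concave utility penalises the bad outcomes at the higher price more than it rewards the good ones.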
Welcome to Predictable B2B Success. In this episode, Vinay Koshy interviews John Cousins—investor, tech founder, and educator—whose MBA ASAP program has helped over 30,000 students worldwide. Learn how John turns business theory into practical advice for founders at every level. Hear why John created MBA ASAP, how mental models and curiosity drive founder success, and his approach to simplifying business concepts. Get practical tips on financial literacy, pricing, and common pitfalls for entrepreneurs. Want actionable business advice and new ways to think about B2B success? Listen in for practical strategies you can use now.

Some topics we explore in this episode include:
John Cousins' Career Path – His trajectory from engineering to business, teaching, writing, and investing.
Creation and Purpose of MBA ASAP – Addressing the gap between academic business education and real-world practices.
Educational Techniques – Making complex business topics simple and actionable through practical examples.
Mental Models – Using frameworks for strategic thinking and decision-making in business.
AI and Automation – Impact of AI on business operations, vibe coding, and leveraging tech tools.
Decision-Making Processes – Heuristics, Bayesian analysis, and strategies for faster, smarter choices.
Financial Literacy – Simplifying accounting concepts and why finance matters for founders.
Iterative Market Testing – Applying the “ready, fire, aim” philosophy to test product demand via email and feedback.
Pricing and Revenue Strategies – Finding optimal pricing, avoiding underpricing, and scaling revenue.
Skill Stacking – Building complementary skills like reading, sales, and negotiation to excel in business communication.
And much, much more...
• Support & get perks!
• Proudly sponsored by PyMC Labs! Get in touch at alex.andorra@pymc-labs.com
• Intro to Bayes and Advanced Regression courses (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Chapters:
00:00 The Importance of Decision-Making in Data Science
06:41 From Philosophy to Bayesian Statistics
14:57 The Role of Soft Skills in Data Science
18:19 Understanding Decision Theory Workflows
22:43 Shifting Focus from Accuracy to Business Value
26:23 Leveraging PyTensor for Optimization
34:27 Applying Optimal Decision-Making in Industry
40:06 Understanding Utility Functions in Regulation
41:35 Introduction to Bayesian Decision Theory Workflow
42:33 Exploring Price Elasticity and Demand
45:54 Optimizing Profit through Bayesian Models
51:12 Risk Aversion and Utility Functions
57:18 Advanced Risk Management Techniques
01:01:08 Practical Applications of Bayesian Decision-Making
01:06:54 Future Directions in Bayesian Inference
01:10:16 The Quest for Better Inference Algorithms
01:15:01 Dinner with a Polymath: Herbert Simon

Thank you to my Patrons for making this episode possible!

Links from the show:
Come meet Alex at the Field of Play Conference in Manchester, UK, March 27, 2026! https://www.fieldofplay.co.uk/
A Bayesian decision theory workflow
Daniel's website, LinkedIn and GitHub
LBS #124 State Space Models & Structural Time Series, with Jesse Grabowski
LBS #123 BART & The Future of Bayesian Tools, with Osvaldo Martin
LBS #74 Optimizing NUTS and Developing the ZeroSumNormal Distribution, with Adrian Seyboldt
LBS #76 The Past, Present & Future of Stan, with Bob Carpenter
Editor's note: CuspAI raised a $100m Series A in September and is rumored to have reached a unicorn valuation. They have all-star advisors from Geoff Hinton to Yann LeCun and a team of deep domain experts to tackle this next frontier in AI applications.

In this episode, Max Welling traces the thread connecting quantum gravity, equivariant neural networks, diffusion models, and climate-focused materials discovery (yes, there is one!!!).

We begin with a provocative framing: experiments as computation. Welling describes the idea of a “physics processing unit”—a world in which digital models and physical experiments work together, with nature itself acting as a kind of processor. It's a grounded but ambitious vision of AI for science: not replacing chemists, but accelerating them.

Along the way, we discuss:
* Why symmetry and equivariance matter in deep learning
* The tradeoff between scale and inductive bias
* The deep mathematical links between diffusion models and stochastic thermodynamics
* Why materials—not software—may be the real bottleneck for AI and the energy transition
* What it actually takes to build an AI-driven materials platform

Max reflects on moving from curiosity-driven theoretical physics (including work with Gerard ’t Hooft) toward impact-driven research in climate and energy. 
The result is a conversation about convergence: physics and machine learning, digital models and laboratory experiments, long-term ambition and incremental progress.

Full Video Episode

Timestamps
* 00:00:00 – The Physics Processing Unit (PPU): Nature as the Ultimate Computer
  * Max introduces the idea of a Physics Processing Unit — using real-world experiments as computation.
* 00:00:44 – From Quantum Gravity to AI for Materials
  * Brandon frames Max's career arc: VAE pioneer → equivariant GNNs → materials startup founder.
* 00:01:34 – Curiosity vs Impact: How His Motivation Evolved
  * Max explains the shift from pure theoretical curiosity to climate-driven impact.
* 00:02:43 – Why CuspAI Exists: Technology as Climate Strategy
  * Politics struggles; technology scales. Why materials innovation became the focus.
* 00:03:39 – The Thread: Physics → Symmetry → Machine Learning
  * How gauge symmetry, group theory, and relativity informed equivariant neural networks.
* 00:06:52 – AI for Science Is Exploding (Not Emerging)
  * The funding surge and why AI-for-Science feels like a new industrial era.
* 00:07:53 – Why Now? 
The Two Catalysts Behind AI for Science
  * Protein folding, ML force fields, and the tipping point moment.
* 00:10:12 – How Engineers Can Enter AI for Science
  * Practical pathways: curriculum, workshops, cross-disciplinary training.
* 00:11:28 – Why Materials Matter More Than Software
  * The argument that everything—LLMs included—rests on materials innovation.
* 00:13:02 – Materials as a Search Engine
  * The vision: automated exploration of chemical space like querying Google.
* 00:14:48 – Inside CuspAI: The Platform Architecture
  * Generative models + multi-scale digital twin + experiment loop.
* 00:21:17 – Automating Chemistry: Human-in-the-Loop First
  * Start manual → modular tools → agents → increasing autonomy.
* 00:25:04 – Moonshots vs Incremental Wins
  * Balancing lighthouse materials with paid partnerships.
* 00:26:22 – Why Breakthroughs Will Still Require Humans
  * Automation is vertical-specific and iterative.
* 00:29:01 – What Is Equivariance (In Plain English)?
  * Symmetry in neural networks explained with the bottle example.
* 00:30:01 – Why Not Just Use Data Augmentation?
  * The optimization trade-off between inductive bias and data scale.
* 00:31:55 – Generative AI Meets Stochastic Thermodynamics
  * His upcoming book and the unification of diffusion models and physics.
* 00:33:44 – When the Book Drops (ICLR?)

Transcript

Max: I want to think of it as what I would call a physics processing unit, like a PPU, right? Which is you have digital processing units and then you have physics processing units. So it's basically nature doing computations for you. It's the fastest computer known, or even possible. It's a bit hard to program because you have to do all these experiments. Those are quite bulky, it's like a very large thing you have to do. But in a way it is a computation and that's the way I want to see it. You can do computations in a data center and then you can ask nature to do some computations. Your interface with nature is a bit more complicated. 
But then these things will have to seamlessly work together to get to a new material that you're interested in.

[01:00:44:14 - 01:01:34:08]
Brandon: Yeah, it's a pleasure to have Max Welling as a guest today. Max has done so much over his career that I've been so excited about. If you're in the deep learning community, you probably know Max for his work on variational autoencoders, which have literally stood the test of time, or officially stood the test of time. If you are a scientist, you probably know him for his pioneering work on graph neural networks and equivariance. And if you're a materials scientist, you probably know him for his new startup, CuspAI. Max has a long history of working on lots of cool problems. You started in quantum gravity, which is I think very different than all of these other things you worked on. The first question for AI engineers and for scientists: what is the thread in how you think about problems? What is the thread in the type of things which excite you? And how do you decide what is the next big thing you want to work on?

[01:01:34:08 - 01:02:41:13]
Max: So it has actually evolved a lot. In my young days, let's say, I would just follow what I would find super interesting. I have kind of this sensor. I think many people have it, but maybe don't really use it very much, which is that you get this feeling of getting very excited about some problem. It could be what's inside of a black hole, or what's at the boundary of the universe, or what quantum mechanics is actually all about. And so I followed that basically throughout my career. But I have to say that as you get older, this changes a little bit, in the sense that there's a new dimension coming to it, and that is impact. Working on two-dimensional quantum gravity, you're pretty much guaranteed there's going to be no impact from what you do, apart from maybe a few papers, but not in this world, at this energy scale. 
As I get closer to retirement, which is fortunately still 10 years away or so, I do want to kind of make a positive impact in the world. And I got pretty worried about climate change.

[01:02:43:15 - 01:03:19:11]
Max: I think politics seems to have a hard time solving it, especially these days. And so I thought I'd better work on it from the technology side. And that's why we started CuspAI. But there are also a lot of really interesting science problems in materials science. And so it's kind of combining both the impact you can make with it as well as the interesting science. So it's sort of these two dimensions: working on things where you feel there's something very deep going on here, and on the other hand, trying to build tools that can actually make a real impact in the world.

[01:03:19:11 - 01:03:39:23]
RJ: So when I look back at the different things that you worked on, some of them seem pretty connected, like the physics to equivariance and, yeah, and, uh, gravitational networks, maybe. And that seems to be somewhat related to Cusp. Do you have a thread through there?

[01:03:39:23 - 01:06:52:16]
Max: Yeah. So physics is the thread. Having spent a lot of time in theoretical physics, I think there are, first, very fundamental and exciting questions, like things that haven't actually been figured out in quantum gravity. So that is really the frontier. There are also a lot of mathematical tools that you can use, right? For instance, in particle physics, but also in general relativity, symmetries play an enormously important role. And this goes all the way to gauge symmetries as well. And so applying these kinds of symmetries to, uh, machine learning was actually, you know, I thought of it as a very deep and interesting mathematical problem. 
I did this with Taco Cohen, and Taco was the main driver behind this. It went all the way from simple rotational symmetries to gauge symmetries on spheres and stuff like that. And Maurice Weiler, who's also here, was a very good PhD student with me; he wrote an entire book, which I can really recommend, about the role of symmetries in AI and machine learning. So I find this a very deep and interesting problem. More recently, I've taken a somewhat different path, which is the relationship between diffusion models and a field called stochastic thermodynamics. This is basically thermodynamics, which is a theory of equilibrium, but formulated for out-of-equilibrium systems. And it turns out that the mathematics we use for diffusion models, but even for reinforcement learning, for Schrödinger bridges, for MCMC sampling, is the same mathematics as this physical theory of non-equilibrium systems. And that got me very excited. I taught a course in Muizenberg, in South Africa, close to Cape Town, at the African Institute for Mathematical Sciences, AIMS, and I turned that into a book. Two years later, the book was finished. I've sent it to the publisher. And this is about the deep relationship between free energy, diffusion models, basically generative AI, and stochastic thermodynamics. So it's always some kind of, I don't know, I find physics very deep. I also think a lot about quantum mechanics. It's a completely weird theory that actually nobody really understands. And there's a very interesting story, which is maybe good to tell to connect my PhD back to where I am now. I did my PhD with a Nobel laureate, Gerard 't Hooft. He's the most brilliant man I've ever met. He was never wrong about anything for as long as I've seen him.
And now he says quantum mechanics is wrong and he has a new theory of quantum mechanics. Nobody understands what he's saying, even though what he's writing down is not mathematically very complex, but he's trying to address this understandability, let's say, of quantum mechanics head on. I find it very courageous and I'm completely fascinated by it. So I'm also trying to think about, okay, can I actually understand quantum mechanics in a more mundane way? You know, without all the weird multiverses and collapses and stuff like that. So physics has always been the thread, and I'm trying to apply the physics to machine learning to build better algorithms.
[01:06:52:16 - 01:07:05:15] Brandon: You are still very involved in understanding physics and the world. Yeah. And in applications to machine learning, or introducing new formalisms. That's really cool.
[01:07:05:15 - 01:07:18:02] Max: Yes, I would say I'm not contributing much to physics, but I'm contributing to the interface between AI and science. And that's called AI for science, or science for AI. It's actually a new discipline that's emerging.
[01:07:18:02 - 01:07:18:19] Speaker 5: Yeah.
[01:07:18:19 - 01:07:45:14] Max: And it's not just emerging, it's exploding, I would say. That's the better term, because you go from investments in the hundreds of millions to now in the billions. There's now actually a startup by Jeff Bezos that is at a 6.2 billion seed round. Right. Insane. I guess it's the largest startup ever, I think. And that's in this field, AI for science. It tells you something, that we are creating a new bubble here.
[01:07:46:15 - 01:07:53:28] Brandon: So why do you think that is? What has changed that has motivated people to start working on AI-for-science type problems?
[01:07:53:28 - 01:08:49:17] Max: So there are two reasons, actually.
One is that people have been applying the new tools from AI to the sciences, which is quite natural. And I think there are two big examples: protein folding is a big one, and the other is machine learning force fields, also called machine learning interatomic potentials. Both of them have actually been very successful. Both also had something to do with symmetries, which is a little cool. And people in the AI sciences saw an opportunity to apply the tools they had developed beyond advertisement placement, right, or multimedia applications, to something that could actually make a very positive impact on society, like health, drug development, materials for the energy transition, carbon capture. These are all really cool, impactful applications.
[01:08:50:19 - 01:09:42:14] Max: Besides that, the science is also very interesting. I would say the fact that these two fields are coming together, and that we're now at the point where we can actually model these things effectively and move the needle on some of these science methodologies, is also a very unique moment. People recognize that, okay, now we're at the cusp of something new, which is what the company is named after. We're at the cusp of something new. And of course that always creates a lot of energy. It's like a virgin field, a green field. Nobody's been there. I can rush in and I can start harvesting there, right? And I think that's also what's causing a lot of enthusiasm in the field.
[01:09:42:14 - 01:10:12:18] RJ: If you're an AI engineer, and basically the people that listen to this podcast will be in that field, then you maybe don't have a strong science background, but are excited.
I would say most AI practitioners, be they engineers or scientists, would consider themselves scientists, and they have some background, a little bit of physics, a little bit of industry, college, maybe even graduate school, and have been working or are starting out. How does somebody who is not a scientist on a day-to-day basis get involved?
[01:10:12:18 - 01:10:14:28] Max: Well, they can read my book once it's out.
[01:10:16:07 - 01:11:05:24] Max: This is basically saying that we should create more curricula that are on this interface. I'm not sure, we may already have some actual university courses you can take, maybe online courses you can take. These workshops, where we are now, are actually very good as well. And we should probably have more tutorials before the workshop starts. I've actually proposed this at some point: maybe first have an hour of tutorial so that people who are new can get into the field. There's a lot out there. Most of it is of course inaccessible, but I would say we will create many more books and other content that is more accessible, including this podcast, I would say. So I think it will come. And these days you can watch videos and things. There's a huge amount of content you can go and see.
[01:11:05:24 - 01:11:28:28] Brandon: So maybe a follow-up to that. That's how people learn and get involved, but why should they get involved? A lot of our audience will be interested in AI engineering, but they may be looking for bigger impact in the world. What opportunities does AI for science give them to make an impact, to change the world, that working in the world of pure bits would not?
[01:11:28:28 - 01:11:40:06] Max: So my view is that underlying almost everything is a material. We are focusing a lot on LLMs now, which is kind of the software layer.
[01:11:41:06 - 01:11:56:05] Max: I would say if you think very hard, underlying everything is a material.
So underlying an LLM is a GPU, and underlying a GPU is a wafer on which we will have to deposit materials. Do we want to wait a little bit?
[01:12:02:25 - 01:12:11:06] Max: Underlying everything is a material. So I was saying, you know, there's the LLM; underlying the LLM is a GPU on which it runs. In order to make that GPU,
[01:12:12:08 - 01:12:43:20] Max: you have to put materials down on a wafer and shine on it with EUV light in order to etch the structures in. But that's now an actual materials problem, because more or less we've reached the limits of scaling things down, and now we are trying to improve further with new materials. So that's a fundamental materials problem. We need to get through the energy transition fast if we don't want to mess up this world. And so there are, for instance, batteries. That's completely a materials problem. There are fuel cells.
[01:12:44:23 - 01:13:01:16] Max: There are solar panels. They can now make solar panels with new perovskite layers on top of the silicon layers that can capture, you know, theoretically up to 50% of the light, where now we're at, I don't know, maybe 22 or something. So these are huge changes, all by materials innovation.
[01:13:02:21 - 01:13:47:15] Max: And yeah, I think wherever you go, I can probably dig deep enough and then tell you, well, actually, the very foundation of what you're doing is a materials problem. And so I think it's just very nice to work on this very foundation. And also, and this is maybe something that's happening now, we can start to search through this materials space. This has never been the case, right? The normal way of working for scientists is: you read papers, then you come up with a new hypothesis, you do an experiment, you learn, et cetera. So that's a very slow process. Now we can treat this as a search engine.
Like we search the internet, we now search the space of all possible molecules, not just the ones that people have made or that are out there in the universe, but all of them.
[01:13:48:21 - 01:14:42:01] Max: And we can make this kind of fully automated. That's the hope, right? It becomes a tool where you type what you want and something starts spinning and some experiments get going. And then, you know, out comes a list of materials, and you look at it and say, maybe not. And then you refine your query a little bit. And you kind of do research with this search engine, where a huge amount of computation and experimentation is happening somewhere far away in some lab or some data center or something like this. I find this a very, very promising view of how we can build a much better materials layer underneath almost everything. And also more sustainable materials. Our plastics are polluting the planet. If you come up with a plastic that destroys itself after, I don't know, a few weeks, right, and actually becomes a fertilizer: these are things that are not impossible at all. These things can be done, right? And we should do it.
[01:14:42:01 - 01:14:47:23] RJ: Can you tell us a little bit, just generally, about CuspAI? And then I have a ton of questions.
[01:14:47:23 - 01:14:48:15] Speaker 5: Yeah.
[01:14:48:15 - 01:17:49:10] Max: So CuspAI started about 20 months ago, and it was because I was worried, and I'm still worried, about climate change. I realized that in order to stay within two degrees, let's say, we would not only have to reduce our emissions to zero by 2050, but then spend another half century or even a century removing carbon dioxide from the atmosphere. Not by reducing emissions, but actually removing it, at a rate that's about half the rate at which we now emit it. And that is an unsolved problem. But if we don't solve it, two degrees is not going to happen, right?
It's going to be much more. And I don't think people quite understand how bad that can be; four degrees is very bad. So this technology needs to be developed. And that was the motivation for me and my co-founder, Chad Edwards, to start this startup. And also because we saw the technology was ready, which is also very good. So the time is right to do it. And yeah, in the meanwhile we've grown to about 40 people. We've collected 130 million in investment into the company, which for a European company is quite a lot. I would say it's interesting that right after that, other startups got even more. That kind of tells you how fast this is growing. But yeah, we've built the platform, of course, but it's for a series of material classes, and it needs to be constantly expanded to new material classes. And it can be more automated, because, you know, we are now putting LLMs in, so the whole thing gets more and more automated. And now we're moving to high-throughput experimentation: connecting the actual platform, which is computational, to the experiments, so that you can also get fast feedback from experiments. And I don't want to think of experiments as something you do at the end, although that's what we've been doing so far. I want to think of it as what I would call a physics processing unit, a PPU, right? You have digital processing units and then you have physics processing units. It's basically nature doing computations for you. It's the fastest computer known, or even possible. It's a bit hard to program, because you have to do all these experiments. Those are quite bulky. It's a very large thing you have to do. But in a way, it is a computation. And that's the way I want to see it. You can do computations in a data center and then you can ask nature to do some computations.
Your interface with nature is a bit more complicated. But then these things will have to seamlessly work together to get to a new material that you're interested in. And that's the vision we have. We don't say superintelligence, because I don't quite know what it means and I don't want to oversell it. But I do want to automate this process and put a very powerful tool in the hands of the chemists and the materials scientists.
[01:17:49:10 - 01:18:01:02] Brandon: That actually brings up a question I wanted to ask you. First of all, can you talk about your platform, to whatever degree, and explain how it works and what your thought process was in developing it?
[01:18:01:02 - 01:20:47:22] Max: Yeah. I think, surprisingly, it's not rocket science, I would say. It's not rocket science in the sense of the design. The design that I wrote down at the very beginning is still more or less the design, although you add things. I wasn't thinking very much about multi-scale models, and it turned out that multi-scale is actually very important. In the beginning, I also wasn't thinking very much about self-driving labs. But now I think we are at the stage where we should be adding that. So there are bits and details that we're adding. But more or less, it's what you see in the slide decks here as well: there is a generative component that you have to train to generate candidates. And then there is a digital twin, a multi-scale, multi-fidelity digital twin, where you walk up the steps of a ladder: you do the cheap things first, you weed out everything that's obviously not useful, and then you go to more and more expensive things later. And so you narrow things down to a small number. Those go into an experiment, you do the experiment, get feedback, etc. Now, things that have been added more recently are more agentic parts.
We have agents that search the literature, the chemical literature, and come up with chemical suggestions for doing experiments. We have agents which autonomously orchestrate all of the computations and the experiments that need to be done. They're in various stages of maturity and they can be continuously improved, I would say. So I don't think that part is rocket science; the design of that thing is not surprising. What is surprising is how hard it is to actually build it. Right. So that's where the moat is: in the data that you can get your hands on, and in actually building the platform. And there are two people in particular I want to call out: Felix Hunker, who is building the scientific part of the platform, and Sandra de Maria, who is building the MLOps part of the platform. Yeah. And recently we also added Aron Walsh to our team, who is a very accomplished scientist from Imperial College. We're very happy about that. He's going to be chief science officer. And we also have a partnerships team that seeks out all the customers, because this is one thing I find very important. It's so complex to actually bring a material to the real world that you must do this in collaboration with the domain experts, which are typically the companies. So we only start to invest in a direction if we find a good industrial partner to go on that journey with us.
[01:20:47:22 - 01:20:55:12] Brandon: Makes a lot of sense.
Over the evolution of the platform, did you find that human intervention, human,
[01:20:56:18 - 01:21:17:01] Brandon: I guess you could imagine two directions. You could start out making everything purely automated, agentic, and so on, and then later find that you need more human input and feedback at different steps. Or did you start out with human feedback at lots of steps and then figure out ways to remove, you know,
[01:21:17:01 - 01:22:39:18] Max: It is the second one. So you build tools. It's much more modular than you think. It's like, we need these tools for this application, we need these tools for that one. So you build all these tools, and then you go through a workflow, in the beginning just manually. You put them in a workflow: first this tool, then run this one, then this one, et cetera. And then you figure out, oh, actually, this porous material that we are trying to make collapses if you shake it a bit. Okay, then you add a new tool that tests for stability. Right. And so there are more and more tools. And then you build the agent, which could be a Bayesian optimizer, or it could be an actual LLM, maybe trained to be a good chemist, that will then start to use all these tools in the right way, in the right order. Right. But in the beginning, it's you as a chemist putting the workflow together. And then you think about, okay, how am I going to automate this? One very easy question you can ask yourself is this: every time somebody who is not a super expert in DFT wants to do a calculation, he has to go to somebody who knows DFT.
And so could you start to automate that away? Make it so user friendly that you actually do the right DFT for the right problem and for the right length of time, and you can actually assess whether it's a good outcome, etc. So you start to automate small pieces, then bigger pieces, etc. And in the end, the whole thing is automated.
[01:22:39:18 - 01:22:53:25] Brandon: So your philosophy is that you want to provide a set of specific tools so that the scientists making decisions are better informed, and less so to create an automated process.
[01:22:53:25 - 01:23:22:01] Max: I think it's sort of the same as what you're saying, because, yes, we want to automate, but we don't see, anytime soon, the chemist and the domain expert being out of the loop. But it's a retreat, right? First, you need an expert to tell you precisely how to set the parameters of the DFT calculation. Okay, maybe we can take that out. We can maybe automate that, right? And so increasingly, more of these things are going to be removed.
[01:23:22:01 - 01:23:22:19] Speaker 5: Yeah.
[01:23:22:19 - 01:24:33:25] Max: In the end, the vision is that it will be a search engine where a chemist will type things and will get candidates, but the chemist will still decide what is a good material and what is not a good material out of that list, right? The vision of a completely dark lab, where you can close the door and just say, find something interesting, and it will figure out what's interesting and come back with, oh, I found this new material to do blah, blah, blah, right? That's not the vision I have. Not for a long time. So for me, it's really about empowering the domain experts that are sitting in the companies and in universities to be much faster in developing their materials.
And I should say, it's also good to be a little humble at times, because it is very complicated to make it and to bring it into the real world. And there are people who have been doing this for their entire lives. Right. And I wonder if they scratch their heads and say, well, how are you going to completely automate that away in the next five years? I don't think that's going to happen at all.
[01:24:35:01 - 01:24:39:24] Max: Yeah. So to me, it's an increasingly powerful tool in the hands of the chemists.
[01:24:39:24 - 01:25:04:02] RJ: I have a question. You've talked before about getting people interested based on having a big breakthrough in materials versus incremental change. I'm curious what you think about the platform you have now and what you're stepping towards. Are you chasing the big change, or is this incremental? They're not mutually exclusive, obviously, but what do you think about that?
[01:25:04:02 - 01:26:04:27] Max: We follow a mixed strategy. We are definitely going after a big material. Again, we do this with a partner. I'm not going to disclose precisely what it is, but we have our own long-term goal. You could call it a lighthouse or a moonshot or whatever, but it is going to be a really impactful material that we want to develop as a proof point that it can be done, that it will make it into the real world, and that AI was essential in actually making it happen. At the same time, we are also quite happy to work with companies that have more modest goals. I would say one is a very deep partnership, where you go on a journey with a company and that's a long-term commitment together. And the other one is where somebody says, I need a force field. Can you help me train this force field and then maybe analyze this particular problem for me? And I'll pay you a bunch of money for that. And then maybe after that we'll see.
And that's fine too. Right. But we prefer the deep partnerships, where we can really change something for the good.
[01:26:04:27 - 01:26:22:02] RJ: Yeah. And do you feel like, from a platform standpoint, you're ready for that? And again, not asking you to disclose proprietary secret sauce, but what are the things, generally speaking, that need to happen to get from where we are to those big breakthroughs?
[01:26:22:02 - 01:28:40:01] Max: What I find interesting about this field is that every time you build something, it's actually immediately useful. Right. Unlike quantum computing or nuclear fusion, where you work for 20, 30, 40 years and nothing, nothing, nothing, nothing. And then it has to happen. Right. And when it happens, it's huge. It's quite different here, because you go to a customer and you say, so what do you need? Right. We work, let's say, on a problem like water filtration. We want to remove PFAS from water. Right. We do this with a company, Kemira. They are a deep partner for us. We're on a journey together. I think that the breakthrough will happen with a lot of humans in the loop, because there are the chemists, who have a whole lot more knowledge of their field, and there's us, who will help them with training and with new methods. And in that interface, in these interactions, something beautiful will happen, and that will have to happen first before this field really takes off, I think. In that sense it's not a bubble, let's put it that way. People see that something real is actually happening. So in the beginning, it will be with a lot of humans in the loop, I would say, and I would hope we will have this new breakthrough material before everything is completely automated, because that will take a while. And also it is very vertical-specific.
Completely automating something for problem A, you can probably achieve it, but then you'll have to start over again for problem B, because your experimental setup looks very different, and the machines with which you characterize your materials look very different. Even the models in your platform will have to be retrained and fine-tuned to the new class. So every time, you have a lot of learnings to transfer, but the problems are also actually different. And so, yes, I would want that breakthrough material before it's completely automated, which I think is kind of a long-term vision. And I would say every time you move to something new, you'll have to start retraining, and humans will have to come in again and say, okay, so what does this problem look like? And then point the machine again in the new direction and then use it again.
[01:28:40:01 - 01:28:47:17] RJ: For the non-scientists among us, me included, I'm a bit of a scientist, there's a lot of terminology. You mentioned DFT,
[01:28:49:00 - 01:29:01:11] RJ: and equivariance, which we've talked about. Can you explain, in engineering terms, or at the level of sophistication of an engineer, what is equivariance?
[01:29:01:11 - 01:29:55:01] Max: So equivariance is the infusion of symmetry in neural networks. If I build a neural network, let's say, that needs to recognize this bottle, right, and then I rotate the bottle, it will actually have to completely start again, because it has no idea that the input that represents a rotated bottle is actually a rotated bottle. It just doesn't understand that. Right. If you build equivariance in, then basically once you've trained it in one orientation, it will understand it in any other orientation. So that means you need a lot less data to train these models. And these are constraints on the weights of the model.
So basically you have to constrain the weights such that it understands it. And you can build it in, you can hard-code it in. And yeah, the symmetry groups can be translations, rotations, but also permutations. In a graph neural network, they're permutations, and then physics, of course, has many more of these groups.
[01:29:55:01 - 01:30:01:08] RJ: To play devil's advocate, why not just use data augmentation, with your bottle in all the different orientations?
[01:30:01:08 - 01:30:58:23] Max: That's an option, it's just not exact. It's like, why would you go through the work of doing all that, where you would really need an infinite number of augmentations to get it completely right, when you can also hard-code it in? Now, I have to say, sometimes data augmentation actually works even better than hard-coding the equivariance in. And this has something to do with the fact that if you constrain the weights before the optimization starts, the optimization surface, or objective, becomes more complicated. And so it's harder to find good minima. So there is also a complicated interplay, I think, between the optimization process and these constraints you put in your network. And so, yeah, you'll hear kind of contradicting claims in this field. Some people say that, for certain applications, it just works better than not doing it. And sometimes you hear other people say that if you have a lot of data and you can do data augmentation, then it's actually easier to optimize, and it actually works better than putting the equivariance in.
[01:30:58:23 - 01:31:07:16] Brandon: Do you think there's kind of a bitter lesson for mathematically founded models and strategies for doing deep learning?
[01:31:07:16 - 01:31:46:06] Max: Yeah, ultimately it's a trade-off between data and inductive bias. If your inductive bias is not perfectly correct, you have to be careful, because you put a ceiling on what you can do.
But if you know the symmetry is there, it's hard to imagine there isn't a way to actually leverage it. But yeah, so there is a bitter lesson. And one of the bitter lessons is that you should always make sure your architecture scales, unless you have a tiny data set, in which case it doesn't matter. But, you know, the same bitter lessons, or lessons, that you can draw in LLM space are eventually going to be true in this space as well, I think.
[01:31:47:10 - 01:31:55:01] RJ: Can you talk a little bit about your upcoming book and tell the listeners what's exciting about it, and why they should read it?
[01:31:55:01 - 01:33:42:20] Max: So this book is called Generative AI and Stochastic Thermodynamics. It basically lays bare the fact that the mathematics that goes into generative AI, which is the technology to generate images and videos, and into this field of non-equilibrium statistical mechanics, which is about systems of molecules that are just moving around and relaxing to the ground state, or that you can control to be in a certain state, is actually identical. And so that's fascinating. In fact, what's interesting is that Geoff Hinton and Radford Neal already wrote down the variational free energy for machine learning a long time ago. And there's also Karl Friston's work on the free energy principle and active inference. But now we've related it to this very new field in physics, which is called stochastic thermodynamics or non-equilibrium thermodynamics, which has its own very interesting theorems, like fluctuation theorems, which we don't typically talk about but can learn a lot from. And I think the two can now start to cross-fertilize.
When we see that these things are actually the same, we can, like we did for symmetries, look at this new theory that's out there, developed by these very smart physicists, and say, okay, what can we take from here that will make our algorithms better? At the same time, we can use our models to help the scientists do better science. And so it becomes a beautiful cross-fertilization between these two fields. The book is rather technical, I would say. It takes all sorts of things that have been done in stochastic thermodynamics, and all sorts of models that have been developed in the machine learning literature, and it basically equates them to each other. And I think, hopefully, that sense of unification will be revealing to people.
[01:33:42:20 - 01:33:44:05] RJ: Wait, and when is it out?
[01:33:44:05 - 01:33:56:09] Max: Well, it depends on the publisher now. But I hope in April. I'm going to give a keynote at ICLR, and it would be very nice if I have this book in my hand. But you know, it's hard to control these kinds of timelines.
[01:33:56:09 - 01:33:58:19] RJ: Yeah, I'm looking forward to it. Great.
[01:33:58:19 - 01:33:59:25] Max: Thank you very much.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
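Editor's sketch: the equivariance idea Max describes in this episode (constraints on the weights so that a transformed input gives a correspondingly transformed output) can be illustrated for the permutation case he mentions. The layer below is a hypothetical, Deep Sets-style example, not code from CuspAI or from the episode; the function names, weights, and inputs are all assumptions chosen for illustration. It compares a weight-tied linear layer with an ordinary dense layer and checks whether permuting the input simply permutes the output.

```python
def constrained_layer(x, a=2.0, b=0.5):
    """Weight-tied linear layer: equivalent to W = a*I + b*(all-ones matrix).

    Every element gets the same self-weight `a` and the same pooled term,
    which is what makes the layer permutation-equivariant.
    """
    total = sum(x)
    return [a * xi + b * total for xi in x]


def unconstrained_layer(x, weights):
    """Ordinary dense layer y = W x with an arbitrary weight matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def permute(values, order):
    """Reorder `values` so that position k holds values[order[k]]."""
    return [values[i] for i in order]


def is_equivariant(layer, x, order, tol=1e-9):
    """Check that layer(permute(x)) equals permute(layer(x))."""
    left = layer(permute(x, order))
    right = permute(layer(x), order)
    return all(abs(l - r) <= tol for l, r in zip(left, right))


# Illustrative inputs (assumed, not from the episode).
x = [1.0, 2.0, 3.0, 4.0]
order = [2, 0, 3, 1]
# A dense layer whose diagonal weights differ per position, so it treats
# each slot differently and cannot be permutation-equivariant.
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]

print(is_equivariant(constrained_layer, x, order))
print(is_equivariant(lambda v: unconstrained_layer(v, W), x, order))
```

With these inputs the first check prints True and the second prints False: tying the weights is what guarantees the symmetry exactly, whereas an unconstrained layer would have to learn it from data, for example through augmentation.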
Whether it be in politics, public health, or corporate finance, why are people more likely to interpret facts or data in a way that fits their preconceived notions about the world, as opposed to searching for the fundamental truth? A new paper from Harvard Business School, Sharing Models to Interpret Data (by Joshua Schwartzstein and Adi Sunderam), studies the propensity for people to adopt interpretations of data based on their community's beliefs, and why this can lead to less accurate conclusions. Hosts and finance professors Jonathan Berk and Jules van Binsbergen are joined by the paper's co-author Adi Sunderam, who is a professor of corporate finance at Harvard Business School, a research associate at the National Bureau of Economic Research, and a co-editor of the Journal of Finance. The conversation covers the complexity of Bayesian updating and how the process is improperly deployed in today's thinking, not only in corporate decision-making but also on a sociological level. They also discuss Sunderam's model for explaining how people interpret data, why people are more likely to fall into group-belief dynamics, and whether there are any interventions that would lead to better decision-making. Read Adi Sunderam and Joshua Schwartzstein's paper: Sharing Models to Interpret Data. Find All Else Equal on the web: https://lauder.wharton.upenn.edu/allelse/ All Else Equal: Making Better Decisions Podcast is a production of the UPenn Wharton Lauder Institute through University FM.
Epstein proved the unthinkable: some "conspiracy theories" are horrifyingly real. Join the Heretics Community For Bonus Videos: https://andrewgoldheretics.com/ Join world-renowned skeptic Michael Shermer on Heretics for a gripping, evidence-driven conversation that redefines conspiracy theories. From Jeffrey Epstein's elite blackmail network and secret island to the once-banned COVID lab-leak theory, Bill Gates emails, vaccine controversies, moon-landing doubts, and elite power plays, Shermer uses Bayesian reasoning to distinguish real conspiracies from speculation—showing why even die-hard skeptics must update their views when hard evidence emerges. SPONSORS: Organise your life: https://akiflow.pro/Heretics Earn up to 4 per cent on gold, paid in gold: https://www.monetary-metals.com/heretics/ Cut your wireless bill to 15 bucks a month at https://mintmobile.com/heretics He also tackles the decline of religion, Jordan Peterson's secular appeal, transgender ideology as potential social contagion with growing regret lawsuits, objective morality versus cultural trends, immigration politics, and whether true progress is happening or we're just cycling through extremes. Real cases like Epstein remind us: dismissing everything as paranoia can blind us to what's actually true. Epstein proved conspiracies can be real. 
Michael's links: https://www.skeptic.com https://michaelshermer.com #ConspiracyTheories #Epstein #MichaelShermer Join the 30k heretics on my mailing list: https://andrewgoldheretics.com Check out my new documentary channel: https://youtube.com/@andrewgoldinvestigates Andrew on X: https://twitter.com/andrewgold_ok Insta: https://www.instagram.com/andrewgold_ok Heretics YouTube channel: https://www.youtube.com/@andrewgoldheretics Chapters: 00:00 Welcome & Heretical Mindset 04:55 Bayesian Thinking & Changing Your Mind 09:50 COVID Origins, Lab Leak & Politicized Science 14:40 Vaccines, Boosters & Precautionary Principle Failures 19:25 Epstein Files, Bill Gates Emails & Elite Blackmail Theories 24:30 Why Epstein Was Real – & What It Means for Other Conspiracies 29:45 Moon Landing Hoax Claims Debunked 34:20 Decline of Religion, Jordan Peterson & Secular Morality 39:55 Transgender Surge, Social Contagion & Regret Lawsuits 44:50 Objective Moral Truths vs Cultural Fashion 49:55 Immigration, Empathy & Political Strategy Conspiracies 54:50 Progress, Backsliding & Hanlon's Razor 59:55 A Heretic Michael admires Learn more about your ad choices. Visit megaphone.fm/adchoices
From Palantir and Two Sigma to building Goodfire into the poster-child for actionable mechanistic interpretability, Mark Bissell (Member of Technical Staff) and Myra Deng (Head of Product) are trying to turn “peeking inside the model” into a repeatable production workflow by shipping APIs, landing real enterprise deployments, and now scaling the bet with a recent $150M Series B funding round at a $1.25B valuation.In this episode, we go far beyond the usual “SAEs are cool” take. We talk about Goodfire's core bet: that the AI lifecycle is still fundamentally broken because the only reliable control we have is data and we post-train, RLHF, and fine-tune by “slurping supervision through a straw,” hoping the model picks up the right behaviors while quietly absorbing the wrong ones. Goodfire's answer is to build a bi-directional interface between humans and models: read what's happening inside, edit it surgically, and eventually use interpretability during training so customization isn't just brute-force guesswork.Mark and Myra walk through what that looks like when you stop treating interpretability like a lab demo and start treating it like infrastructure: lightweight probes that add near-zero latency, token-level safety filters that can run at inference time, and interpretability workflows that survive messy constraints (multilingual inputs, synthetic→real transfer, regulated domains, no access to sensitive data). We also get a live window into what “frontier-scale interp” means operationally (i.e. 
steering a trillion-parameter model in real time by targeting internal features) plus why the same tooling generalizes cleanly from language models to genomics, medical imaging, and “pixel-space” world models.We discuss:* Myra + Mark's path: Palantir (health systems, forward-deployed engineering) → Goodfire early team; Two Sigma → Head of Product, translating frontier interpretability research into a platform and real-world deployments* What “interpretability” actually means in practice: not just post-hoc poking, but a broader “science of deep learning” approach across the full AI lifecycle (data curation → post-training → internal representations → model design)* Why post-training is the first big wedge: “surgical edits” for unintended behaviors likereward hacking, sycophancy, noise learned during customization plus the dream of targeted unlearning and bias removal without wrecking capabilities* SAEs vs probes in the real world: why SAE feature spaces sometimes underperform classifiers trained on raw activations for downstream detection tasks (hallucination, harmful intent, PII), and what that implies about “clean concept spaces”* Rakuten in production: deploying interpretability-based token-level PII detection at inference time to prevent routing private data to downstream providers plus the gnarly constraints: no training on real customer PII, synthetic→real transfer, English + Japanese, and tokenization quirks* Why interp can be operationally cheaper than LLM-judge guardrails: probes are lightweight, low-latency, and don't require hosting a second large model in the loop* Real-time steering at frontier scale: a demo of steering Kimi K2 (~1T params) live and finding features via SAE pipelines, auto-labeling via LLMs, and toggling a “Gen-Z slang” feature across multiple layers without breaking tool use* Hallucinations as an internal signal: the case that models have latent uncertainty / “user-pleasing” circuitry you can detect and potentially mitigate more 
directly than black-box methods* Steering vs prompting: the emerging view that activation steering and in-context learning are more closely connected than people think, including work mapping between the two (even for jailbreak-style behaviors)* Interpretability for science: using the same tooling across domains (genomics, medical imaging, materials) to debug spurious correlations and extract new knowledge up to and including early biomarker discovery work with major partners* World models + “pixel-space” interpretability: why vision/video models make concepts easier to see, how that accelerates the feedback loop, and why robotics/world-model partners are especially interesting design partners* The north star: moving from “data in, weights out” to intentional model design where experts can impart goals and constraints directly, not just via reward signals and brute-force post-training—Goodfire AI* Website: https://goodfire.ai* LinkedIn: https://www.linkedin.com/company/goodfire-ai/* X: https://x.com/GoodfireAIMyra Deng* Website: https://myradeng.com/* LinkedIn: https://www.linkedin.com/in/myra-deng/* X: https://x.com/myra_dengMark Bissell* LinkedIn: https://www.linkedin.com/in/mark-bissell/* X: https://x.com/MarkMBissellFull Video EpisodeTimestamps00:00:00 Introduction00:00:05 Introduction to the Latent Space Podcast and Guests from Goodfire00:00:29 What is Goodfire? Mission and Focus on Interpretability00:01:01 Goodfire's Practical Approach to Interpretability00:01:37 Goodfire's Series B Fundraise Announcement00:02:04 Backgrounds of Mark and Myra from Goodfire00:02:51 Team Structure and Roles at Goodfire00:05:13 What is Interpretability? Definitions and Techniques00:05:30 Understanding Errors00:07:29 Post-training vs. 
Pre-training Interpretability Applications00:08:51 Using Interpretability to Remove Unwanted Behaviors00:10:09 Grokking, Double Descent, and Generalization in Models00:10:15 404 Not Found Explained00:12:06 Subliminal Learning and Hidden Biases in Models00:14:07 How Goodfire Chooses Research Directions and Projects00:15:00 Troubleshooting Errors00:16:04 Limitations of SAEs and Probes in Interpretability00:18:14 Rakuten Case Study: Production Deployment of Interpretability00:20:45 Conclusion00:21:12 Efficiency Benefits of Interpretability Techniques00:21:26 Live Demo: Real-Time Steering in a Trillion Parameter Model00:25:15 How Steering Features are Identified and Labeled00:26:51 Detecting and Mitigating Hallucinations Using Interpretability00:31:20 Equivalence of Activation Steering and Prompting00:34:06 Comparing Steering with Fine-Tuning and LoRA Techniques00:36:04 Model Design and the Future of Intentional AI Development00:38:09 Getting Started in Mechinterp: Resources, Programs, and Open Problems00:40:51 Industry Applications and the Rise of Mechinterp in Practice00:41:39 Interpretability for Code Models and Real-World Usage00:43:07 Making Steering Useful for More Than Stylistic Edits00:46:17 Applying Interpretability to Healthcare and Scientific Discovery00:49:15 Why Interpretability is Crucial in High-Stakes Domains like Healthcare00:52:03 Call for Design Partners Across Domains00:54:18 Interest in World Models and Visual Interpretability00:57:22 Sci-Fi Inspiration: Ted Chiang and Interpretability01:00:14 Interpretability, Safety, and Alignment Perspectives01:04:27 Weak-to-Strong Generalization and Future Alignment Challenges01:05:38 Final Thoughts and Hiring/Collaboration Opportunities at GoodfireTranscriptShawn Wang [00:00:05]: So welcome to the Latent Space pod. We're back in the studio with our special MechInterp co-host, Vibhu. Welcome. Mochi, Mochi's special co-host. And Mochi, the mechanistic interpretability doggo. 
We have with us Mark and Myra from Goodfire. Welcome. Thanks for having us on. Maybe we can sort of introduce Goodfire and then introduce you guys. How do you introduce Goodfire today?Myra Deng [00:00:29]: Yeah, it's a great question. So Goodfire, we like to say, is an AI research lab that focuses on using interpretability to understand, learn from, and design AI models. And we really believe that interpretability will unlock the new generation, next frontier of safe and powerful AI models. That's our description right now, and I'm excited to dive more into the work we're doing to make that happen.Shawn Wang [00:00:55]: Yeah. And there's always like the official description. Is there an understatement? Is there an unofficial one that sort of resonates more with a different audience?Mark Bissell [00:01:01]: Well, being an AI research lab that's focused on interpretability, obviously a lot of people have a lot that they think about when they think of interpretability. And I think we have a pretty broad definition of what that means and the types of places that can be applied. And in particular, applying it in production scenarios, in high stakes industries, and really taking it sort of from the research world into the real world. Which, you know, it's a new field, so that hasn't been done all that much. And we're excited about actually seeing that sort of put into practice.Shawn Wang [00:01:37]: Yeah, I would say it wasn't too long ago that Anthropic was like still putting out like Toy Models of Superposition and that kind of stuff. And I wouldn't have pegged it to be this far along. When you and I talked at NeurIPS, you were talking a little bit about your production use cases and your customers. And then not to bury the lede, today we're also announcing the fundraise, your Series B. $150 million. $150 million at a 1.25B valuation. Congrats, Unicorn.Mark Bissell [00:02:02]: Thank you. 
Yeah, no, things move fast.Shawn Wang [00:02:04]: We were talking to you in December and already some big updates since then. Let's dive, I guess, into a bit of your backgrounds as well. Mark, you were at Palantir working on health stuff, which is really interesting because Goodfire has some interesting health use cases. I don't know how related they are in practice.Mark Bissell [00:02:22]: Yeah, not super related, but I don't know. It was helpful context to know what it's like just to work with health systems and generally in that domain. Yeah.Shawn Wang [00:02:32]: And Myra, you were at Two Sigma, which actually I was also at Two Sigma back in the day. Wow, nice.Myra Deng [00:02:37]: Did we overlap at all?Shawn Wang [00:02:38]: No, this is when I was briefly a software engineer before I became a sort of developer relations person. And now you're head of product. What are your sort of respective roles, just to introduce people to like what all gets done in Goodfire?Mark Bissell [00:02:51]: Yeah, prior to Goodfire, I was at Palantir for about three years as a forward deployed engineer, now a hot term. Wasn't always that way. And as a technical lead on the health care team. And at Goodfire, I'm a member of the technical staff. And honestly, that I think is about as specific as I could describe myself because I've worked on a range of things. And, you know, it's a fun time to be at a team that's still reasonably small. I think when I joined I was one of the first like ten employees, now we're above 40, but still, there's always a mix of research and engineering and product and all of the above that needs to get done. And I think everyone across the team is, you know, pretty much a switch hitter in the roles they do. So I think you've seen some of the stuff that I worked on related to image models, which was sort of like a research demo. 
More recently, I've been working on our scientific discovery team with some of our life sciences partners, but then also building out our core platform for more of like flexing some of the kind of MLE and developer skills as well.Shawn Wang [00:03:53]: Very generalist. And you also had like a very like a founding engineer type role.Myra Deng [00:03:58]: Yeah, yeah.Shawn Wang [00:03:59]: So I also started as I still am a member of technical staff, did a wide range of things from the very beginning, including like finding our office space and all of this, which is we both we both visited when you had that open house thing. It was really nice.Myra Deng [00:04:13]: Thank you. Thank you. Yeah. Plug to come visit our office.Shawn Wang [00:04:15]: It looked like it was like 200 people. It has room for 200 people. But you guys are like 10.Myra Deng [00:04:22]: For a while, it was very empty. But yeah, like like Mark, I spend. A lot of my time as as head of product, I think product is a bit of a weird role these days, but a lot of it is thinking about how do we take our frontier research and really apply it to the most important real world problems and how does that then translate into a platform that's repeatable or a product and working across, you know, the engineering and research teams to make that happen and also communicating to the world? Like, what is interpretability? What is it used for? What is it good for? Why is it so important? All of these things are part of my day-to-day as well.Shawn Wang [00:05:01]: I love like what is things because that's a very crisp like starting point for people like coming to a field. They all do a fun thing. Vibhu, why don't you want to try tackling what is interpretability and then they can correct us.Vibhu Sapra [00:05:13]: Okay, great. So I think like one, just to kick off, it's a very interesting role to be head of product, right? Because you guys, at least as a lab, you're more of an applied interp lab, right? 
Which is pretty different than just normal interp, like a lot of background research. But yeah. You guys actually ship an API to try these things. You have Ember, you have products around it, which not many do. Okay. What is interp? So basically you're trying to have an understanding of what's going on in the model, in the internals. So different approaches to do that. You can do probing, SAEs, transcoders, all this stuff. But basically you have a hypothesis. You have something that you want to learn about what's happening in a model's internals. And then you're trying to solve that from there. You can do stuff like, you know, activation mapping. You can try to do steering. There's a lot of stuff that you can do, but the key question is, you know, from input to output, we want to have a better understanding of what's happening and, you know, how can we adjust what's happening on the model internals? How'd I do?Mark Bissell [00:06:12]: That was really good. I think that was great. I think it's also kind of a minefield of a, if you ask 50 people who quote unquote work in interp, like what is interpretability, you'll probably get 50 different answers. And. Yeah. To some extent also like where Goodfire sits in the space. I think that we're an AI research company above all else. And interpretability is a set of methods that we think are really useful and worth kind of specializing in, in order to accomplish the goals we want to accomplish. But I think we also sort of see some of the goals as even broader, as almost like the science of deep learning and just taking a not black box approach to kind of any part of the like AI development life cycle, whether that. 
That means using interp for like data curation while you're training your model or for understanding what happened during post-training or for, you know, understanding activations and sort of internal representations, what is in there semantically. And then a lot of sort of exciting updates that are sort of also part of the fundraise around bringing interpretability to training, which I don't think has been done all that much before. A lot of this stuff is sort of post-hoc poking at models as opposed to actually using this to intentionally design them.Shawn Wang [00:07:29]: Is this post-training or pre-training or is that not a useful.Myra Deng [00:07:33]: Currently focused on post-training, but there's no reason the techniques wouldn't also work in pre-training.Shawn Wang [00:07:38]: Yeah. It seems like it would be more applicable post-training because basically I'm thinking like rollouts or like, you know, having different variations of a model that you can tweak with your steering. Yeah.Myra Deng [00:07:50]: And I think in a lot of the news that you've seen on like Twitter or whatever, you've seen a lot of unintended side effects come out of post-training processes, you know, overly sycophantic models or models that exhibit strange reward hacking behavior. I think these are like extreme examples. There's also, you know, more mundane, like enterprise use cases where, you know, they try to customize or post-train a model to do something and it learns some noise or it doesn't appropriately learn the target task. And a big question that we've always had is like, how do you use your understanding of what the model knows and what it's doing to actually guide the learning process?Shawn Wang [00:08:26]: Yeah, I mean, uh, you know, just to anchor this for people, uh, one of the biggest controversies of last year was 4o GlazeGate. I've never heard of GlazeGate. 
I didn't know that was what it was called. The other one, they called it that on the blog post and I was like, well, how did OpenAI call it? Like officially use that term. And I'm like, that's funny, but like, yeah, I guess the pitch is that if they had worked with Goodfire, they would have avoided it. Like, you know what I'm saying?Myra Deng [00:08:51]: I think so. Yeah. Yeah.Mark Bissell [00:08:53]: I think that's certainly one of the use cases. I think. Yeah. Yeah. I think the reason why post-training is a place where this makes a lot of sense is a lot of what we're talking about is surgical edits. You know, you want to be able to have expert feedback, very surgically change how your model is doing, whether that is, you know, removing a certain behavior that it has. So, you know, one of the things that we've been looking at, another common area where you would want to make a somewhat surgical edit, is some of the models that have say political bias. Like you look at Qwen or, um, R1 and they have sort of like this CCP bias.Shawn Wang [00:09:27]: Is there a CCP vector?Mark Bissell [00:09:29]: Well, there are certainly internal, yeah, parts of the representation space where you can sort of see where that lives. Yeah. Um, and you want to kind of, you know, extract that piece out.Shawn Wang [00:09:40]: Well, I always say, you know, whenever you find a vector, a fun exercise is just like, make it very negative to see what the opposite of CCP is.Mark Bissell [00:09:47]: The super America, bald eagles flying everywhere. But yeah. So in general, like lots of post-training tasks where you'd want to be able to do that. Whether it's unlearning a certain behavior or, you know, some of the other kind of cases where this comes up is, are you familiar with like the grokking behavior? 
I mean, I know the machine learning term of grokking.Shawn Wang [00:10:09]: Yeah.Mark Bissell [00:10:09]: Sort of this like double descent idea of having a model that is able to learn a generalizing solution, as opposed to even if memorization of some task would suffice, you want it to learn the more general way of doing a thing. And so, you know, another way that you can think about having surgical access to a model's internals would be learn from this data, but learn in the right way. If there are many possible, you know, ways to do that. Can interp solve the double descent problem?Shawn Wang [00:10:41]: Depends, I guess, on how you. Okay. So I viewed double descent as a problem because then you're like, well, if the loss curves level out, then you're done, but maybe you're not done. Right. Right. But like, if you can actually interpret what is generalizing, or what is still changing even though the loss is not changing, then maybe you can actually not view it as a double descent problem. And actually you're just sort of translating the space in which you view loss and like, and then you have a smooth curve. Yeah.Mark Bissell [00:11:11]: I think that's certainly like the domain of problems that we're looking to get.Shawn Wang [00:11:15]: Yeah. To me, like double descent is like the biggest thing to like ML research where like, if you believe in scaling, then you need to know where to scale. But if you believe in double descent, then you don't believe in anything where like anything levels off, like.Vibhu Sapra [00:11:30]: I mean, also tangentially there's like, okay, when you talk about the China vector, right. There's the subliminal learning work. It was from the Anthropic Fellows program where basically you can have hidden biases in a model. 
And as you distill down or, you know, as you train on distilled data, those biases always show up, even if like you explicitly try to not train on them. So, you know, it's just like another use case of. Okay. If we can interpret what's happening in post-training, you know, can we clear some of this? Can we even determine what's there? Because yeah, it's just like some worrying research that's out there that shows, you know, we really don't know what's going on.Mark Bissell [00:12:06]: That is. Yeah. I think that's the biggest sentiment that we're sort of hoping to tackle. Nobody knows what's going on. Right. Like subliminal learning is just an insane concept when you think about it. Right. Train a model on not even the logits, literally the output text of a bunch of random numbers. And now your model loves owls. And you see behaviors like that, that just defy intuition. And there are mathematical explanations that you can get into, but. I mean.Shawn Wang [00:12:34]: It feels so early days. Objectively, there are a sequence of numbers that are more owl-like than others. There should be.Mark Bissell [00:12:40]: According to certain models. Right. It's interesting. I think it only applies to models that were initialized from the same starting seed. Usually, yes.Shawn Wang [00:12:49]: But I mean, I think that's a cheat code because there's not enough compute. But like if you believe in like platonic representation, like probably it will transfer across different models as well. Oh, you think so?Mark Bissell [00:13:00]: I think of it more as a statistical artifact of models initialized from the same seed sort of. There's something that is like path dependent from that seed that might cause certain overlaps in the latent space and then sort of doing this distillation. Yeah. Like it pushes it towards having certain other tendencies.Vibhu Sapra [00:13:24]: Got it. 
I think there's like a bunch of these open-ended questions, right? Like you can't train in new stuff during the RL phase, right? RL only reorganizes weights and you can only do stuff that's somewhat there in your base model. You're not learning new stuff. You're just reordering chains and stuff. But okay. My broader question is when you guys work at an interp lab, how do you decide what to work on and what's kind of the thought process? Right. Because we can ramble for hours. Okay. I want to know this. I want to know that. But like, how do you concretely like, you know, what's the workflow? Okay. There's like approaches towards solving a problem, right? I can try prompting. I can look at chain of thought. I can train probes, SAEs. But how do you determine, you know, like, okay, is this going anywhere? Like, do we have set stuff? Just, you know, if you can help me with all that. Yeah.Myra Deng [00:14:07]: It's a really good question. I feel like we've always at the very beginning of the company thought about like, let's go and try to learn what isn't working in machine learning today. Whether that's talking to customers or talking to researchers at other labs, trying to understand both where the frontier is going and where things are really not falling apart today. And then developing a perspective on how we can push the frontier using interpretability methods. And so, you know, even our chief scientist, Tom, spends a lot of time talking to customers and trying to understand what real world problems are and then taking that back and trying to apply the current state of the art to those problems and then seeing where they fall down basically. And then using those failures or those shortcomings to understand what hills to climb when it comes to interpretability research. So like on the fundamental side, for instance, when we have done some work applying SAEs and probes, we've encountered, you know, some shortcomings in SAEs that we found a little bit surprising. 
And so have gone back to the drawing board and done work on that. And then, you know, we've done some work on better foundational interpreter models. And a lot of our team's research is focused on what is the next evolution beyond SAEs, for instance. And then when it comes to like control and design of models, you know, we tried steering with our first API and realized that it still fell short of black box techniques like prompting or fine tuning. And so went back to the drawing board and we're like, how do we make that not the case and how do we improve it beyond that? And one of our researchers, Ekdeep, who just joined is actually Ekdeep and Atticus are like steering experts and have spent a lot of time trying to figure out like, what is the research that enables us to actually do this in a much more powerful, robust way? So yeah, the answer is like, look at real world problems, try to translate that into a research agenda and then like hill climb on both of those at the same time.Shawn Wang [00:16:04]: Yeah. Mark has the steering CLI demo queued up, which we're going to go into in a sec. But I always want to double click on when you drop hints, like we found some problems with SAEs. Okay. What are they? You know, and then we can go into the demo. Yeah.Myra Deng [00:16:19]: I mean, I'm curious if you have more thoughts here as well, because you've done it in the healthcare domain. But I think like, for instance, when we do things like trying to detect behaviors within models that are harmful or like behaviors that a user might not want to have in their model. So hallucinations, for instance, harmful intent, PII, all of these things. We first tried using SAE probes for a lot of these tasks. So taking the feature activation space from SAEs and then training classifiers on top of that, and then seeing how well we can detect the properties that we might want to detect in model behavior. 
And we've seen in many cases that probes just trained on raw activations seem to perform better than SAE probes, which is a bit surprising if you think that SAEs are actually also capturing the concepts that you would want to capture cleanly and more surgically. And so that is an interesting observation. I don't think that is like, I'm not down on SAEs at all. I think there are many, many things they're useful for, but we have definitely run into cases where I think the concept space described by SAEs is not as clean and accurate as we would expect it to be for actual like real world downstream performance metrics.Mark Bissell [00:17:34]: Fair enough. Yeah. It's the blessing and the curse of unsupervised methods where you get to peek into the AI's mind. But sometimes you wish that you saw other things when you walked inside there. Although in the PII instance, I think an SAE-based approach actually did prove to be the most generalizable?Myra Deng [00:17:53]: It did work well in the case that we published with Rakuten. And I think a lot of the reasons it worked well was because we had a noisier data set. And so actually the blessing of unsupervised learning is that we actually got to get more meaningful, generalizable signal from SAEs when the data was noisy. But in other cases where we've had like good data sets, it hasn't been the case.Shawn Wang [00:18:14]: And just because you named Rakuten and I don't know if we'll get another chance, like what is the overall, like what is Rakuten's usage or production usage? Yeah.Myra Deng [00:18:25]: So they are using us to essentially guardrail and inference-time monitor their language model usage and their agent usage to detect things like PII so that they don't route private user information.Myra Deng [00:18:41]: And so that's, you know, going through all of their user queries every day. And that's something that we deployed with them a few months ago. 
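A minimal sketch of the comparison Myra describes here: one linear probe trained on raw activations and one trained on SAE feature activations of the same inputs. Everything in it is an assumption made for illustration and not Goodfire's pipeline: the "activations" are synthetic 4-dimensional vectors, the "SAE" is a fixed random ReLU encoder standing in for a trained one, and the helper names (`sae_encode`, `train_probe`, `accuracy`) are hypothetical. For brevity it also scores each probe on its own training data, so it only shows the mechanics and says nothing about which representation wins in practice.

```python
import math
import random


def sae_encode(x, W_enc, b_enc):
    """Toy SAE encoder: features = ReLU(W_enc @ x + b_enc)."""
    return [max(0.0, sum(w * a for w, a in zip(row, x)) + b)
            for row, b in zip(W_enc, b_enc)]


def train_probe(X, y, steps=300, lr=0.5):
    """Logistic-regression probe trained with full-batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - t
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of examples where the probe's sign matches the label."""
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (t == 1)
               for x, t in zip(X, y))
    return hits / len(X)


rng = random.Random(0)
# Synthetic "raw activations"; the label depends on one linear direction.
X_raw = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X_raw]

# Hypothetical stand-in for a trained SAE: 6 random features over 4 dims.
W_enc = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
b_enc = [0.0] * 6
X_sae = [sae_encode(x, W_enc, b_enc) for x in X_raw]

acc_raw = accuracy(X_raw, y, *train_probe(X_raw, y))
acc_sae = accuracy(X_sae, y, *train_probe(X_sae, y))
print(f"raw-activation probe: {acc_raw:.2f}, SAE-feature probe: {acc_sae:.2f}")
```

In a real evaluation both probes would be scored on held-out data, and the SAE would be trained on the model's actual activations.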
And now we are actually exploring very early partnerships, not just with Rakuten, but with other people around how we can help with potentially training and customization use cases as well. Yeah.Shawn Wang [00:19:03]: And for those who don't know, Rakuten is, I think, the number one or number two e-commerce store in Japan. Yes. Yeah.Mark Bissell [00:19:10]: And I think that use case actually highlights a lot of like what it looks like to deploy things in practice that you don't always think about when you're doing sort of research tasks. So when you think about some of the stuff that came up there that's more complex than your idealized version of a problem, they were encountering things like synthetic to real transfer of methods. So they couldn't train probes, classifiers, things like that on actual customer data of PII. So what they had to do is use synthetic data sets and then hope that that transfers out of domain to real data sets. And so we can evaluate performance on the real data sets, but not train on customer PII. So that right off the bat is like a big challenge. You have multilingual requirements. So this needed to work for both English and Japanese text. Japanese text has all sorts of quirks, including tokenization behaviors that caused lots of bugs that caused us to be pulling our hair out. And then also, in a lot of tasks you'll see, you might make simplifying assumptions if you're sort of treating it as like the easiest version of the problem to just sort of get like general results where maybe you say you're classifying a sentence to say, does this contain PII? But the need that Rakuten had was token level classification so that you could precisely scrub out the PII. So as we learned more about the problem, you're sort of speaking about what that looks like in practice. Yeah. A lot of assumptions end up breaking. And that was just one instance where you. 
A problem that seems simple right off the bat ends up being more complex as you keep diving into it.Vibhu Sapra [00:20:41]: Excellent. One of the things that's also interesting with Interp is a lot of these methods are very efficient, right? So where you're just looking at a model's internals itself compared to a separate like guardrail, LLM as a judge, a separate model. One, you have to host it. Two, there's like a whole latency. So if you use like a big model, you have a second call. Some of the work around like self detection of hallucination, it's also deployed for efficiency, right? So if you have someone like Rakuten doing it in production live, you know, that's just another thing people should consider.Mark Bissell [00:21:12]: Yeah. And something like a probe is super lightweight. Yeah. It's no extra latency really. Excellent.Shawn Wang [00:21:17]: You have the steering demos lined up. So we were just kind of see what you got. I don't, I don't actually know if this is like the latest, latest or like alpha thing.Mark Bissell [00:21:26]: No, this is a pretty hacky demo from from a presentation that someone else on the team recently gave. So this will give a sense for, for technology. So you can see the steering and action. Honestly, I think the biggest thing that this highlights is that as we've been growing as a company and taking on kind of more and more ambitious versions of interpretability related problems, a lot of that comes to scaling up in various different forms. And so here you're going to see steering on a 1 trillion parameter model. This is Kimi K2. And so it's sort of fun that in addition to the research challenges, there are engineering challenges that we're now tackling. Cause for any of this to be sort of useful in production, you need to be thinking about what it looks like when you're using these methods on frontier models as opposed to sort of like toy kind of model organisms. 
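The token-level probing described in this exchange can be sketched in a few lines. This is a minimal illustration, not the system deployed with Rakuten: the activations, the probe weights, the threshold, and the function names `probe_score` and `scrub_tokens` are all hypothetical, and a real probe would be trained on (synthetic) labeled data rather than written by hand.

```python
import math

def probe_score(activation, weights, bias):
    """Logistic-regression probe: sigmoid(w . h + b) for one token's activation."""
    z = sum(w * h for w, h in zip(weights, activation)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def scrub_tokens(tokens, activations, weights, bias, threshold=0.5, mask="[REDACTED]"):
    """Replace every token whose probe score exceeds the threshold."""
    return [
        mask if probe_score(h, weights, bias) > threshold else tok
        for tok, h in zip(tokens, activations)
    ]

# Toy, hand-made data (assumption): dimension 0 is large on the PII token.
tokens = ["my", "email", "is", "a@b.com"]
activations = [[0.1, 0.3], [0.2, 0.1], [0.0, 0.4], [3.0, 0.2]]
weights, bias = [2.0, 0.0], -3.0

scrubbed = scrub_tokens(tokens, activations, weights, bias)
print(scrubbed)
```

Because the probe is a single dot product per token on activations the model already computes, it adds almost no latency, which is the efficiency point made above.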
So yeah, this was thrown together hastily, pretty fragile behind the scenes, but I think it's quite a fun demo. So screen sharing is on. So I've got two terminal sessions pulled up here. On the left is a forked version that we have of the Kimi CLI that we've got running to point at our custom hosted Kimi model. And then on the right is a setup that will allow us to steer on certain concepts. So I should be able to chat with Kimi over here. Tell it hello. This is running locally. So the CLI is running locally, but the Kimi server is running back at the office. Well, hopefully should be, um, that's too much to run on that Mac. Yeah. I think it's, uh, it takes a full, like, H100 node. I think it's like, you can. You can run it on eight GPUs, eight H100s. So, so yeah, Kimi's running. We can ask it a prompt. It's got a forked version of our, uh, of the SGLang code base that we've been working on. So I'm going to tell it, Hey, this SGLang code base is slow. I think there's a bug. Can you try to figure it out? It's a big code base, so it'll, it'll spend some time doing this. And then on the right here, I'm going to initialize in real time. Some steering. Let's see here.Mark Bissell [00:23:33]: searching for any. Bugs. Feature ID 43205.Shawn Wang [00:23:38]: Yeah.Mark Bissell [00:23:38]: 20, 30, 40. So let me, uh, this is basically a feature that we found that inside Kimi seems to cause it to speak in Gen Z slang. And so on the left, it's still sort of thinking normally. It might take, I don't know, 15 seconds for this to kick in, but then we're going to start hopefully seeing it do "this code base is massive, for real." So we're going to start. We're going to start seeing Kimi transition as the steering kicks in from normal Kimi to Gen Z Kimi, both in its chain of thought and its actual outputs.Mark Bissell [00:24:19]: And interestingly, you can see, you know, it's still able to call tools, uh, and stuff. It's um, it's purely sort of its demeanor. 
And there are other features that we found for interesting things like concision. So that's more of a practical one. You can make it more concise. Um, the types of programs, uh, programming languages that it uses, but yeah, as we're seeing it come in. Pretty good. Outputs.Shawn Wang [00:24:43]: Scheduler code is actually wild.Vibhu Sapra [00:24:46]: Yo, this code is actually insane, bro.Vibhu Sapra [00:24:53]: What's the process of training an SAE on this, or, you know, how do you label features? I know you guys put out a pretty cool blog post about, um, finding this like autonomous interp. Um, something. Something about how agents for interp is different than like coding agents. I don't know while this is spewing up, but how, how do we find feature 43205? Yeah.Mark Bissell [00:25:15]: So in this case, um, we, our platform that we've been building out for a long time now supports all the sort of classic out of the box interp techniques that you might want to have like SAE training, probing, things of that kind. I'd say the techniques for like vanilla SAEs are pretty well established now, where you take your model that you're interpreting, run a whole bunch of data through it, gather activations, and then yeah, pretty straightforward pipeline to train an SAE. There are a lot of different varieties. There's top-k SAEs, batch top-k SAEs, um, normal ReLU SAEs. And then once you have your sparse features, to your point, assigning labels to them to actually understand that this is a Gen Z feature, that's actually where a lot of the kind of magic happens. Yeah. And the most basic standard technique is to look at all of your input data set examples that cause this feature to fire most highly. And then you can usually pick out a pattern. So for this feature, if I've run a diverse enough data set through my model, feature 43205 probably tends to fire on all the tokens that sound like Gen Z slang. 
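As a rough picture of what steering on a feature means mechanically, here is a minimal sketch. It assumes steering is done by adding a scaled, normalized feature direction to a hidden state at some layer; the vectors, the strength value, and the helper names `steer` and `projection` are illustrative assumptions, not the demo's actual code.

```python
import math

def steer(hidden, direction, strength):
    """Add `strength` units of a normalized feature direction to one hidden state."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [h + strength * u for h, u in zip(hidden, unit)]

def projection(hidden, direction):
    """How strongly a hidden state points along the feature direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(h * d for h, d in zip(hidden, direction)) / norm

# Toy values (assumption): a 3-dimensional "residual stream" and a made-up
# direction standing in for the Gen Z slang feature.
hidden = [1.0, 2.0, 0.0]
slang_direction = [0.0, 3.0, 4.0]

steered = steer(hidden, slang_direction, strength=5.0)
print(projection(hidden, slang_direction), projection(steered, slang_direction))
```

The weights are untouched, so the model can still call tools and reason as before; only the activations flowing through it are shifted along one direction.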
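The two steps described here, a top-k SAE encoding and labeling a feature from its most strongly activating examples, can be sketched as follows. The encoder weights, the token list, and the function names are toy assumptions for illustration; real SAEs are trained on large activation data sets and have tens of thousands of features.

```python
def topk_sae_encode(activation, encoder_rows, biases, k):
    """Top-k SAE encoder: ReLU pre-activations, keeping only the k largest."""
    pre = [
        max(0.0, sum(w * a for w, a in zip(row, activation)) + b)
        for row, b in zip(encoder_rows, biases)
    ]
    keep = set(sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(pre)]

def top_activating_tokens(tokens, feature_values, n):
    """Return the n tokens on which one feature fires most strongly."""
    ranked = sorted(zip(tokens, feature_values), key=lambda pair: pair[1], reverse=True)
    return [tok for tok, value in ranked[:n] if value > 0.0]

# Toy encoder (assumption): 3 features over a 2-dimensional activation.
encoder_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = [0.0, 0.0, -0.5]
sparse = topk_sae_encode([2.0, 1.0], encoder_rows, biases, k=2)

# Toy activations of a single feature across four tokens (assumption).
tokens = ["the", "fr", "code", "no cap"]
feature_values = [0.0, 1.2, 0.1, 3.4]
examples = top_activating_tokens(tokens, feature_values, n=2)
print(sparse, examples)
```

A human or an automated labeler would then read the top examples and propose a name for the feature, such as "Gen Z slang".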
You know, that's the, that's the time of year to be like, Oh, I'm in this, I'm in this Um, and, um, so, you know, you could have a human go through all 43,000 concepts andVibhu Sapra [00:26:34]: And I've got to ask the basic question, you know, can we get examples where it hallucinates, pass it through, see what feature activates for hallucinations? Can I just, you know, turn hallucination down?Myra Deng [00:26:51]: Oh, wow. You really predicted a project we're already working on right now, which is detecting hallucinations using interpretability techniques. And this is interesting because hallucinations is something that's very hard to detect. And it's like a kind of a hairy problem and something that black box methods really struggle with. Whereas like Gen Z, you could always train a simple classifier to detect that hallucinations is harder. But we've seen that models internally have some... Awareness of like uncertainty or some sort of like user pleasing behavior that leads to hallucinatory behavior. And so, yeah, we have a project that's trying to detect that accurately. And then also working on mitigating the hallucinatory behavior in the model itself as well.Shawn Wang [00:27:39]: Yeah, I would say most people are still at the level of like, oh, I would just turn temperature to zero and that turns off hallucination. And I'm like, well, that's a fundamental misunderstanding of how this works. Yeah.Mark Bissell [00:27:51]: Although, so part of what I like about that question is you, there are SAE based approaches that might like help you get at that. But oftentimes the beauty of SAEs and like we said, the curse is that they're unsupervised. 
So when you have a behavior that you deliberately would like to remove, and that's more of like a supervised task, often it is better to use something like probes and specifically target the thing that you're interested in reducing as opposed to sort of like hoping that when you fragment the latent space, one of the vectors that pops out.Vibhu Sapra [00:28:20]: And as much as we're training an autoencoder to be sparse, we're not like for sure certain that, you know, we will get something that just correlates to hallucination. You'll probably split that up into 20 other things and who knows what they'll be.Mark Bissell [00:28:36]: Of course. Right. Yeah. So there are sort of known problems with like feature splitting and feature absorption. And then there's the off-target effects, right? Ideally, you would want to be very precise, where if you reduce the hallucination feature, suddenly maybe your model can't write creatively anymore. And maybe you don't like that, but you want to still stop it from hallucinating facts and figures.Shawn Wang [00:28:55]: Good. So Vibhu has a paper to recommend there that we'll put in the show notes. But yeah, I mean, I guess just because your demo is done, any other things that you want to highlight or any other interesting features you want to show?Mark Bissell [00:29:07]: I don't think so. Yeah. Like I said, this is a pretty small snippet. I think the main sort of point here that I think is exciting is that there's not a whole lot of interp being applied to models quite at this scale. You know, Anthropic certainly has some research and yeah, other teams as well. But it's nice to see these techniques, you know, being put into practice. I think not that long ago, the idea of real time steering of a trillion parameter model would have sounded.Shawn Wang [00:29:33]: Yeah. 
The fact that it's real time, like you started the thing and then you edited the steering vector.Vibhu Sapra [00:29:38]: I think it's an interesting one, TBD of what the actual like production use case would be on that, like the real time editing. It's like that's the fun part of the demo, right? You can kind of see how this could be served behind an API, right? Like, yes, you only have so many knobs and you can just tweak it a bit more. And I don't know how it plays in. Like people haven't done that much with like, how does this work with or without prompting? Right. How does this work with fine tuning? Like, there's a whole hype of continual learning, right? So there's just so much to see. Like, is this another parameter? Like, is it like parameter? We just kind of leave it as a default. We don't use it. So I don't know. Maybe someone here wants to put out a guide on like how to use this with prompting, when to do what?Mark Bissell [00:30:18]: Oh, well, I have a paper recommendation I think you would love from Ekdeep on our team, who is an amazing researcher, just can't say enough amazing things about Ekdeep. But he actually has a paper, as well as some others from the team and elsewhere, that goes into the essential equivalence of activation steering and in-context learning and how those are, from a, he thinks of everything in a cognitive neuroscience Bayesian framework, but basically how you can precisely show how prompting, in-context learning, and steering exhibit similar behaviors and even like get quantitative about the like magnitude of steering you would need to do to induce a certain amount of behavior similar to certain prompting, even for things like jailbreaks and stuff. It's a really cool paper. Are you saying steering is less powerful than prompting? More like you can almost write a formula that tells you how to convert between the two of them.Myra Deng [00:31:20]: And so like formally equivalent actually in the limit. 
Right.Mark Bissell [00:31:24]: So like one case study of this is for jailbreaks there. I don't know. Have you seen the stuff where you can do like many-shot jailbreaking? You like flood the context with examples of the behavior. Anthropic put out that paper.Shawn Wang [00:31:38]: A lot of people were like, yeah, we've been doing this, guys.Mark Bissell [00:31:40]: Like, yeah, what's in this in-context learning and activation steering equivalence paper is you can like predict the number of examples that you will need to put in there in order to jailbreak the model. That's cool. By doing steering experiments and using this sort of like equivalence mapping. That's cool. That's really cool. It's very neat. Yeah.Shawn Wang [00:32:02]: I was going to say, like, you know, I can like back rationalize that this makes sense because, you know, what context is, is basically just, you know, it updates the KV cache kind of and like and then every next token inference is still like, you know, the sheer sum of everything all the way. It's plus all the context. It's up to date. And you could, I guess, theoretically steer that with you probably replace that with your steering. The only problem is steering typically is on one layer, maybe three layers like you did. So it's like not exactly equivalent.Mark Bissell [00:32:33]: Right, right. There's sort of you need to get precise about, yeah, like how you sort of define steering and like how you're modeling the setup. But yeah, I've got the paper pulled up here. Belief dynamics reveal the dual nature. Yeah. The title is Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering. So Eric Bigelow and Daniel Wurgaft, who are doing fellowships at Goodfire, and Ekdeep's the final author there.Myra Deng [00:32:59]: I think actually to your question of like, what is the production use case of steering? 
I think maybe if you just think like one level beyond steering as it is today. Like imagine if you could adapt your model to be, you know, an expert legal reasoner. Like in almost real time, like very quickly, efficiently, using human feedback or using like your semantic understanding of what the model knows and where it knows that behavior. I think that while it's not clear what the product is at the end of the day, it's clearly very valuable. Thinking about like what's the next interface for model customization and adaptation is a really interesting problem for us. Like we have heard a lot of people actually interested in fine-tuning and RL for open weight models in production. And so people are using things like Tinker or kind of like open source libraries to do that, but it's still very difficult to get models fine-tuned and RL'd for exactly what you want them to do unless you're an expert at model training. And so that's like something we'reShawn Wang [00:34:06]: looking into. Yeah. I never thought so. Tinker from Thinking Machines famously uses rank-one LoRA. Is that basically the same as steering? Like, you know, what's the comparison there?Mark Bissell [00:34:19]: Well, so in that case, you are still applying updates to the parameters, right?Shawn Wang [00:34:25]: Yeah. You're not touching a base model. You're touching an adapter. It's kind of, yeah.Mark Bissell [00:34:30]: Right. But I guess it still is like more in parameter space then. I guess it's maybe like, are you modifying the pipes or are you modifying the water flowing through the pipes to get what you're after? Yeah. Just maybe one way.Mark Bissell [00:34:44]: I like that analogy. That's my mental map of it at least, but it gets at this idea of model design and intentional design, which is something that we're, that we're very focused on. 
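The pipes-versus-water distinction can be made concrete on a single linear layer. The sketch below is a simplified illustration under assumed toy values: `rank_one_lora` changes the weights by an outer product (the pipes), while `steered` leaves the weights alone and adds a fixed vector to the output activations (the water). The function names and numbers are hypothetical.

```python
def matvec(W, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rank_one_lora(W, b, a, x):
    """(W + b a^T) x = W x + b * (a . x): the shift depends on the input x."""
    coeff = sum(ai * xi for ai, xi in zip(a, x))
    return [y + bi * coeff for y, bi in zip(matvec(W, x), b)]

def steered(W, v, x):
    """W x + v: the same vector v is added regardless of the input."""
    return [y + vi for y, vi in zip(matvec(W, x), v)]

# Toy layer (assumption): 2x2 identity weights.
W = [[1.0, 0.0], [0.0, 1.0]]
b, a = [1.0, 0.0], [0.0, 1.0]   # rank-one adapter factors
v = [1.0, 0.0]                  # steering vector

x1, x2 = [1.0, 2.0], [1.0, 0.0]
lora_out = [rank_one_lora(W, b, a, x1), rank_one_lora(W, b, a, x2)]
steer_out = [steered(W, v, x1), steered(W, v, x2)]
print(lora_out, steer_out)
```

In this toy case the adapter shifts the first input but leaves the second unchanged, because its effect is gated by the input, whereas steering applies the same offset to both.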
And just the fact that like, I hope that we look back at how we're currently training models and post-training models and just think what a primitive way of doing that right now. Like there's no intentionalityShawn Wang [00:35:06]: really in... It's just data, right? The only thing in control is what data we feed in.Mark Bissell [00:35:11]: So, so Dan from Goodfire likes to use this analogy of, you know, he has a couple of young kids and he talks about like, what if I could only teach my kids how to be good people by giving them cookies or like, you know, giving them a slap on the wrist if they do something wrong, like not telling them why it was wrong or like what they should have done differently or something like that. Just figure it out. Right. Exactly. So that's RL. Yeah. Right. And, and, you know, it's sample inefficient. There's, you know, what do they say? It's like slurping feedback. It's like, slurping supervision. Right. And so you'd like to get to the point where you can have experts giving feedback to their models that are, uh, internalized and, and, you know, steering is an inference time way of sort of getting that idea. But ideally you're moving to a world whereVibhu Sapra [00:36:04]: it is much more intentional design in perpetuity for these models. Okay. This is one of the questions we asked Emmanuel from Anthropic on the podcast a few months ago. Basically the question, was you're at a research lab that does model training, foundation models, and you're on an interp team. How does it tie back? Right? Like, does this, do ideas come from the pre-training team? Do they go back? Um, you know, so for those interested, you can, you can watch that. There wasn't too much of a connect there, but it's still something, you know, it's something they want toMark Bissell [00:36:33]: push for down the line. It can be useful for all of the above. Like there are certainly post-hocVibhu Sapra [00:36:39]: use cases where it doesn't need to touch that. 
I think the other thing a lot of people forget is this stuff isn't too computationally expensive, right? Like I would say, if you're interested in getting into research, MechInterp is one of the most approachable fields, right? A lot of this, train an SAE, train a probe, this stuff, like the budget for this one, there's already a lot done. There's a lot of open source work. You guys have done some too. Um, you know,Shawn Wang [00:37:04]: There's like notebooks from the Gemini team for Neel Nanda or like, this is how you do it. Just step through the notebook.Vibhu Sapra [00:37:09]: Even if you're like, not even technical with any of this, you can still make like progress. There, you can look at different activations, but, uh, if you do want to get into training, you know, training this stuff, correct me if I'm wrong, is like in the thousands of dollars, not even like, it's not that high scale. And then same with like, you know, applying it, doing it for post-training or all this stuff is fairly cheap in scale of, okay. I want to get into like model training. I don't have compute for like, you know, pre-training stuff. So it's, it's a very nice field to get into. And also there's a lot of like open questions, right? Um, some of them have to do with, okay, I want a product. I want to solve this. Like there's also just a lot of open-ended stuff that people could work on. That's interesting. Right. I don't know if you guys have any calls for like, what's open questions, what's open work that you either open collaboration with, or like, you'd just like to see solved or just, you know, for people listening that want to get into MechInterp because people always talk about it. What are, what are the things they should check out? Start, of course, you know, join you guys as well. I'm sure you're hiring.Myra Deng [00:38:09]: There's a paper, I think from, was it Lee, uh, Sharkey? 
It's Open Problems in, uh, Mechanistic Interpretability, which I recommend everyone who's interested in the field read. It's just like a really comprehensive overview of what are the things that experts in the field think are the most important problems to be solved. I also think to your point, it's been really, really inspiring to see, I think a lot of young people getting interested in interpretability, actually not just young people, also like scientists who have been, you know, experts in physics for many years and in biology or things like this, um, transitioning into interp. So it's really cool to see, because the barrier to entry is, you know, in some ways low and there's a lot of information out there and ways to get started. There's this anecdote of like professors at universities saying that all of a sudden every incoming PhD student wants to study interpretability, which was not the case a few years ago. So it just goes to show how, I guess, like exciting the field is, how fast it's moving, how quick it is to get started and things like that.Mark Bissell [00:39:10]: And also just a very welcoming community. You know, there's an open source MechInterp Slack channel. People are always posting questions and just folks in the space are always responsive if you ask things on various forums and stuff. But yeah, the open problems paper is a really good one.Myra Deng [00:39:28]: For other people who want to get started, I think, you know, MATS is a great program. What's the acronym for? Machine Learning and Alignment Theory Scholars? It's like the...Vibhu Sapra [00:39:40]: Normally summer internship style.Myra Deng [00:39:42]: Yeah, but they've been doing it year round now. And actually a lot of our full-time staff have come through that program or gone through that program. And it's great for anyone who is transitioning into interpretability. There's a couple other fellows programs. 
We do one as well as Anthropic. And so those are great places to get started if anyone is interested.Mark Bissell [00:40:03]: Also, I think it's been seen as a research field for a very long time. But I think engineering... I think engineers are sorely wanted for interpretability as well, especially at Goodfire, but elsewhere, as it does scale up.Shawn Wang [00:40:18]: I should mention that Lee actually works with you guys, right? In the London office. And I'm adding our first ever MechInterp track at AI Europe because I see these industry applications now emerging. And I'm pretty excited to, you know, help push that along. Yeah, I was looking forward to that. It'll effectively be the first industry MechInterp conference. Yeah. I'm so glad you added that. You know, it's still a little bit of a bet. It's not that widespread, but I can definitely see this is the time to really get into it. We want to be early on things.Mark Bissell [00:40:51]: For sure. And I think the field understands this, right? So at ICML, I think the title of the MechInterp workshop this year was actionable interpretability. And there was a lot of discussion around bringing it to various domains. Everyone's adding pragmatic, actionable, whatever.Shawn Wang [00:41:10]: It's like, okay, well, we weren't actionable before, I guess. I don't know.Vibhu Sapra [00:41:13]: And I mean, like, just, you know, being in Europe, you see the Interp room. One, like old school conferences, like, I think they had a very tiny room till they got lucky and they got it doubled. But there's definitely a lot of interest, a lot of niche research. So you see a lot of research coming out of universities, students. We covered the paper last week. It's like two unknown authors, not many citations. But, you know, you can make a lot of meaningful work there. Yeah. Yeah. Yeah.Shawn Wang [00:41:39]: Yeah. I think people haven't really mentioned this yet. It's just Interp for code. I think it's like an abnormally important field. 
We haven't mentioned this yet. The conspiracy theory last two years ago was when the first SAE work came out of Anthropic was they would do like, oh, we just used SAEs to turn the bad code vector down and then turn up the good code. And I think like, isn't that the dream? Like, you know, like, but basically, I guess maybe, why is it funny? Like, it's... If it was realistic, it would not be funny. It would be like, no, actually, we should do this. But it's funny because we know there's like, we feel there's some limitations to what steering can do. And I think a lot of the public image of steering is like the Gen Z stuff. Like, oh, you can make it really love the Golden Gate Bridge, or you can make it speak like Gen Z. To like be a legal reasoner seems like a huge stretch. Yeah. And I don't know if that will get there this way. Yeah.Myra Deng [00:42:36]: I think, um, I will say we are announcing. Something very soon that I will not speak too much about. Um, but I think, yeah, this is like what we've run into again and again is like, we, we don't want to be in the world where steering is only useful for like stylistic things. That's definitely not, not what we're aiming for. But I think the types of interventions that you need to do to get to things like legal reasoning, um, are much more sophisticated and require breakthroughs in, in learning algorithms. And that's, um...Shawn Wang [00:43:07]: And is this an emergent property of scale as well?Myra Deng [00:43:10]: I think so. Yeah. I mean, I think scale definitely helps. I think scale allows you to learn a lot of information and, and reduce noise across, you know, large amounts of data. But I also think we think that there's ways to do things much more effectively, um, even, even at scale. So like actually learning exactly what you want from the data and not learning things that you do that you don't want exhibited in the data. 
So we're not like anti-scale, but we are also realizing that scale is not going to get us anywhere. It's not going to get us to the type of AI development that we want to be at in, in the future as these models get more powerful and get deployed in all these sorts of like mission critical contexts. Current life cycle of training and deploying and evaluations is, is to us like deeply broken and has opportunities to, to improve. So, um, more to come on that very, very soon.Mark Bissell [00:44:02]: And I think that that's a use basically, or maybe just like a proof point that these concepts do exist. Like if you can manipulate them in the precise best way, you can get the ideal combination of them that you desire. And steering is maybe the most coarse grained sort of peek at what that looks like. But I think it's evocative of what you could do if you had total surgical control over every concept, every parameter. Yeah, exactly.Myra Deng [00:44:30]: There were like bad code features. I've got it pulled up.Vibhu Sapra [00:44:33]: Yeah. Just coincidentally, as you guys are talking.Shawn Wang [00:44:35]: This is like, this is exactly.Vibhu Sapra [00:44:38]: There's like specifically a code error feature that activates and they show, you know, it's not, it's not typo detection. It's like, it's, it's typos in code. It's not typical typos. And, you know, you can, you can see it clearly activates where there's something wrong in code. And they have like malicious code, code error. They have a whole bunch of sub, you know, sub broken down little grain features. Yeah.Shawn Wang [00:45:02]: Yeah. So, so the, the rough intuition for me, the, why I talked about post-training was that, well, you just, you know, have a few different rollouts with all these things turned off and on and whatever. And then, you know, you can, that's, that's synthetic data you can kind of post-train on. 
Yeah.Vibhu Sapra [00:45:13]: And I think we make it sound easier than it is just saying, you know, they do the real hard work.Myra Deng [00:45:19]: I mean, you guys, you guys have the right idea. Exactly. Yeah. We replicated a lot of these features in, in our Llama models as well. I remember there was like.Vibhu Sapra [00:45:26]: And I think a lot of this stuff is open, right? Like, yeah, you guys opened yours. DeepMind has opened a lot of SAEs on Gemma. Even Anthropic has opened a lot of this. There's, there's a lot of resources that, you know, we can probably share for people that want to get involved.Shawn Wang [00:45:41]: Yeah. And special shout out to like Neuronpedia as well. Yes. Like, yeah, amazing piece of work to visualize those things.Myra Deng [00:45:49]: Yeah, exactly.Shawn Wang [00:45:50]: I guess I wanted to pivot a little bit on, onto the healthcare side, because I think that's a big use case for you guys. We haven't really talked about it yet. This is a bit of a crossover for me because we are, we are, we do have a separate science pod that we're starting up for AI, for AI for science, just because like, it's such a huge investment category and also I'm like less qualified to do it, but we actually have bio PhDs to cover that, which is great, but I need to just kind of recover, recap your work, maybe on the Evo 2 stuff, but then, and then building forward.Mark Bissell [00:46:17]: Yeah, for sure. And maybe to frame up the conversation, I think another kind of interesting just lens on interpretability in general is a lot of the techniques that were described are ways to solve the AI human interface problem. And it's sort of like bidirectional communication is the goal there. So what we've been talking about with intentional design of models and, you know, steering, but also more advanced techniques is having humans impart our desires and control into models and over models. 
And the reverse is also very interesting, especially as you get to superhuman models, whether that's narrow superintelligence, like these scientific models that work on genomics data, medical imaging, things like that. But down the line, you know, superintelligence of other forms as well. What knowledge can the AIs teach us as sort of that, that the other direction in that? And so some of our life science work to date has been getting at exactly that question, which is, well, some of it does look like debugging these various life sciences models, understanding if they're actually performing well on tasks, or if they're picking up on spurious correlations. For instance, genomics models, you would like to know whether they are sort of focusing on the biologically relevant things that you care about, or if it's using some simpler correlate, like the ancestry of the person that it's looking at. But then also in the instances where they are superhuman, and maybe they are understanding elements of the human genome that we don't have names for or specific, you know, yeah, discoveries that they've made that we don't know about, that's, that's a big goal. And so we're already seeing that, right, we are partnered with organizations like Mayo Clinic, leading research health system in the United States, Arc Institute, as well as a startup called Prima Mente, which focuses on neurodegenerative disease. And in our partnership with them, we've used foundation models they've been training and applied our interpretability techniques to find novel biomarkers for Alzheimer's disease. So I think this is just the tip of the iceberg. But it's, that's like a flavor of some of the things that we're working on.Shawn Wang [00:48:36]: Yeah, I think that's really fantastic. Obviously, we did the Chan Zuckerberg pod last year as well. And like, there's a plethora of these models coming out, because there's so much potential and research. 
And it's like, very interesting how it's basically the same as language models, but just with a different underlying data set. But it's like, it's the same exact techniques. Like, there's no change, basically.Mark Bissell [00:48:59]: Yeah. Well, and even in like other domains, right? Like, you know, robotics, I know, like a lot of the companies just use Gemma as like the like backbone, and then they like make it into a VLA that like takes these actions. It's, it's, it's transformers all the way down. So yeah.Vibhu Sapra [00:49:15]: Like we have Med Gemma now, right? Like this week, even there was Med Gemma 1.5. And they're training it on this stuff, like 3D scans, medical domain knowledge, and all that stuff, too. So there's a push from both sides. But I think the thing that, you know, one of the things about MechInterp is like, you're a little bit more cautious in some domains, right? So healthcare, mainly being one, like guardrails, understanding, you know, we're more risk averse to something going wrong there. So even just from a basic understanding, like, if we're trusting these systems to make claims, we want to know why and what's going on.Myra Deng [00:49:51]: Yeah, I think there's totally a kind of like deployment bottleneck to actually using foundation models for real patient usage or things like that. Like, say you're using a model for rare disease prediction, you probably want some explanation as to why your model predicted a certain outcome, and an interpretable explanation at that. So that's definitely a use case. But I also think like, being able to extract scientific information that no human knows to accelerate drug discovery and disease treatment and things like that actually is a really, really big unlock for science, like scientific discovery. And you've seen a lot of startups, like say that they're going to accelerate scientific discovery. And I feel like we actually are doing that through our interp techniques. 
And kind of almost by accident. I think we got reached out to very, very early on by these healthcare institutions. And none of us had healthcare.

Shawn Wang [00:50:49]: How did they even hear of you? A podcast.

Myra Deng [00:50:51]: Oh, okay. Yeah, podcast.

Vibhu Sapra [00:50:53]: Okay, well, now's that time, you know.

Myra Deng [00:50:55]: Everyone can call us.

Shawn Wang [00:50:56]: Podcasts are the most important thing. Everyone should listen to podcasts.

Myra Deng [00:50:59]: Yeah, they reached out. They were like, you know, we have these really smart models that we've trained, and we want to know what they're doing. And we were really early at that time, like three months old, and it was a few of us. And we were like, oh, my God, we've never used these models. Let's figure it out. But it's also great proof that interp techniques scale pretty well across domains. We didn't really have to learn too much about.

Shawn Wang [00:51:21]: Interp is a machine learning technique, machine learning skills everywhere, right? Yeah. And obviously, it's just like a general insight. Yeah. Probably to finance too, I think, which would be fun for our history. I don't know if you have anything to say there.

Mark Bissell [00:51:34]: Yeah, well, just across the sciences. Like, we've also done work on material science. Yeah, it really runs the gamut.

Vibhu Sapra [00:51:40]: Yeah. Awesome. And, you know, for those that should reach out: you're obviously experts in this, but is there a call out for people that you're looking to partner with, design partners, people to use your stuff, outside of just the general developer that wants to plug and play steering stuff? On the research side more so, are there ideal design partners, customers, stuff like that?

Myra Deng [00:52:03]: Yeah, I can talk about maybe non-life sciences, and then I'm curious to hear from you on the life sciences side.
But we're looking for design partners across many domains. In language, anyone who's customizing language models or trying to push the frontier of code or reasoning models is really interesting to us. And then we're also interested in the frontier of modeling. There's a lot of models that work in, like, pixel space, as we call it. So if you're doing world models, video models, even robotics, where there's not a very clean natural language interface to interact with, we think that interp can really help, and we're looking for a few partners in that space.

Shawn Wang [00:52:43]: Just because you mentioned the keyword
🧭 REBEL Rundown

📌 Key Points

💨 HFNC met criteria for non-inferiority to BPAP for preventing intubation or death within 7 days in four of the five ARF subgroups.
🧪 Bayesian dynamic borrowing increased power across subgroups but created variable certainty, especially in smaller groups such as COPD.
🫁 The immunocompromised hypoxemia subgroup did not meet non-inferiority, leading to early trial stopping for futility.
Rescue BPAP use, subgroup-specific exclusion criteria, and non-standardized BPAP delivery are important contextual factors that influence how subgroup results should be interpreted.

📝 Introduction

Bilevel Positive Airway Pressure (BPAP) has long been a foundational modality in the management of acute respiratory failure (ARF), particularly in COPD exacerbations and cardiogenic pulmonary edema, where it can rapidly reduce work of breathing and improve gas exchange. It remains a core tool in our respiratory support arsenal.

High-flow nasal cannula (HFNC), however, has expanded what we can offer patients by delivering many of the same physiologic benefits through a far more comfortable interface. With high flows, modest PEEP, and effective dead-space washout, HFNC can improve oxygenation and decrease work of breathing while preserving the ability to talk, cough, eat, and interact with staff and family. This combination of physiologic support and tolerability makes HFNC especially attractive in patients where comfort, anxiety, or cardiovascular stability are key considerations, and in settings where prolonged noninvasive support may be needed. Rather than competing with BPAP, HFNC broadens our options in ARF and allows us to better match the modality to the patient and their underlying disease process.

The RENOVATE trial set out to answer a high-impact question across five distinct etiologic groups: Is HFNC non-inferior to BPAP (NIV) for preventing intubation or death in acute respiratory failure?
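For readers less familiar with how a Bayesian non-inferiority question is framed, here is a minimal sketch in Python. It is not the RENOVATE analysis: it assumes independent Beta(1, 1) priors on each arm's event risk, a hypothetical absolute risk-difference margin of 10 percentage points, and made-up event counts, and it does no borrowing between subgroups. The function name `prob_noninferior` and all of its parameters are illustrative only.

```python
import random


def prob_noninferior(events_new, n_new, events_ref, n_ref,
                     margin=0.10, draws=20000, seed=0):
    """Estimate P(risk_new - risk_ref < margin) by Monte Carlo.

    Assumes independent Beta(1, 1) priors, so each arm's posterior is
    Beta(1 + events, 1 + non-events). The margin is a hypothetical
    absolute risk difference, not the margin used in any real trial.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = 0
    for _ in range(draws):
        risk_new = rng.betavariate(1 + events_new, 1 + n_new - events_new)
        risk_ref = rng.betavariate(1 + events_ref, 1 + n_ref - events_ref)
        if risk_new - risk_ref < margin:
            hits += 1
    return hits / draws


# Hypothetical counts for one subgroup (NOT trial data):
# 30/100 events on the new therapy vs 32/100 on the reference therapy.
p = prob_noninferior(events_new=30, n_new=100, events_ref=32, n_ref=100)
print(f"Posterior probability of non-inferiority: {p:.3f}")
```

A trial would declare non-inferiority only if this posterior probability cleared a prespecified threshold. Dynamic borrowing, as described in the key points, would go further by letting each subgroup's estimate draw on the others to the extent that their results agree, which this sketch deliberately leaves out.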
🧾 Paper

Azoulay É, et al. High-Flow Nasal Oxygen vs Noninvasive Ventilation in Patients With Acute Respiratory Failure: The RENOVATE Randomized Clinical Trial. JAMA. 2025 PMID: 39657981

🔙 Previously Covered On REBEL:
HFNC: Part 1 – How It Works
HFNC: Part 2 – Adult and Pediatric Indications
FLORALI and AVOID Trial
FLORALI-2: NIV vs HFNC as Pre-Oxygenation Prior to Intubation
The Pre-AeRATE Trial – HFNC vs NC for RSI

What They Did

CLINICAL QUESTION
Is HFNC non-inferior to BPAP for rate of endotracheal intubation or death at 7 days in patients with acute respiratory failure due to a variety of causes?

STUDY DESIGN
Multicenter, randomized non-inferiority trial
33 Brazilian hospitals
Nov 2019 – Nov 2023
Adaptive Bayesian hierarchical modeling with dynamic borrowing
Open label, outcome adjudicators blinded
Patients were classified into 5 subgroups

SUBGROUPS
1. Non-immunocompromised hypoxemia
  SpO₂ < 90% on room air or PaO₂ < 60 mm Hg on room air, plus
  Increased respiratory effort (accessory muscle use, paradoxical breathing, thoracoabdominal asynchrony) or
  Respiratory rate > 25 breaths/min
2. Immunocompromised hypoxemia
  Defined as:
  Use of immunosuppressive drugs for >3 months
  OR high-dose steroids >0.5 mg/kg/day
  OR solid organ transplant
  OR solid tumors or hematologic malignancies (past 5 years)
  OR HIV with AIDS / primary immunodeficiency
3. COPD exacerbation with acidosis
  High clinical suspicion of COPD as primary diagnosis
  RR >25 with accessory muscle use, paradoxical breathing, and/or thoracoabdominal asynchrony
  ABG: pH … 45
4. Acute cardiogenic pulmonary edema (ACPE)
  Sudden onset dyspnea and rales
  ± S3 heart sound
  No evidence of aspiration, infection, or pulmonary fibrosis
  CXR consistent with pulmonary edema
5. Hypoxemic COVID-19 (added June 2023)
  Added due to deviations between expected and observed outcome proportions
  Any patient in the other 4 groups with PCR-confirmed SARS-CoV-2 infection

POPULATION
Inclusion Criteria:
≥18 yrs with ARF* in one of 5 pre-defined subgroups
*ARF, excluding COPD, was defined by the following:
Hypoxemia with SpO₂
Dave Rubin of "The Rubin Report" talks to Michael Shermer, author of "Truth: What It Is, How to Find It, and Why It Still Matters," about his new book on truth, science, and skepticism; the erosion of trust in institutions after COVID; the politicization of science; how to think critically in an age of misinformation, social media, and information overload; how you can sort out truth from lies using evidence-based Bayesian reasoning; the relationship between science, religion, and meaning; the rise of conspiracy theory pushers like Candace Owens and Tucker Carlson; his concerns over AI, deepfakes, and conspiracy theories; his take on the most recent revelations about UFOs, UAPs, and historical revisionism; and much more.
• Support & get perks!
• Proudly sponsored by PyMC Labs! Get in touch at alex.andorra@pymc-labs.com
• Intro to Bayes and Advanced Regression courses (first 2 lessons free)

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Chapters:
00:00 Scaling Bayesian Neural Networks
04:26 Origin Stories of the Researchers
09:46 Research Themes in Bayesian Neural Networks
12:05 Making Bayesian Neural Networks Fast
16:19 Microcanonical Langevin Sampler Explained
22:57 Bottlenecks in Scaling Bayesian Neural Networks
29:09 Practical Tools for Bayesian Neural Networks
36:48 Trade-offs in Computational Efficiency and Posterior Fidelity
40:13 Exploring High Dimensional Gaussians
43:03 Practical Applications of Bayesian Deep Ensembles
45:20 Comparing Bayesian Neural Networks with Standard Approaches
50:03 Identifying Real-World Applications for Bayesian Methods
57:44 Future of Bayesian Deep Learning at Scale
01:05:56 The Evolution of Bayesian Inference Packages
01:10:39 Vision for the Future of Bayesian Statistics

Thank you to my Patrons for making this episode possible!

Come meet Alex at the Field of Play Conference in Manchester, UK, March 27, 2026!

Links from the show:
David Rügamer:
* Website
* Google Scholar
* GitHub
Emanuel Sommer:
* Website
* GitHub
* Google Scholar
Jakob Robnik:
* Google Scholar
* GitHub
* Microcanonical Langevin paper
* LinkedIn
Conversion Monthly - The panel kicks off 2026 with predictions on AI-driven creative workflows, agentic shopping behaviours, and the tools reshaping Amazon seller operations.

Host: Danny McMillan
Panel: Sim Mahon, Dorian Gorski, Matt Kostan

Episode Summary
The newly rebranded Conversion Monthly show returns with its expert panel to discuss 2026 predictions for Amazon creative optimisation. The conversation covers how AI workflows have evolved since early 2025, with Dorian noting how N8N has become significantly more accessible through built-in AI assistants. Sim shares that his team can now create final, upload-ready main images in a single AI generation. The panel discusses agentic shopping and how AI-driven product discovery may fundamentally change conversion optimisation. Matt highlights the trend toward hyper-specific product positioning, where sellers create separate ASINs for the same product targeting different demographics. Danny introduces Claude's new Co-Work feature as a significant leap that removes technical barriers for sellers wanting to build automations. The panel agrees that "human in the loop" will be the defining phrase of 2026. Sim reveals his investment in 51 Folds, a prediction platform using Bayesian networks.
Key Takeaways
One-shot main images are now reality - AI image generation has reached the point where final, upload-ready Amazon images can be created in a single prompt
Hyper-specific product positioning is trending - creating separate ASINs for the same product targeting different demographics aligns with AI recommendations
Technical barriers to automation are evaporating - tools like Claude Co-Work and improved N8N AI assistants are making workflow automation accessible
"Human in the loop" defines 2026 - the winning strategy combines automated data collection with human strategic oversight
The big three AI providers have stabilised - Anthropic, Google, and OpenAI now dominate, reducing shiny object syndrome
Video generation remains the next frontier - while image generation is solved, video still requires scene-by-scene refinement

Chapter Markers
00:00 - Introduction and 2026 Outlook
00:58 - Dorian on the Pace of Change Since 2025
04:07 - N8N Accessibility and Self-Build Workflows
05:33 - One-Shot Image Generation Capabilities
07:23 - Video Generation Limitations
10:26 - Business Systems, ClickUp and Future-Proofing
14:37 - Hyper-Specific Product Positioning
20:06 - Keplo 2026 Direction
22:26 - Competitive Advantage and AI Accessibility
25:01 - The Big Three AI Providers
28:46 - 51 Folds Investment and Bayesian Prediction
33:14 - Panel 2026 Priorities
38:12 - Wrap-Up

Resources
Seller Sessions Website
Seller Sessions YouTube
Sim Mahon on LinkedIn
Dorian Gorski on LinkedIn
Matt Kostan on LinkedIn