Early bird discounts for the San Francisco World's Fair, the biggest AIE gathering of the year, end today - prices will go up by ~$500 tonight so do please lock in ASAP!

From near-universal AI tool adoption inside Shopify to internal systems for ML experimentation, auto-research, customer simulation, and ultra-low-latency search, Mikhail Parakhin joins us for a deep dive into what it actually looks like when a 20-year-old, $200B software company goes all-in on AI. We cover why Shopify has become much more vocal about its internal stack, what changed after the December model-quality inflection, and why the real bottleneck in AI coding is no longer generation, but review, CI/CD, and deployment stability.

We also go inside Tangle, Tangent, SimGym, which are three major AI initiatives that Shopify is doing to make experimentation reproducible, optimization automatic, customer behavior simulatable, and search and catalog intelligence faster and cheaper at scale. Along the way, Mikhail explains UCP, Liquid AI, and why token budgets are directionally right but often measured badly, why AI-written code can still increase bugs in production, what makes Shopify's customer simulation defensible, and what he learned from the Sydney era at Bing.

We discuss:

* Mikhail's path from running a major Microsoft business unit spanning Windows, Edge, Bing, and ads to becoming CTO of Shopify
* Why Shopify is talking more publicly about AI now, and why staying at the frontier has become necessary for the company
* Shopify's internal AI adoption curve, the December inflection, and why CLI-style tools are rising faster than traditional IDE-based tools
* Why Jensen Huang is directionally right on token budgets, but raw token count is still the wrong way to evaluate engineering output
* Why the real unlock is not more agents in parallel, but better critique loops, stronger models, and spending more on review than generation
* Why AI coding can still lead to more bugs in production even if models write cleaner code on average than humans
* Why Shopify built its own PR review flow, and why Mikhail thinks most off-the-shelf review tools miss the point
* How PR volume, test failures, and deployment rollback are becoming the real bottlenecks in the agent era
* Why Git, pull requests, and CI/CD may need a new metaphor once code is written at machine speed
* What Tangle is, and how Shopify uses it to make ML and data workflows reproducible, collaborative, and production-ready from the start
* Why Tangle is different from Airflow, and why content-addressed caching creates network effects across teams
* What Tangent is, and how Shopify is using auto-research loops to optimize search, themes, prompt compression, storage, and more
* Why Tangent is becoming a democratizing tool for PMs and domain experts, not just ML engineers
* Why AutoML finally feels real in the LLM era, and where auto-research still falls short today
* Why Tangle, Tangent, and SimGym become much more powerful when combined into one system
* What SimGym is, why simulated customers only work if you have real historical behavior, and why Shopify's data gives it a moat
* How SimGym evolved from comparing A/B variants to telling merchants what to change on a single live storefront to raise conversions
* Why customer simulation is so expensive, from multimodal models to browser farms to serving and distillation costs
* How Shopify models merchant and buyer trajectories, runs counterfactuals, and thinks about interventions like discounts, campaigns, and notifications
* Why category-level behavior is so different across commerce, and why ideas like Chinese Restaurant Processes are showing up again in practice
* Shopify's new UCP and catalog work, including runtime product search, bulk lookups, and identity linking
* Why Shopify is using Liquid AI, and why Mikhail sees it as the first genuinely competitive non-transformer architecture he has used in practice
* Where Liquid already works inside Shopify today, from low-latency query understanding to large-scale catalog and Sidekick Pulse workloads
* Whether Liquid could become frontier-scale with enough compute, and why Shopify remains pragmatic and merit-based about model choice
* Who Shopify is hiring right now across ML, data science, and distributed databases
* The Sydney story at Bing, why its personality was not an accident, and what Mikhail learned from deliberately shaping AI character early on

Mikhail Parakhin

* LinkedIn: https://www.linkedin.com/in/mikhail-parakhin/
* X: https://x.com/MParakhin

Timestamps

00:00:00 Introduction: Mikhail Parakhin, Microsoft, and Shopify
00:01:16 Why Shopify Is Talking More About AI
00:02:29 Internal AI Adoption at Shopify and the December Inflection
00:06:54 Token Budgets, Jensen Huang, and Why Usage Metrics Can Mislead
00:10:55 Why Shopify Built Its Own AI PR Review System
00:12:38 AI Coding, More Bugs, and the Real Deployment Bottleneck
00:14:11 Why Git, PRs, and CI/CD May Need to Change for Agents
00:18:24 Tangle: Shopify's Reproducible ML and Data Workflow Engine
00:21:19 Why Tangle Is Different from Airflow
00:26:14 Tangent: Auto Research for Optimization and Experimentation
00:30:07 How Tangent Democratizes Experimentation Beyond ML Engineers
00:33:06 The Limits of Auto Research
00:36:36 Why Tangle, Tangent, and SimGym Compound Together
00:37:20 SimGym: Simulating Customers with Shopify's Historical Data
00:42:47 The Infra Behind SimGym
00:46:00 Why SimGym Gets Better with Real Customer History
00:47:30 Counterfactuals, HSTU, and Modeling Merchant Trajectories
00:51:55 CRPs, Clustering, and Category-Level Customer Behavior
00:53:30 UCP, Shopify Catalog, and Identity Linking
00:55:07 Liquid AI: Why Shopify Uses Non-Transformer Models
00:59:13 Real Shopify Use Cases for Liquid
01:03:00 Can Liquid Scale into a Frontier Model?
01:09:49 Hiring at Shopify: ML, Data Science, and Databases
01:10:43 Sydney at Bing: Personality Shaping and AI Character
01:13:32 Closing Thoughts

Transcript

[00:00:00] swyx: Okay.
We're here in the studio, a remote studio, with Mikhail Parakhin, CTO of Shopify. Welcome.

[00:00:08] Mikhail Parakhin: Thank you. Welcome.

[00:00:10] swyx: I don't even know if I should introduce you as CTO of Shopify. I feel like you have many identities. Uh, you led sort of the, the Bing ML team, I guess, uh, uh, or ads team. I, I don't know, I don't know, uh, you know, it's, uh, people va-variously refer you as like CEO or, or, uh, I don't know what that, that, that said previous role at Microsoft was.

[00:00:29] Mikhail Parakhin: Uh, that was... Yeah, my previous role w- at Microsoft was the-- I actually was the CEO of one of Microsoft's business units, which included, as I, you know, as we discussed, all the things that people like to laugh about, uh, including Windows and Edge and Bing and ads and everything.

[00:00:47] swyx: Yeah, yeah. What a, what a, what a wild time.

You've obviously, uh, done a lot since you landed at Shopify. Uh, one of the reasons I reached out was because you started promoting more sort of internal tooling, uh, primarily Tangle, but also a lot of people have seen and adopted Tobi's QMD, uh, and obviously, I think, uh, Shopify has always been sort of leading in terms of, uh, engineering.

I think more-- it's just more recent that you guys have been more vocal about your sort of AI adoption. Is that, is that true?

[00:01:16] Mikhail Parakhin: Well, I think AI tools in general are fairly recent development, uh, and we've-- Shopify, you know, at this stage of its development, we're developing AI in-in-house and other, uh, building tools that use AI and, you know, interfacing with the wider AI community, uh, you know, are on the sort of the, uh, runaway trajectory.

So it just did by sort of natural byproduct. We, we talk about it more also.
We just, uh, just even yesterday, Andrej Karpathy was famously tweeting about, oh, are there some, uh, ways, uh, that, that you can organize your agents to store the data and then, uh, look up the data so that you don't have to research or, or lose context every- Yes ... time. And a little bit tongue in cheek, I tweeted that, “Hey, we've, we've done it much earlier, and we even have different approaches, Tobi and I.” Tobi, of course, is a big fan of QMD, and I'm more of a SQL, SQLite fan. But, uh, yeah, very similar things that we've already done here. The point is, yeah, we're very dynamic, you know, explosively growing company, and we have to be at the forefront of AI adoption, obviously.

[00:02:29] swyx: Yeah. Yeah. Um, you, your team kindly prepared some slides actually that we were gonna bring up on to, uh, the screen. I think I can, I can screen share, and then we can kind of go through some of the shocking stats that maybe, maybe put some numbers to what exactly is going on. So here we have, uh- An internal AI tool adoption chart.

What are we looking at here? What?

[00:02:54] Mikhail Parakhin: Yeah, this is very interesting statistics. Uh, this is number of daily active workers, you know, think of, uh, DAU, basically the active users of-

[00:03:05] swyx: Yeah ...

[00:03:05] Mikhail Parakhin: AI tool as a percentage of all the people in the company, right? And then- Yeah ... different AI tools. And, uh, you could see two things here is that one is the green is total.

Uh, green is just total. So you could see that it approaches really % by now. It's hard not to do your job now without interacting deeply, at least with one tool.
You could see another interesting thing is just as many people commented in December was the phase transition when suddenly models gotten good enough that, that everything took off and started growing.

Uh, it, it was many people noticed that the thing is that small improvements accumulated into this big change in Sep- December roughly timeframe.

[00:03:52] swyx: Yeah.

[00:03:52] Mikhail Parakhin: The other thing I would claim you could see is that, uh, CLI-based tools and tools that don't require you to look at the code becoming more popular, and you could see, yeah, various versions of, uh, Claude Code and Codex and Pi and internal development tools taking off.

Uh, exactly, yeah, uh, and blue is our River, just internal agent for coding, where tools, uh, that require IDEs such as, uh, GitHub Copilot or Cursor, they're not exactly shrinking, but they're not growing as fast. Like, uh, red, red line is, is the IDE kind of tools. So you could see that they're, they're not experiencing as, as fast of a growth.

[00:04:37] swyx: As I understand it, basically, every employee has their choice, right? Of choose whatever tool you use, and then you're just kind of doing a, a daily sur-survey or something.

[00:04:47] Mikhail Parakhin: Exactly. And, uh, we- Yeah ... the, the push is to get your job done, you can use any tool, and we effectively fund unlimited tokens for everybody.

Uh, we, we do, we do try to control the models that, uh, people use, but from the bottom, not from top. Like we basically say, “Hey, please don't use anything less than Opus four point six.”

[00:05:09] swyx: Oh.

[00:05:10] Mikhail Parakhin: Some people, some people end up using GPT five point four extra high. Some people use Opus four point six. Um, uh, you know, uh, there are some, uh, there are plus and minuses in going for full one million context window versus not.

But, uh, we try to discourage people from using anything less than that.

[00:05:28] swyx: Yeah, yeah. Got it, got it.
Uh, I mean, uh, that's, you know... The, the next chart here, it really kind of shows the expansion and the sort of December twenty twenty-five inflection, right? That, uh, people are using a lot of tokens. I think it's also really interesting that no one was kind of abusing it in twenty twenty-five.

Like it was- Had comparatively, uh, to this year, there was almost no growth. I mean, it's still like, you know, probably, probably gave fifty percent.

[00:05:56] Mikhail Parakhin: Yeah. This is just a different scale. It's still exponential- Yeah, yeah ... growth at just a different- ... rate of expansion. Uh, there was inflection point, and Shawn, I would claim the, the super interesting part here is that you could see that the distribution becoming more and more skewed.

Yes. The top percentiles grow faster. So that means- Yeah ... the people in the top ten percentile, they, their consumption grows faster than seventy-five and so forth. So, uh, the distribution skews more and more towards the highest users, which is... I don't know what it tells me. It's like it feels not ideal, to be honest.

Or maybe it's okay. We'll see.

[00:06:36] swyx: Why does it feel not ideal? Is, is it because of, um, quantity over quality, or what's the concern?

[00:06:42] Mikhail Parakhin: Because take it to the limit. That means, you know, if, if this rate of separation continued- Ah, yes ... a year, there will be one person consuming all the tokens. So it's just, it's kinda strange.

[00:06:54] swyx: Yeah, I mean, um, uh, I, I think internal like teaching and all that, uh, will, will help sort of distribute things more widely. But in, in the early days, of course, the people who are sort of more AI-pilled will obviously find more ways to use it than the people who are less AI-pilled. Maybe let's, let's call it that.

I'll just, I'll just kinda quickly, uh, pause from the, the...
You know, we will go back to the rest of the slides, but I just wanna, um, review, you know, there are a lot of CTOs of, of large companies like yourself where they're all considering some kind of token budget, right? Like I think it's something, something that Jensen Huang has been talking about, where like if your 200K engineer is not using 100K of tokens every year, like they're, they're underutilizing coding agents.

Of course, Jensen Huang would say that, but like it seems a very quantity over quality approach and like some, some people are basically saying like, well, is this comparable to judging engineer quality by lines of code, right? Which we also know is like kind of flawed, but better than nothing. So I, I don't know if you have like a sort of management take here on, on how to view this kind of, uh, metrics.

[00:08:02] Mikhail Parakhin: Well, I mean, you're, you're baiting me. I, I like... This is my favorite topic. Uh, if you let me, I'll probably talk for two hours on just this. I have a lot of things to say. Like I do think Jensen gotten a lot of bad press saying, “Oh, of course you're, you know, this, uh, the- ... the cake seller says you don't eat enough cakes.”

You know? Like, of course. Uh, but, uh, I actually, uh, think that's undeserved. I think he, he's actually right. Uh, I do think-

[00:08:33] swyx: He, he's directionally correct.

[00:08:35] Mikhail Parakhin: Yeah. Yeah. He's directionally correct for sure. Uh-

[00:08:37] swyx: Who knows what the right number is? Yeah.

[00:08:39] Mikhail Parakhin: The thing that I do, uh, want to say, and this is something that we learned through trial and error and very important is like two things.

One is that it's not about just consuming tokens. Uh, you can consume tokens and, and in fact, the anti-pattern is running multiple agents, too many agents in parallel that don't communicate with each other. That's almost useless, uh, compared to just fewer agents and burns tokens very efficiently.
Uh, setting up the right critique loop, especially with the high quality models, where one agent does something, the other one, ideally with a different model, critiques it, uh, suggests ways to improve it, the agent redoes it with this critique and, and so it takes much longer.

So people don't like it because latency goes up. You know, they, they have to wait until this debate is happening. But, uh, the quality of the code is much higher. And another thing, just since you mentioned like, look, uh, uh, yeah, the overall budget is just like, uh, lines of codes. Lines of codes are exploding for everybody right now, partially because AI is really more verbose, but partially just because AI can write a lot more code, you know, doesn't get tired.

And so you have to have a very strong narrow waist during PR review. Otherwise, just the number of bugs will go through the roof. It's, uh, it's this unexpected consequence of the just volume trumping everything. I would claim by now good model writes code on average with fewer bugs than, than the average human.

But since they write so much more of it, like more of it will make it into production. So you have to-

[00:10:26] swyx: You still have more bugs.

[00:10:26] Mikhail Parakhin: Yeah. Have to have a very rigorous PR reviews, also automated of course. But, uh, yeah, you have to spend a lot of budget there. Like this, this for me, for me, actually, the important metric is the ratio of budget spent during code generation versus, uh, spent, uh, expensive tokens like GPT, uh, five point four Pro or, uh, uh, Deep Think from Gemini, you know, checking on PR reviews.

[00:10:55] swyx: Yeah, totally. Uh, I noticed in your chart you didn't have any review tools. Do you just use like, like let's say a Claude Code to review tools? Or do you have another set of review tools like the Greptiles, the CodeRabbits, uh, Devin Reviews has a review tool.
I don't know if you've had those specialist review tools.

[00:11:13] Mikhail Parakhin: You are a little bit jumping on my sore toe right now because the graphs I was only showing public tools. Uh, uh, the-- I haven't found a good PR review tool that, that does what I think should be done. And, uh, partially my, my thinking is because it's so... It just goes against both what people feel like emotionally they prefer and, uh, some of the, uh, you know, frankly even business models that, that the companies run.

At PR review, uh, time, you want to run the largest models. That means, I don't know, Codex or, or, uh, Claude Code is not gonna cut it. You need to have pro-level models if you really want to, uh, stem the tide of bugs from going into production. And you need to spend a lot of time, the models taking turns, but you don't want, like, a big swarm of, uh, of, uh, agents.

So in fact, you end up in a different dual-dualistic world where you generate not that many tokens. You, in fact, generate few tokens, but it takes f-a long time because these are expensive models taking turns rather than many, many agents trying to do many things in parallel. So that's, that's why I feel like I haven't found good tools, so we are using our own for PR review for now.

[00:12:33] swyx: Yeah. Yeah. I mean, uh, I think a lot of companies are building their own, uh, especially to their needs, right?

[00:12:38] Mikhail Parakhin: Mm-hmm.

[00:12:38] swyx: Um, I, uh, you also have a chart here going back to the slides on, uh, PR merge growth, where we're now at thirty percent, uh, month on month rather than ten percent. Uh, and also the, the estimated complexity is going up.

You know, this is productivity, right? ‘Cause y- presumably there's more stuff going into the code base and more, more features getting worked on. I'm curious about the backlog, right?
Like the, the, the-- I actually don't mind a pro-level model taking an hour or two hours to review my PR, because I've dealt with humans who take a week to review my PR, right?And I keep pinging them on Slack, “Hey, hey, review my PR.” So, you know, I think there's some trade-off here where, like, it still doesn't make sense.[00:13:18] Mikhail Parakhin: Exactly. That, that's exactly m-my point. Uh, that on one hand, you can tolerate longer latencies at, uh, PR. On the other hand, like right now, the real problem is not in spending time waiting for PR.It's real problem is since there's so much more code than- Yeah ... uh, probability of at least some tests failing going up, and then you, like, keep de-failing, then you have to find the offending PR, evict it, retest it without that PR, and so deployment cycle becomes much longer. Uh, so it actually, in terms of the overall time to deploy, it's total time savings if you spend more time on a longer model, like thinking for an hour, because then, then you, you don't have to spend all that time during testing and rolling, you know, rolling back the deployment.[00:14:03] swyx: Yeah, totally. That's still worth it. You know, you don't look at the individual, look at the aggregate, and look at the, the, the change in the aggregate system.[00:14:11] Mikhail Parakhin: Exactly.[00:14:11] swyx: I'm kind of curious if, like, there's this PR mentality and, like, c-- the, the, the CICD paradigm will be changed eventually. Some people are like, obviously a lot of people want new GitHub, but I even wonder if, like, Git is the problem, right?Like, is that the bottleneck? Is the concept of a PR a bottleneck? Do you guys use stack diffs? I don't know if, uh, that's a, like, a merge queue stack diff type of thing.[00:14:34] Mikhail Parakhin: We, we use, we use Stacks, we u- we use Graphite. We worked with, uh, Graphite a lot. Uh, so we use Stack, uh, PRs. 
I think, uh, like that's clearly the overall CI/CD in general, and the interaction with the code repository right now is the, clearly the sort of the, the main issue and the bottleneck for us, uh, and highest top of mind.

I would say we probably need a different metaphor or different whole design of how to process it in new agentic world. I haven't seen anything dramatically better yet. I, I think everybody right now is just trying to keep their head above the water ‘cause, ‘cause there, there's so many PRs and then everybody's CI/CD pipelines start creaking, the, the times are increasing, the number of bugs slipping by increasing, and you have to, have to clamp down.

And so we are a little bit in this situation when we need to first stabilize that story and then start thinking, hey, what, what it could be a completely different and new world, which I haven't... I know some people working on it. I haven't seen something, like anything super compelling yet, but clearly the old thing were designed for humans will need to be morphed into something new.

[00:15:53] swyx: One of the thing that I, I think about is kind of like the merge conflict is basically a global mutex on the whole system, right? And in, in hu- in human organizations, we do have something like that. It's the company standup. But like, other than that, it's like it's actually fitting for us to be somewhat decentralized, somewhat plugged into one stream of information source, but somewhat lossy.

Like it's okay, you know, that, that not every delivery is like atomic consistency. Like we're not dealing with a database sometimes.

[00:16:27] Mikhail Parakhin: This is a very good point, uh, because since humans don't write code too fast, you know that global mutex is not too bad. Once you-

[00:16:36] swyx: Yes ...

[00:16:37] Mikhail Parakhin: start writing code at the speed of machine, it becomes the, you know, the bottleneck.

Then what do you do?
Maybe, and I can't believe I'm saying this because I, I'm long-- lifelong opponent of, uh, microservices, and I always thought that was, like, a really bad idea. And now that you're saying it, like, maybe in new guise, like, microservices will make a comeback, you know, because then you, you can ship things independently in tiny things and, and the managing all that complexity automatically will be much easier.

I don't know. Like, we'll s-- we'll have to see.

[00:17:10] swyx: Yeah. I mean, I don't know what the Microsoft or, or Shopify thing is, but I, I read this paper from Google where they have a monorepo that deploys into microservices, right? And then, uh, the other concept that I think about a lot is the Chaos Monkey concept from, from Netflix.

Being able to create, like, this robust system where, um, uh, you know, you, you have the service discovery, you have the, uh, the independent, independent microservices discovery and, and, uh, you know, probably going to be a fair amount of duplication. That's how an organic system sort of scales, uh, that, that you have that...

I don't know how you call it. Slack? Robustness? Depend-- uh, d-duplication. I, I, I forget the-- I, I'm-- And this-- those-- these are not exactly the terms- Hmm ... I'm looking for, but I c-can't really think of the words. Okay. I was gonna go into Tangent and Tangle. Uh, so, uh, we, we sort of discussed the overall stats that, uh, Shopify has.

Uh, but, you know, I, I think some, some pretty cool stuff that you guys are working on is your ML experimentation, uh, and your, your sort of auto tr-research training pipeline. Presumably you're much closer to this one because it's, it's a sort of personal hobby of yours. How, how would you explain them in, together?

I thought we have a slide that, like, uh, has the s- the system diagram.

[00:18:24] Mikhail Parakhin: Yeah. Tangle first and then Tangent as a-

[00:18:27] swyx: Yeah ...

[00:18:28] Mikhail Parakhin: as a thing on top of Tangle.
And, uh, Tangle is the third generation, I claim, of, uh, systems of, uh, running any data processing, but a bit with a skew for ML experiments, but not necessarily. Any sort of data processing tasks where you need to iterate, share, and you have scale so that you want maximum efficiency.

You know how, like, normally you would work, you would-- Imagine you're a data scientist or an ML practitioner, you would get Jupyter notebooks or, or maybe you would get, uh, you know, Pyth- your Python scripts, and you would manage the data, and you produce those TSV files, and you put them in some JFS or something.

Then you would notice that, oh, it has this, uh, weird missing values. You go and write another script that, uh, goes and replaces them with, uh-

[00:19:20] swyx: Ah ...

[00:19:21] Mikhail Parakhin: dash S. And then, then you, then you run some, some, uh, “Oh, I need to filter bots.” And so you run some LightGBM model that, uh, removes the bots. And then, then you like-- And then you, you kind of like get into shape, and then you start experimenting, and you run multiple experiments, and then you're like, “Oh my God,” like, “this experiment is worse.”

You undo, and you cannot get to previous result. And like, “Ah, what did I do?” Like that. Again, then, then you finally like get everything working. Then you like start throwing it over the fence to production. You, you replicate it, those things don't work, and then sometimes you like don't notice that you forgot some feature naming and the, the features don't match.

But then, like imagine you, you did everything, and then six months later you're like, have to repeat it because now there's more data, or you wanted to do another pass, and you're like, “What, what did I do?” Or like, or like, “This script crashes now,” or the, “the path has changed.” And then, then you're trying to, like you spend another month just doing ar- digital archeology on your own, you know, history, right?

Now multiply that by many, many teams.
Now imagine you got an intern that you wanna ramp up. Now you have to show that intern, “Oh, you know, look, here's the folder, there's the scripts, you know, ask your cloud agent to do, and then, uh, to, to figure it out.” And then cloud agent does something, and then you're, “Ah, yeah, right, right, it was the wrong folder.

I forgot to tell you, I actually have this other thing I forgot myself.” And, and that's, that's the, like, the daily life we all, uh, all know it, uh, if, if you're a data scientist, machine practitioner, ma- machine learning practitioner or, uh, or even like any data managing, uh, person.

[00:21:00] swyx: Yeah. So I, I used to do this, uh, f- uh, on the quant finance side, uh, in, in my hedge fund.

So we did this before Airflow, and then, uh, obviously Airflow came along and, uh, then more recently Dagster, uh, I would say is like, in my mind, what I would use for that shape of problem, uh, where you had to materialize assets and create a pipeline.

[00:21:19] Mikhail Parakhin: And that's, that's very good segue because... So Airflow is great, but Airflow is more about you, you have something and you wanna repeatedly run it in production on schedule.

It's less about you as a team developing things and being able to share, and you grabbing the standard pipeline and saying, “Hey, I wanna change this tiny little component in the huge sea of data processing, and I don't wanna-- I wanna run ten experiments on this, and I wanna do hyperparameter optimization.”

All that is very hard to do with Airflow. It's very easy to do with Tangle. Tangle is m- more about, it's everything about group of people running experiments, it might be agents too nowadays. Uh, running experiments cheaply, collaborating, sharing results. Uh, you don't need to understand fully. You, you grab-- you clone somebody else's experiment or somebody else's pipeline, uh, run, uh, change small piece, run it, be, like, get it to production state, and then ship in one click.

So then the...
You don't have to port it into any other system to, to run in production. You can just run the same experiment. It's, it's fully production ready. And, and it's, uh, it has lots of... Again, as I said, it's third generation system. The original one was, I would claim there was Aether and then, uh, at least in my career, Aether was the first, first, uh, that pioneered this type of approach.

And then there was, uh, Nirvana, which, uh, uh, at Yandex, which did kind of sec-second take on this. And now this one aggregates the, the learnings from all of those and, and Airflow as well to, to get to the state where you try it, it, it feels kind of magical. Uh, ‘cause now everything is based on content, uh, hashes.

So even if the version changed, but if the output didn't change, nothing is being rerun. It's very efficient. If you... Multiple people start experiment that needs the same sort of data preprocessing, it's not repeated multiple times. It's automatically done only once. If you start ten experiments that all require, you know, some, some data preparation first as the first step, and you don't have to coordinate for that.

Like, you don't have to know that other people are starting it. You now, it's very easy compos-, uh, composability, any language you can u- uh, you wanna use, and it's very visual. So you can see immediately, you can edit it easily, you can assemble small things with just even mouse clicks if you want to, and, uh, share, clone.

And everybody knows also it's fully kind of static in the sense that we rerun it second time, it will exactly have the same results. Like, you will never have to do digital archeology. So full versioning and everything is also there.

[00:24:06] swyx: Uh, so, so people can, uh... It's open source. Go to the GitHub repo and, and, uh, check it out.

Uh, and there is also a really good, uh, blog post about it. I think all these is, like, really appealing.
The, the, the, the thing that I think sells me the most about it is that, um, sort of development to production transition, right? Which I think, um, a lot of people haven't really solved that, uh, strictly, right?

Like, we develop really, really well in, in Python notebooks, but then, you know, that's obviously not a sort of production ready process. I think that, like, any way in which that is solved, I think is, is very appealing. Then the other thing that you mentioned, which also raised my eyebrows, was content-based caching, which you mentioned is, is, um, you know, is ve-very much, uh, um, a sort of efficiency measure about, uh, you know, just like recalculation only on, on sort of content addressing, which I think makes sense.

Uh, it surprised me that the savings could be this much, but maybe I just haven't worked at your scale where there's so much duplication, uh, that people just rerun because they change a single ID upstream.

[00:25:10] Mikhail Parakhin: It does, yeah. But it's not only you rerun. The, the main savings are coming from the fact that you ran it, you got your job done, and you moved on.

Then- Yeah ... somebody else in some department you don't know existed runs the same task, but on a newer version.

[00:25:27] swyx: Yeah.

[00:25:27] Mikhail Parakhin: Like right now, you can't, in, in most of the organizations, you can't even find out about it so that you can't even measure that you're spending that time twice, right? Here- Yeah ... if everybody's on Tangle, that's detected automatically and detected that the output is the same.

And then for that person, all it looks like is like experiment just suddenly moved, jumped forward, right? Uh, uh- Yeah ... so that's because, because the, there's network effect of multiple people helping each other.

[00:25:51] swyx: Yeah.
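The content-hash idea discussed above can be illustrated with a minimal Python sketch. This is not Tangle's implementation or API; `ContentAddressedCache`, its `run` method, and the step names are hypothetical, invented only for illustration. The assumption is that a step's cache key is a hash of the step's identity plus the hashes of its input bytes, so a second caller with the same step and data gets a cache hit, and a downstream step is skipped when a changed upstream step still produces identical output.

```python
import hashlib
import json
from typing import Callable, Dict, List


class ContentAddressedCache:
    """Toy step cache keyed by a hash of the step identity and input contents."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.executions = 0  # counts real runs, so cache hits are visible

    @staticmethod
    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def run(self, step_id: str, fn: Callable[[List[bytes]], bytes],
            inputs: List[bytes]) -> bytes:
        # The key depends only on content: which step, and what bytes went in.
        key_material = json.dumps(
            {"step": self.digest(step_id.encode()),
             "inputs": [self.digest(i) for i in inputs]},
            sort_keys=True,
        ).encode()
        key = self.digest(key_material)
        if key not in self._store:
            self.executions += 1
            self._store[key] = fn(inputs)
        return self._store[key]


def drop_blank_lines(inputs: List[bytes]) -> bytes:
    return b"\n".join(line for line in inputs[0].split(b"\n") if line.strip())


def drop_blank_lines_v2(inputs: List[bytes]) -> bytes:
    # Refactored version of the same step; produces identical output bytes.
    return b"\n".join(line for line in inputs[0].splitlines() if line.strip())


def count_rows(inputs: List[bytes]) -> bytes:
    return str(len(inputs[0].split(b"\n"))).encode()


cache = ContentAddressedCache()
raw = b"a,1\n\nb,2\n"

cleaned_a = cache.run("drop_blank_lines v1", drop_blank_lines, [raw])     # executes
cleaned_b = cache.run("drop_blank_lines v1", drop_blank_lines, [raw])     # hit: same step, same data
cleaned_c = cache.run("drop_blank_lines v2", drop_blank_lines_v2, [raw])  # executes: step changed
rows_1 = cache.run("count_rows v1", count_rows, [cleaned_a])              # executes
rows_2 = cache.run("count_rows v1", count_rows, [cleaned_c])              # hit: upstream output unchanged
```

In this toy version the cache is an in-process dictionary; the cross-team effect described in the conversation would presumably require a store shared by everyone, which is outside the scope of the sketch.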
This is one of those things where it's designed to be a platform from the beginning rather than an individual developer's tool, right?
And everything streams down from there. That is the Tangle orchestrator, and it manages jobs. We've seen a few versions of this, and these are obviously the unique approaches that you guys have figured out. And then there's Tangent.
[00:26:14] Mikhail Parakhin: Yeah. Tangent is basically an automatic auto research loop that can help and kind of do your work for you.
Effectively, Andrej Karpathy recently popularized it with auto research. Remember, he said he was speedrunning this... yeah, you know the story. Here we're basically bringing the same capability into Tangle, so that Tangent can analyze it. It's just an agent that can run multiple experiments, figure out what can be changed, and keep rerunning and modifying until it maximizes some goal, some loss function, whatever you need to achieve.
And in general, I would say if you're not using an auto research-like approach in whatever you do, literally whatever you do, then you're missing out. We saw it at Shopify taking off like wildfire: anything where you can put measurements can be done dramatically better.
[00:27:19] swyx: Mm-hmm.
[00:27:20] Mikhail Parakhin: Our speed of HTML templatization, completely new UX templatization, reducing latency for Liquid themes.
Our search recently moved (it's hard to even quote) from 800 QPS to 4,200 QPS with the same quality, just by pure optimizations and an auto research loop that kept running and changing code in our index server, on the same number of machines, just increasing the throughput.
We managed to improve the quality of gisting and the machine learning process. 
You know, gisting is the prompt compression technique that allows for lower latency and actually slightly higher quality. So literally whatever, different walks of life, and it doesn't have to be AI related.
We had a reduction in storage because the agents would go and find datasets that are clearly derivative, and then you don't need to store things twice. We found, somewhat embarrassingly, that one of the largest tables was hashing random IDs into another random ID, and we literally needed only one. So it was translating two random IDs hashed into each other.
[00:28:37] swyx: So it has access to the code as well, so it can check what the hell it is doing?
[00:28:42] Mikhail Parakhin: It can be run at two levels. At the superficial level, it can just use existing components and reshuffle them.
You can grab XGBoost, you can grab some PyTorch module, you can grab other tools and combine them. At a deeper level, since Tangle is all CLI-based underneath, and every component is really a wrapped CLI call and a YAML file, it can analyze code, create new components, and keep iterating as well.
So you can have quick modifications of existing pipelines with components that are already there, pre-baked, or you can create new components and keep iterating on those. So auto research is probably the thing I've been most excited about in the last two months, and we see it taking off totally like wildfire.
Just everybody, every day, every... 
well, every day, every minute, I would have somebody Slack-message me saying, “Oh, look how much better I made it.” And it's all through auto research.
[00:29:53] swyx: Is this democratized in some way? Is it your ML engineers and researchers doing this, or do your regular PMs and software engineers also have the ability to use Tangent?
[00:30:07] Mikhail Parakhin: This is an awesome question. Tangle in general and Tangent in particular are extremely democratizing. They are the main tools for...
[00:30:15] swyx: 'Cause I don't need the details.
[00:30:16] Mikhail Parakhin: Yeah, exactly. Initially it was used by ML and AI engineers, but then, literally as you said, PMs... the highest user right now is one of the PMs in our org, Sartak. He was number one by usage of this, because they're just energetic and knowledgeable, and now it unlocks a lot of capability where you don't have to change code manually.
[00:30:39] swyx: I mean, it kind of cuts the ML engineer out of the process, because the PMs have the domain knowledge and the ability to think from first principles about, okay, what results do I want? And they even have access to the data that needs to go in.
So in some ways this is the magic black box that we've always wanted for training and for, I guess, hill climbing, whatever.
[00:31:04] Mikhail Parakhin: It's basically Claude Code for your AI development situation, right? Now you don't have to know exactly how the algorithms work. You can just bring your domain knowledge and expertise and product knowledge and iterate within Tangent until you've gotten the results that you need.
[00:31:21] swyx: In my previous roles, every time someone pitched AutoML, I've always been like, “This is not gonna work. 
It's always gonna be a flop.” Somehow it's working now. I mean, presumably the answer is that now we have LLMs and they're good enough, right? It's an emergent property that we can do auto research, but it doesn't feel that satisfying. How come we didn't do this before, right?
We just did parameter search and, I don't know, maybe that's it.
[00:31:48] Mikhail Parakhin: Yeah. Bayesian optimization and hyperparameter optimization was the one facet of AutoML that was used very actively, and incidentally it's also built into Tangle. But, you know, I know Patrice Simard very well, and he was such a proponent of AutoML. He literally spent years of his career trying to democratize it.
Without LLMs, it just turned out to be very hard. You would have flexibility within a certain narrow domain, but it was hard to scale wider. Now, with LLMs, suddenly it's like a magic wand, and suddenly everybody is an AutoML expert.
[00:32:28] swyx: Yeah, I think it's multiple things, right? I'm just gonna bring up the chart again.
LLMs can do the monitoring very well, which is potentially unbounded and super unstructured. They can do the analysis very well. Basically, there is much more intelligence poured into every single step. Maybe nothing has structurally changed about AutoML, but this is just more intelligent and more unstructured.
[00:32:53] Mikhail Parakhin: Exactly.
[00:32:54] swyx: Any flaws that you've run into? Everyone is drinking the Kool-Aid: oh my God, time savings, performance improvements. What issues have come up?
[00:33:06] Mikhail Parakhin: This is really cool. It's not a solution to all the world's problems, for sure. 
The limitations are usually the ones... and this is where we get into somewhat subjective territory.
I can only share what I've seen so far, and I'm sure the situation is changing. Maybe after I say it, many people will reach out and say, “Hey, what about this?” and “You don't know that,” and they'll probably be right. But what I've seen is that auto research is very good at doing the kind of obvious things that you don't have bandwidth to do, or didn't notice, or maybe some standard practices you're not aware of.
It is not good at doing something completely out of distribution, something where you have to think for multiple days and do something unlike anything else. I set up an experiment once on my hobby thing, and I let it run for what ended up being several weeks. It's full production scale, so the runs are slow. In the end it performed over four hundred experiments, and only one was successful.
I'm like, “Okay, that's good.” But...
[00:34:18] swyx: But it saved time.
[00:34:19] Mikhail Parakhin: Yeah, it saved time. That was the thing. If I were doing four hundred experiments myself, my batting average would have been much higher, I'm sure. But first of all, it would take me about three years to do four hundred experiments.
And I didn't have to do them. The machines just did that for the price of electricity. And I got one improvement. Honestly, when I was starting that experiment, my thinking was to go and show that, “Hey, Andrej, maybe you just don't know how to optimize.” I thought I was super smart, because my problem had been optimized for many years and was fully improved.
I didn't expect auto research to find anything at all. Yet it did. 
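
The auto research loop discussed above (propose a change, rerun the experiment, keep the change only if a measured goal improves) can be sketched as a simple hill climb. This is an assumption-laden toy, not Tangent: `run_experiment` is a made-up objective, and `propose_change` stands in for the agent that, in a real system, would be an LLM editing code or pipeline components.

```python
import random


def run_experiment(config):
    """Stand-in for a real, measurable experiment (hypothetical objective).

    Higher is better; the optimum is at lr=0.1, depth=6.
    """
    return -((config["lr"] - 0.1) ** 2) - 0.01 * (config["depth"] - 6) ** 2


def propose_change(config, rng):
    """Stand-in for the agent: randomly perturbs one setting."""
    candidate = dict(config)
    if rng.random() < 0.5:
        candidate["lr"] = max(1e-4, config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0]))
    else:
        candidate["depth"] = max(1, config["depth"] + rng.choice([-1, 1]))
    return candidate


def auto_research(config, budget, seed=0):
    """Run `budget` experiments and keep only the changes that improve the score."""
    rng = random.Random(seed)
    best_score = run_experiment(config)
    accepted = 0
    for _ in range(budget):
        candidate = propose_change(config, rng)
        score = run_experiment(candidate)
        if score > best_score:
            config, best_score, accepted = candidate, score, accepted + 1
    return config, best_score, accepted


start = {"lr": 0.5, "depth": 3}
best, best_score, accepted = auto_research(start, budget=200)
```

Most proposals are rejected, which matches the point made in the conversation: the loop is cheap to run, so a low hit rate can still be worthwhile when each accepted change is a real improvement.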
So instead of making fun of Andrej, I ended up a big supporter. Yeah, that's exactly the tweet. Yes.
[00:35:10] swyx: You and Tobi really go back and forth online a lot, which is really funny. Think of it as an eval for the optimality of the code it's running on.
It almost reminds me of a Kolmogorov complexity thing, but I guess there's some optimal thing that you're trying to reduce down to. And so you should congratulate yourself that you had ninety-nine percent optimality.
[00:35:36] Mikhail Parakhin: Exactly, yeah. I think Andrej really deserves a lot of credit for popularizing this approach. It's incredibly powerful and cool, I think. Even him just mentioning it led to a lot of gains in a lot of places in the industry, so we should be thankful.
[00:35:56] swyx: Yeah. I think he also has a just... I don't know what it is. It is a simple, self-contained project that people can take and apply to other things, which is one thing, but also just the name. Somehow no one else managed to call their thing auto research. Naming things is very important. I think that is mostly our coverage of Tangle and Tangent.
Obviously there's a lot of ML infra at Shopify that people can dive into. We're about to go into SimGym, but before I do that, any other broader comments around this whole effort? Where is it leading to?
[00:36:36] Mikhail Parakhin: As a segue to SimGym, all those things start composing strongly.
You can see a huge unlock. You can look at each one of the tools and see that they're extremely useful. Tangle is useful by itself. Auto research is useful by itself. SimGym is useful by itself. 
If you combine all three, you create a synergistic effect. I think that's why we even wanted to cover them today: this is something that, if you go back even five years, would've been unthinkable.
Replicating that would be either incredibly costly or impossible, right? Probably thousands of people would be required.
[00:37:20] swyx: Well, we have serverless intelligence, right? So yes, you do have thousands of intelligences, just not humans. And that's close enough, right?
Even if they're not AGI, they're close enough to do the task that you need them to do. And that's plenty for a lot of routine knowledge work. Okay, let's get into SimGym. This is one of those things where I was surprised to see it's apparently one of your most popular launches. I think Sim AI, I think Joon Sung Park, who did the Smallville thing... there's a very small cottage industry of people trying to do the simulated customer thing.
I think a lot of people maybe don't super trust this yet, because they think, well, obviously the agents would just do what you prompt them to do, right? But maybe tell us about the inspiration or origin story.
[00:38:10] Mikhail Parakhin: That's actually exactly the thing I wanted to cover, because if you don't have the historical data, all you can do is prompt agents in a vacuum, and they will do exactly what you prompt them to do.
In fact, when I first proposed it (and this is a bit of my brainchild initially, if I can boast), even Tobi said, “But wouldn't they just repeat what you tell them?” And I'm like, “Yes, except Shopify has decades of history of how people made changes and what that resulted in, in terms of sales.”
So now what we can do is... we have this... 
It's noisy data. They're usually small websites, and things are never in isolation. It's almost never an A/B experiment. It's always an A/A experiment, which has two meanings, but basically you run two different things at different times.
But if you aggregate everything together and apply denoising and a collaborative-filtering-like approach, you can extract a very clear signal. And then you can optimize your agents. That's why it took so long. It took almost a year of that optimization, of just us sitting and fiddling. Our internal goal was to hit 0.7 correlation with add-to-cart events, for example.
That is, if we run a real A/B test experiment, the simulation should replicate the same sort of success that humans had, or lack thereof. It took forever, and I don't think that's easily replicable, because who else would have that data? You have to have decades' worth of historical data.
And the other thing you need is infrastructure and scale, right? Because, again, what we found is that to get stat-sig results you need to run a lot of simulations and a lot of agents, and those are expensive things. You're taking actions in the browser because you want real friction.
You want to be able to get the image of what humans will see, because you want to detect effects like, “Hey, if I make my images larger, will I have more sales or fewer sales?” Usually people's intuition here, by the way, is that if I increase my images, I will have more, because they look nicer.
Designers all like sparse layouts and big images. Usually your sales tank, right? But in the HTML, all the characters look the same; only the size tag looks different, right? So it's very hard. 
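
The 0.7 correlation goal mentioned above can be made concrete with a small sketch. The numbers below are invented for illustration, and the exact statistic Shopify uses is not stated in the conversation; this assumes a plain Pearson correlation between the lift observed in real experiments and the lift predicted by simulated shoppers for the same changes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical add-to-cart lift (in percent) for six past storefront changes.
real_lift = [2.0, -1.0, 0.5, 3.0, -2.5, 1.0]
simulated_lift = [1.5, -0.5, 1.0, 2.0, -2.0, -0.5]

r = pearson(real_lift, simulated_lift)
meets_target = r >= 0.7  # the internal goal quoted in the conversation
```

A high correlation here only says the simulation ranks changes in roughly the same order as real shoppers did; it says nothing about why, which is why the historical data matters.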
So you have to take visual information, you have to run this in a simulated browser environment on a big farm, and of course you have to have a very expensive, good multimodal model.
So all of this is what's taken so long. And to share my personal fail a little bit there, Shawn: we always had this large-company bias. Whenever we do something, we're like, “Hey, we'll run an experiment,” right? We make a change, we run an experiment, and then we see which one is better, or, “No, this is worse.” Most of them are worse, so you discard them and keep iterating, hill climbing.
And we thought, “Smaller merchants cannot get stat-sig results. They cannot really run experiments, simply because in a week there would not be enough data for them.” So we thought about it from this perspective. What we didn't realize is that most people don't have A and B. They just have one thing, and they need suggestions for what A and B should be.
So we first built this: we run a simulation on two separate themes and say, “Hey, which one is better?” We then morphed it, and very recently released it, so that when you have just your site, your theme, we run over it and say, “Here's what the predicted conversion values are, and here's how we think you should modify it to increase your conversions.”
And then, circling back to what you started with, the proof is in the pudding. If we are not correlating with reality, people will not use it. And thankfully, we see literally every day more users than the previous day. So right now... it's working, yeah. 
Right now my problem is how to pay for it all, so our major thing is how to optimize the LLMs, do distillation, and run the headless and headful browsers cheaper, so that we can accommodate the increase in traffic.
[00:42:47] swyx: Yeah. I understand that you published a lot of technical detail at GTC, so I was just gonna bring it up a little bit. Was this in conjunction with some kind of GTC presentation, or something like that?
[00:42:59] Mikhail Parakhin: Yeah, we did it in several places, but yeah, we had the engineering blog as well.
[00:43:05] swyx: Yeah. So you're running GPT OSS.
[00:43:08] Mikhail Parakhin: This is an older version. Now we run a multimodal model. But yeah, we still run GPT OSS as well for...
[00:43:15] swyx: And then you have the VMs, and you also have browser-based. I really like this one, where you said, “It violates almost every assumption that standard LLM serving is designed for.”
And then you had basically orders-of-magnitude differences between everything.
[00:43:29] Mikhail Parakhin: Exactly. Which was a bit of a challenge to implement, even for simple things. Since it violates all the assumptions, multi-instance GPUs, MIGs, for example, don't work as well.
But we needed to get MIG to work, because otherwise it's way too expensive. So we had to deal with lots of infrastructure and work with Fireworks and CentML to help with optimizations, and browser-based, as you mentioned. Yeah, it takes a village.
[00:44:04] swyx: Okay. So there's a lot of experimentation in the infrastructure so far, and you've published more or less what you have here. I guess I'm less familiar with CentML. 
I don't do that much work in this part of the stack. But why was it the preferred inference platform?
[00:44:22] Mikhail Parakhin: There are really three top companies, or there used to be; three top companies, at least that I was aware of, that did LLM optimization: Together, Fireworks, and CentML, not necessarily in that order. CentML recently got acquired by NVIDIA. What they do is, if you have a model and you want to optimize it for a specific usage profile, they go and do it.
We work with those companies. This was work particularly with CentML and NVIDIA to get the best possible results out of it. And sometimes you have to retune: sometimes you want the maximum throughput, sometimes you want minimal latency, sometimes you want the cheapest, right?
Or some combination. So yeah, these are people who would come and help you.
[00:45:14] swyx: I see. Yeah, I'm familiar with these people for the LLM, autoregressive stack. But the other interesting category of these optimizers is the diffusion people, like fal, and Pruna has recently come up a lot as well. I think it's really underappreciated, at least by me, because I thought all the workload would be LLMs, but actually there's a lot of diffusion as well.
[00:45:38] Mikhail Parakhin: Exactly.
[00:45:38] swyx: There's a lot here, so it's hard to cover. But I do think people underappreciate the importance of customer simulation. I think this is something that I'm candidly still coming to terms with. 
Your team also prepared this really nice diagram.
I assume this is AI generated.
[00:46:00] Mikhail Parakhin: Yeah, it looks...
[00:46:01] swyx: Maybe it's not.
[00:46:01] Mikhail Parakhin: Yeah, it looks Gemini-ish. Honestly, I don't know where they generated it. It looks like it's Google. But the interesting part, Shawn, that we haven't covered but I wanted to mention is that if your store had previous customers, rather than being a new store where you're a new merchant just launching things, it helps tremendously with correlation and forecasting.
We take your previous customers' behavior, and we create agents that replicate the specific distribution of customers that you get, and then we apply those to your changes. That raised the raw correlation with add-to-cart events, or with conversion, or whatever it may be, quite dramatically.
So replicating humans in general seems like an interesting, cool challenge.
[00:46:58] swyx: As a shareholder, I think if people are Shopify shareholders, they should really deeply understand this, because this is basically the moat. The more you use Shopify, the more it will just automatically improve, right?
You're doing the job for them.
[00:47:13] Mikhail Parakhin: Yeah, that's what we started with. Otherwise, if it was my startup, I wouldn't do it, because without the data, as you said, it's exactly the case that whatever you say in the prompt is what the agents will be doing.
[00:47:30] swyx: The statistician in me wants to really satisfy the statistical intuition, I guess. To me, the word that comes to mind is ergodicity. 
So let's say a customer takes this path, a customer takes this path, a customer takes this path, right? In my mind, the way I explain it is: okay, here's the ninety-fifth percentile, here's the fifth percentile, and here's the median, right?
But to me, what SimGym is potentially doing is that it can model the in-between journeys as well, which may depend on the previous states. This may be a very RL-type conclusion: if you only did naive A/B testing, you only have the statistics at a certain point, and you only judge based on the overall summary statistics.
But here you can actually model trajectories. Does that make sense?
[00:48:31] Mikhail Parakhin: That makes total sense. It makes even more sense than maybe you realize, because...
[00:48:38] swyx: Okay. Please, please.
[00:48:38] Mikhail Parakhin: Yeah. So internally we have this system; we talked about it briefly once at NeurIPS.
We have a huge HSTU-based system that models whole companies and their possible paths. And for what you are showing, at any point in time you can either model the user's behavior, or you can also think about the whole merchant as a company, as an entity that acts in the world.
You can model that as well. And then you can do counterfactuals. In your graph, your blue graph, imagine that somewhere in the middle you have an intervention. I give that person a coupon, or, I don't know, I send a personal thank-you card, or give a discount somewhere.
And then you can do forward rollouts from that counterfactual. So what would have happened with that intervention or without the intervention? 
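
The idea of forward rollouts with and without an intervention can be sketched with a toy shopper simulator. This is purely illustrative: the states, the transition probabilities, and the coupon effect are invented, and the real system described here is a learned HSTU-based model rather than a hand-written Markov chain.

```python
import random


def rollout(rng, coupon_step=None, max_steps=10):
    """Simulate one shopper journey; returns True if it ends in a purchase.

    All probabilities are made-up illustration values.
    """
    in_cart = False
    has_coupon = False
    for step in range(max_steps):
        if coupon_step is not None and step == coupon_step:
            has_coupon = True  # the intervention happens at this point in time
        u = rng.random()
        if not in_cart:
            if u < 0.30:
                in_cart = True
            elif u > 0.85:
                return False  # left the store
        else:
            buy_probability = 0.35 if has_coupon else 0.20
            if u < buy_probability:
                return True
            elif u > 0.80:
                return False  # abandoned the cart
    return False


def conversion_rate(n, coupon_step=None, seed=0):
    """Average purchase rate over n simulated journeys."""
    rng = random.Random(seed)
    return sum(rollout(rng, coupon_step) for _ in range(n)) / n


baseline = conversion_rate(20000)
with_coupon = {step: conversion_rate(20000, coupon_step=step) for step in (0, 3, 6)}
```

Comparing `baseline` against each entry of `with_coupon` gives a rough estimate of the counterfactual effect, and varying `coupon_step` corresponds to asking when in the journey the intervention should happen.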
And you can even change where in time, where in this journey, that intervention happens, right? We do this at Shopify scale for our merchants, and if we notice something that they could be fixing, where there's a strong counterfactual, like we have Shopify policy, they basically get a notification like, “Hey, we think something is wrong with your,” I don't know, “Canadian sales. It looks like it's misconfigured. Here's what you need to do.” Or, “We think you should set up this campaign with these parameters.” And we do that at the buyer level too, to literally offer discounts or cashback or other things to buyers.
So I'm getting very excited. This is my area of interest, I guess, and hobby. But being able to model something as complex as human beings or companies and model counterfactuals on it, where you can have interventions in the future and optimize when to make the intervention and what kind of intervention to make...
It's such an unlock that previously was completely impossible. It was always dreamed of, but never... how would you even simulate it without LLMs or HSTUs? I think these are very, very exciting times.
[00:50:59] swyx: I just wanted to maybe illustrate this. I'm not the best illustrator, but I am a conceptual statistics guy.
You know, you cannot just do this. This is a dimension that an A/B test doesn't capture, right? Because it doesn't have the change over time, the stochastic nature, and it doesn't have the contextual... here's all the context up to this point. Okay, cool. That's SimGym.
You're gonna burn a lot of tokens on this thing. But you're one of the only platforms at scale in the world that can do this across a huge variety of workloads, right? 
I'm even curious, on a human research level: does retail behave differently from, like, clothing sales?
Does that behave differently from electronics sales? I don't know. I don't know what else you guys... The Kardashian shoppers, do they differ from people who buy, I don't know, cars, or whatever?
[00:51:55] Mikhail Parakhin: Well, very different, with different sensitivities, different modes of shopping, and different levels of what's important.
Totally, you can do aggregations at a store level. You can do aggregations at a category level. For the statisticians among us: I couldn't believe it, but recently we were looking at it, and we had to bring back CRPs, you know, the Chinese restaurant process.
It's a way of aggregating and naturally growing clusters. Specifically, it's to answer questions like the ones you were just posing, on how buyers behave in different categories. And I'm like, “I haven't seen CRP since 2001.”
[00:52:37] swyx: What is... No, I haven't seen this. This is not in my training.
[00:52:44] Mikhail Parakhin: But yeah, it was a very popular kind of theory in NeurIPS and ICML circles in the early two thousands. Kind of nice. And now it has practical applications, so we were resurrecting it.
[00:53:03] swyx: Yeah, amazing. I can see how this is a fun job for you, where you get to apply all these things.
Super cool. So anyone who knows what CRPs are and has always wanted to use them at work should definitely join Shopify. Okay, we have a lot, but I'm being mindful of the time. 
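
For readers who have not met the Chinese restaurant process mentioned above, here is a minimal sampler. It shows only the textbook process (each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter `alpha`); how Shopify applies it to buyer or category clustering is not described in the conversation, so nothing here should be read as their method.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table assignments from a Chinese restaurant process.

    With n customers already seated, the next one joins table k with
    probability size_k / (n + alpha) and opens a new table with
    probability alpha / (n + alpha).
    """
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for n in range(n_customers):
        u = rng.random() * (n + alpha)
        cumulative = 0.0
        for table, size in enumerate(table_sizes):
            cumulative += size
            if u < cumulative:
                table_sizes[table] += 1
                assignments.append(table)
                break
        else:
            # u fell in the last alpha-wide slice: open a new table.
            table_sizes.append(1)
            assignments.append(len(table_sizes) - 1)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(500, alpha=2.0)
```

The number of tables is not fixed in advance and grows slowly with the number of customers, which is what "naturally growing clusters" refers to.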
I did want to cover some other things.
I'll give you a choice: UCP or Liquid?
[00:53:30] Mikhail Parakhin: Liquid. On UCP, you know, UCP is very important for us, and we have structured discussions, and you can read about them. We have blog posts, and we have a big release this week, in fact, with our catalog.
[00:53:46] swyx: Oh, okay. I mean, we can discuss the release briefly, because we'll release this after it's already announced, so whatever. There's a catalog that you guys are doing?
[00:53:55] Mikhail Parakhin: Yeah. So we are bringing in the capabilities of the whole Shopify catalog.
Basically, you can now search for products, you can do lookups by specific ID, and you can do bulk lookups when you need to bring in multiple products. You don't need to know in advance what you're trying to show or sell or check out. You can now have this decided at runtime, and this is a big area of investment for us, for both non-personalized and personalized searches, trying to provide basically a window into the whole universe of products that are being sold everywhere in the world.
And Shopify is really, not exactly but almost, a superset of anything being sold. Now we are bringing it into UCP. And identity linking is another big thing for us, so that you can use Google or whatever identity you have, minimizing friction.
[00:54:56] swyx: Yeah.
[00:54:57] Mikhail Parakhin: So yeah, big release for us.
But Liquid AI, of course, we've never talked about, and it's probably more aligned with what we discussed previously in this chat.
[00:55:07] swyx: Sure. The main thing that everyone understands about Liquid is that it is inspired by worms, and I still don't know why. 
I'm curious about your explanation. I think you can make things very approachable.
And also, what is the potential level of efficiency that you get out of Liquid?
[00:55:23] Mikhail Parakhin: We're all familiar with transformer architectures. And for the longest time there was a competing architecture called state space models, SSMs. Chris Ré is one of the pioneers, and lots of startups are trying to make those a reality.
They have significant benefits, the main ones being that they are much faster, with a lower footprint, and not quadratic but linear in your context length. But state space models never quite made it. They have certain niches where they thrive, and hybrid architectures are useful, but they never quite made it.
And liquid neural networks, you can think of them as a next step, sort of state space models squared. It's a non-transformer architecture that's more complicated than state space and really difficult to code, if I'm being honest. But it's very efficient. It's subquadratic in the length of your context.
It's a very compact way to represent things, and that's the Liquid AI company. Their goal is to productize it. Very often you have this need for long context and a small model, and you want low latency. In general, it's basically on par with transformers, and if you do hybrids with transformers, it's even better.
That's why we at Shopify, when we tried multiple models from multiple companies, and we constantly do, found that for small models, particularly in low-latency applications, or if you need longer context lengths, Liquid was the best. 
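
The linear-versus-quadratic point about context length can be seen in a toy comparison. The sketch below is a generic one-dimensional linear state space recurrence, not Liquid AI's architecture (which is described above only as more complicated than state space models); the parameters `a`, `b`, `c` are arbitrary illustration values.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One fixed-size state update per token, so the work grows linearly
    with sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def causal_attention_pairs(length):
    """Number of (query, key) pairs a causal self-attention layer scores."""
    return length * (length + 1) // 2


impulse_response = ssm_scan([1.0, 0.0, 0.0])
work = {n: (n, causal_attention_pairs(n)) for n in (1_000, 10_000, 100_000)}
```

In `work`, the first number in each pair grows tenfold when the context grows tenfold, while the second grows roughly a hundredfold, which is the gap that makes recurrent-style models attractive for long contexts and low latency.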
And so we still use the whole zoo and always like obviously test and use everything, uh, every open source model and, you know, it feels l
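
To make the scaling point in the exchange above concrete, here is a minimal sketch in Python. It is not a liquid neural network and not anything Shopify or Liquid AI has published: it is a toy scalar linear state space recurrence, with hypothetical parameters `a`, `b` and `c`, set next to a simple count of the query-key pairs that plain causal self-attention scores. The only thing it illustrates is why a fixed-size recurrent state gives work that grows linearly with context length, while pairwise attention grows quadratically.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a toy scalar linear state space recurrence over xs.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The parameters a, b, c are illustrative assumptions. There is one
    fixed-size state update per token, so the work is linear in len(xs).
    """
    h = 0.0
    ys = []
    steps = 0
    for x in xs:
        h = a * h + b * x   # update the single hidden state
        ys.append(c * h)    # read the output from the state
        steps += 1
    return ys, steps


def causal_attention_pairs(n):
    """Count the query-key scores plain causal self-attention computes for n tokens.

    Token t attends to tokens 1..t, so the total is 1 + 2 + ... + n.
    """
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        _, steps = ssm_scan([1.0] * n)
        print(f"n={n}: recurrence steps={steps}, attention pairs={causal_attention_pairs(n)}")
```

Growing the context tenfold grows the recurrence work tenfold and the attention pair count roughly a hundredfold, which is the "linear versus quadratic" contrast Mikhail describes.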
In this episode of Longevity by Design, host Dr. Gil Blander sits down with Dr. Uri Alon, Professor at Weizmann Institute of Science. They explore a systems view of aging that treats longevity as a solvable model, not a grab bag of disconnected theories.

Uri explains aging with a simple story: houses make garbage, trucks remove it, and the village has a threshold for how much damage it can handle. In the body, “garbage” can include damaged and senescent cells, “trucks” can include immune cleanup, and “houses” can include long-lived cells and stem cells that drift over time. The model links this balance to death, disease, and steady decline, and it helps predict which interventions actually change it.

They also revisit the role of genes. Uri argues that lifespan looks closer to 50% heritable today after correcting for early, non-aging deaths in older datasets. The rest comes from the environment and biological noise, which regular sleep may help reduce.

Guest-at-a-Glance
Zak Mir talks to Paul Emmitt, CEO of Powerhouse Energy, about recent progress at the company, including its pioneering integrated technology that converts non-recyclable waste into low-carbon energy, and its revenue-generating engineering consulting division. They discuss PHE's proven technology, its projects and newsflow drivers.

Powerhouse Energy: Turning Non-Recyclable Waste into Low-Carbon Energy

Powerhouse Energy is positioning itself where two urgent problems intersect: mounting non-recyclable waste and the need to decarbonise industrial energy. The company has developed an integrated waste-to-energy solution that converts difficult waste streams into useful products (syngas, hydrogen or power) using a robust, proven process based on a rotating kiln.

What Powerhouse Energy offers

At its core, Powerhouse Energy provides a technology platform and engineering capability that turns problematic waste into real value. Rather than competing with recycling, it tackles the residual fraction that typically goes to landfill or is incinerated.

Key outcomes the company targets:
- Production of syngas that can be used for power or as a chemical feedstock
- Hydrogen production from syngas
- On-site power to reduce reliance on grid electricity or volatile natural gas
- Support for corporate decarbonisation and waste management strategies

How the technology works: simple and proven

Powerhouse Energy's process is a gasification system built around a rotating kiln. That configuration brings two advantages:
- Robustness: rotating kilns are a proven industrial technology with predictable behaviour and maintenance profiles.
- Flexibility: the system handles a wide range of non-recyclable feedstocks and produces a controllable syngas stream for different downstream uses.

The downstream equipment (gas cleanup, conditioning and conversion units) uses established industrial technology, which lowers technical risk for potential customers and investors.

The Feedstock Testing Unit: the turning point

One of the most important recent milestones was the completion of a feedstock testing unit (FTU) at the company's Brandon site in March last year. The FTU serves several crucial purposes:
- It lets potential clients bring their waste and see how it performs through the process.
- It provides a platform for ongoing R&D and for tuning the process to specific feedstocks.
- It acts as a commercial proof point that the technology works at scale prior to a full build-out.

As the CEO put it succinctly: "The technology is here for people to see and see work."

Commercialisation pathway and project landscape

Powerhouse Energy is working across several geographies and project types, from licensing and royalties to design-build-operate models where it retains control of delivery.

Current project highlights and opportunities include:
- An Australia project operating under a licensing and royalty arrangement, where the client controls the timeline and delivery.
- A proposed design, build and operate facility at Balamina (planning application submitted), which would allow the company to drive the project and capture engineering revenues.
- Active discussions in the Middle East, Europe and the Caribbean for a mix of larger and smaller commercial units.

Importantly, many enquiries are becoming tangible opportunities: in the past 12 months the company received roughly 50–60 enquiries and has around a dozen genuine opportunities under discussion. That pipeline is a positive signal of market interest.

Why market valuation lags, and what will change it

Powerhouse Energy's market capitalisation sits around £15 million, while the global waste-to-energy market runs into the billions. That gap reflects the typical early-stage dynamic: technology proven at demonstration scale must cross the commercial threshold before valuations re-rate.

The decisive inflection point will be a signed contract for the first commercial unit. That could be:
- The Australia project concluding financial due diligence and executing the contract
- A smaller commercial unit (2.5–10 tonnes per day) sold to a customer to prove on-site commercial operation

Obtaining a first commercial order validates the technology, starts revenue recognition and reduces perceived risk for future customers and investors. The company is open to creative commercial structures to overcome the "nobody wants to be first" problem, including moving or deploying the FTU to a partner site as a demonstration of commerciality.

What to watch next

For anyone tracking the company, the next major newsflow drivers are clear:
- Execution of the first commercial contract: Australia or a smaller pilot commercial unit.
- Planning approval and progression of the Balamina design-build-operate project.
- Closing one of the mid-sized international opportunities in Europe, the Middle East or the Caribbean.
- Deployment options for the FTU to accelerate commercial proofs and client acceptance.

Where Powerhouse Energy sits in the energy transition

There is sometimes a binary narrative that only electrification and “100% renewable” solutions are green. In practice, the waste problem persists and requires pragmatic, low-carbon solutions. Powerhouse Energy is not positioned as a replacement for recycling but as a complementary pathway: it deals with materials that would otherwise be landfilled or incinerated under poor control.

By turning that residual waste into syngas or hydrogen, the company helps organisations meet waste management targets, reduce fossil fuel dependence for site power and advance decarbonisation strategies in industries where waste remains an intractable problem.

Summary

Powerhouse Energy has moved from concept to demonstrable technology. With a functioning feedstock testing unit, a growing pipeline of enquiries and concrete projects under discussion, the company's next milestone is commercial traction: a signed first unit. When that happens, it will mark the transition from technology validation to revenue generation and should materially change market perception.

For organisations wrestling with non-recyclable waste and industrial decarbonisation, Powerhouse Energy offers a pragmatic, proven route to convert waste into useful, low-carbon energy streams.
Outline
00:00 – Intro
04:43 – Life and background
08:45 – Bell Labs
13:42 – Inventing the negative feedback amplifier
18:15 – Nyquist's landmark contributions
20:43 – Regeneration theory
27:10 – Frequency response
32:03 – Cauchy's argument principle
36:05 – The Nyquist criterion
41:37 – Why is it so hard?
45:27 – Robustness, margins, and practical aspects
56:41 – Beyond the Nyquist criterion
1:04:25 – Pitfalls and common misunderstandings
1:07:00 – Outro

Links
Brian Douglas's video: http://y2u.be/sof3meN96MA
The Idea Factory: https://en.wikipedia.org/wiki/The_Idea_Factory
Inventing the Negative Feedback Amplifier: https://doi.org/10.1109/MSPEC.1977.6501721
Johnson–Nyquist noise: https://doi.org/10.1103/PhysRev.32.110
Nyquist sampling theorem: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
Regeneration theory: https://doi.org/10.1002/j.1538-7305.1932.tb02344.x
Gain and phase margins: https://en.wikipedia.org/wiki/Bode_plot#Gain_margin_and_phase_margin
Routh–Hurwitz criterion: https://en.wikipedia.org/wiki/Routh%E2%80%93Hurwitz_stability_criterion
Åström's lecture: https://archive.control.lth.se/media/Staff/KarlJohanAstrom/Lectures/ASMENyquistLecture2005.pdf
Scale-Relative Graphs: https://doi.org/10.1109/TAC.2023.3234016

Podcast info
Podcast website: https://www.incontrolpodcast.com/
Apple Podcasts: https://tinyurl.com/5n84j85j
Spotify: https://tinyurl.com/4rwztj3c
RSS: https://tinyurl.com/yc2fcv4y
Youtube: https://tinyurl.com/bdbvhsj6
Facebook: https://tinyurl.com/3z24yr43
Twitter: https://twitter.com/IncontrolP
Instagram: https://tinyurl.com/35cu4kr4

Acknowledgments and sponsors
This episode was supported by the National Centre of Competence in Research on «Dependable, ubiquitous automation» and the IFAC Activity fund. The podcast benefits from the help of an incredibly talented and passionate team. Special thanks to L. Seward, E. Cahard, F. Banis, F. Dörfler, J. Lygeros, ETH studio and mirrorlake. Music was composed by A New Element.
Matthew Carey, Co-Founder & CEO, Teradar, joined Grayson Brulte on The Road to Autonomy podcast to discuss the company's emergence from stealth with $150 million in funding and the creation of a brand-new category of terahertz (THz) sensors.

The operational backbone of Teradar's strategy is a Terahertz Detection and Ranging (Rad-AR) approach that fills the gap between LiDAR and radar on the electromagnetic spectrum. By utilizing a modular architecture of Lego-like transmitter and receiver chips, the system provides the high-resolution point cloud typically associated with lidar while maintaining the all-weather robustness and velocity-sensing Doppler capabilities of radar. This solid-state design allows the sensor to be hidden behind vehicle bumpers or polymers, eliminating the need for bulky roof-mounted hardware.

In the field, Teradar is rigorously applying its technology to solve the weather casino problem, proving the system's robustness in the heavy rain, snow, and dense fog of Boston. Unlike traditional vision or LiDAR systems that struggle with atmospheric particulates, Teradar's longer wavelengths can bend around rain and dust, ensuring consistent performance in environments where humans or other sensors might fail.

Teradar's Physical AI ecosystem also includes a defense-grade application that provides situational awareness in combat environments without being easily detected. The atmosphere effectively blocks the sensor's signal beyond its intended range, allowing it to operate in dense traffic or military zones without jamming other sensors or revealing a vehicle's position to hostile actors.

Looking ahead, Matt envisions a future where high-performance sensing reaches a mass-market inflection point by becoming affordable enough for every vehicle, from a Mercedes S-Class to a Ford Focus. By partnering with Tier 1 suppliers rather than vertically integrating, Teradar aims to scale to millions of units, fundamentally transforming the industry by delivering a sensor stack that costs hundreds, not thousands of dollars.

Episode Chapters
00:00 Teradar Emerges from Stealth
03:01 Limitations of Existing Sensor Technologies
05:54 Introducing Terahertz Sensing
08:00 Defense and Battlefield Applications
11:11 Modular Sensor Architecture
17:00 Early Development and Startup Challenges
26:54 Why Teradar Chose Boston
36:11 Autonomous Vehicles and Weather
46:06 Scaling Teradar

--------

About The Road to Autonomy
The Road to Autonomy is the definitive media brand covering the Autonomy Economy™. Through our podcasts, newsletter, and proprietary market intelligence, we set the narrative for institutional investors, industry executives, and policymakers navigating the convergence of automation, autonomy, and economic growth.

Join institutional investors and industry leaders who read This Week in The Autonomy Economy every Sunday. Each edition delivers exclusive insight and commentary on the autonomy economy, helping you stay ahead of what's next. Subscribe today for free: https://www.roadtoautonomy.com/ae/

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Welcome to the In The Money Podcast by Zerodha, where we bring you the sharpest minds from around the world in trading, markets, and active investing. In this episode, we sit down with Rob Carver, systematic futures trader, former hedge fund manager, and one of the most practical voices in quantitative and rules-based investing. Rob spent years building and managing systematic macro and fixed income strategies at AHL, one of the world's largest systematic hedge funds, after earlier roles in derivatives trading at Barclays and economic research at CEPR. Today, he trades his own capital, writes extensively on systematic investing, and teaches systematic trading strategies to finance students. In this wide-ranging conversation, Rob shares how real market understanding develops slowly over time, why trading your own money feels very different from managing institutional capital, and why starting simple is still the best advice for most traders. We dive deep into trend following, mean reversion, risk management, diversification, drawdowns, and the dangers of over-optimizing systems. We also discuss how the post-GFC world changed trend behavior, how to evaluate strategy performance beyond simple metrics, and Rob's strong views on AI, LLMs, and automation in trading systems. You can check out our newsletter here: https://inthemoneybyzerodha.substack.com/
Simple math suggests that small groups of scientists can significantly bias peer review.
Balancing Fitness and Robustness: The Importance of Neuromuscular Coordination in Training In this episode of the On Coaching Podcast, Steve Magness and Jon Marcus discuss the concept of ‘fit but flat,’ exploring the phenomenon where athletes excel in metabolic fitness but fail to perform competitively due to a lack of neuromuscular coordination. Using examples like…
Today, I'm joined by the inspiring Dr. Jeffrey Gladden, a trailblazer in longevity medicine who once spent decades as an interventional cardiologist, only to challenge everything he knew after his own health hit a wall. Dr. Gladden opens up about the moment he refused to accept "normal for your age" as a diagnosis, launching himself into the world of functional and age-management medicine to reclaim his vitality and help others do the same.

Episode Timestamps:
Welcome and episode introduction ... 00:00:00
Health crisis and discovering personal optimization ... 00:07:05
From "sick care" to health optimization ... 00:10:46
Vision for personalized, youthful longevity ... 00:12:17
Personalized medicine: why one size doesn't fit all ... 00:16:00
Linear versus exponential aging; fixing a flawed approach ... 00:18:02
Five circles of exponential health: key longevity domains ... 00:19:23
Curiosity, growth mindset, and quantum thinking in longevity ... 00:22:22
Why individualization is crucial for diet and interventions ... 00:28:52
Insulin resistance: the hidden driver of aging ... 00:33:41
Environmental and internal (psychospiritual) factors in health ... 00:38:40
Healing through meditation, stress management, and flow ... 00:41:15
Robustness, resilience, and anti-fragility as longevity superpowers ... 00:57:09
Safe, personalized hormone therapy and the importance of tracking ... 01:03:33
Integrating mindset, purpose, and psycho-spiritual work ... 01:08:50
Peptides and advanced therapies: preparing for optimal results ... 01:09:56
Common test misconceptions in longevity medicine ... 01:12:56
Debunking the myth of single biological age ... 01:16:38
Resources, connect with Dr. Gladden, and closing ... 01:18:09

Our Amazing Sponsors:
Youth Daily by Young Goose: An all-in-one moisturizer powered by NAD+ nano precursors to boost elasticity, smooth wrinkles, and keep your skin looking fresh, dewy, and full of life; grab yours at younggoose.com and use code Nat10 for first orders or 5NAT for returning customers.
Quantum Upgrade: Supports nervous system balance without wearables or apps, just effortless, 24/7 quantum energy streaming. With 21+ studies showing measurable improvements in stress and cellular function, it's easy to try for yourself. Visit quantumupgrade.io/NAT and use code NAT10 to start the free trial.
Mitopure®️ Longevity Gummies by Timeline: Clinically backed Urolithin A supports mitochondrial health to boost energy, recovery, and healthy aging, all in an easy daily gummy instead of another pill; go to timeline.com/nat20 for 20% off Mitopure®️ Gummies.

Nat's Links:
YouTube Channel
Join My Membership Community
Sign up for My Newsletter
Instagram
Facebook Group
In this episode, I explore the idea of robustness: what it means to stay steady, rooted, and responsive in the face of challenge. From a seasonal and herbal perspective, robustness is a measure of health: not the absence of difficulty, but our capacity to meet life as it is without losing our center.

Winter is a time that naturally asks us to slow down, turn inward, and build reserves. Drawing on lessons from nature and community, I reflect on how resilience is shaped through challenge, vulnerability, and connection rather than isolation or hardening.

In this conversation, I talk about how robustness can be cultivated through simple practices, relationships, and plant medicine, especially during winter, when our systems need extra support.

In this episode, I explore:
- What robustness is and why it matters for mental and physical health
- How resilience is built by meeting challenge rather than avoiding it
- Lessons from nature about strength, adaptability, and seasonal cycles
- Community and co-regulation as foundations of collective robustness
- Quiet winter practices like candle gazing to help organize thoughts
- Singing, humming, and vocal toning to calm the nervous system
- Staying connected to nature as a source of steadiness and strength

Herbal allies I discuss for supporting robustness and resilience include:
- Licorice root: a deeply nourishing adaptogen for building reserves
- Motherwort: a comforting herb for the heart during distress

I also touch on nourishment beyond herbs, including sunflower oil, and how fats and oils support the skin, nervous system, and overall vitality during the colder months.

This episode is an invitation to think about robustness not as pushing through or toughening up, but as the ability to remain present, connected, and supported, especially in times that feel uncertain or demanding.
Daniel Lam discusses the continued robust inflow into EM debt, why we prefer EM debt to DM debt in 2026, and the investment implications.

Speaker:
- Daniel Lam, Head of Equity Strategy, Standard Chartered Bank

For more of our latest market insights, visit Market views on-the-go or subscribe to Standard Chartered Wealth Insights on YouTube.
Outline
00:00 - Intro
03:18 - Early days: why control, M. Athans, and IDSS
12:21 - What is gain scheduling?
33:37 - Paradigm shifts & the ‘90s: Minnesota → Texas → LA
42:19 - Robustness & fundamental limitations of nonlinear systems
57:35 - Set-valued control & estimation
01:04:52 - Game theory & multi-agent control
01:28:18 - Learning & dissipativity in games & multi agent AI
01:45:03 - KAUST: building something new
01:53:33 - On human-algorithmic interaction
01:59:07 - Advice to future students: control, jiu-jitsu, and chatbots in education
2:10:02 - Outro

Links
Jeff's website: https://tinyurl.com/52btmmz7
CSM interview: https://tinyurl.com/49wh98x7
Domain: feedbackcontrol.com
M. Athans: https://tinyurl.com/nhbw66wa
PhD thesis: https://tinyurl.com/5eyxkfm6
IDSS: https://tinyurl.com/bdenwy6d
Research on gain scheduling: https://tinyurl.com/55se8zcr
Overview of LPV systems: https://tinyurl.com/3ksff58b
Åström's lecture: https://tinyurl.com/33mxkkfe
Necessity of the small gain theorem: https://tinyurl.com/mjn9eeb4
Sensitivity reduction for nonlinear plants: https://tinyurl.com/23tej5yp
Respect the unstable: https://tinyurl.com/3yww5eds
Differential inclusion: https://tinyurl.com/4yvc8vcc
Lectures on game theory: https://tinyurl.com/4z8hh3rn
Dynamic fictitious play: https://tinyurl.com/yc6wsxjj
Cooperative control and potential games: https://tinyurl.com/4hbmrt72
Dissipativity theory in game theory: https://tinyurl.com/3theyc7x
Population games, stable games, and passivity: https://tinyurl.com/zxwtzv6w
Game theory and control: https://tinyurl.com/yencrwm3
Higher-order uncoupled learning: https://tinyurl.com/37

Podcast info
Podcast website: https://www.incontrolpodcast.com/
Apple Podcasts: https://tinyurl.com/5n84j85j
Spotify: https://tinyurl.com/4rwztj3c
RSS: https://tinyurl.com/yc2fcv4y
Youtube: https://tinyurl.com/bdbvhsj6
Facebook: https://tinyurl.com/3z24yr43
Twitter: https://twitter.com/IncontrolP
Instagram: https://tinyurl.com/35cu4kr4

Acknowledgments and sponsors
This episode was supported by the National Centre of Competence in Research on «Dependable, ubiquitous automation» and the IFAC Activity fund. The podcast benefits from the help of an incredibly talented and passionate team. Special thanks to L. Seward, E. Cahard, F. Banis, F. Dörfler, J. Lygeros, ETH studio and mirrorlake. Music was composed by A New Element.
Episode 83 of The Space Industry podcast by satsearch is a conversation with Adrian Helwig, Analog Field Application Engineer, and Michael Seidl, Systems Engineer from Texas Instruments (TI), about designing space systems with integrated Fault Detection, Isolation, and Recovery (FDIR) strategies.

TI is a global electronics manufacturer with a wide portfolio of space-grade components to support space missions across the spectrum.

In the episode, Adrian, Michael and satsearch COO Narayan Prasad Nagendra discuss:
- FDIR as a complex, critical sequence in space system design: Since equipment in space cannot be manually repaired, systems must quickly and reliably detect faults, isolate the damaged unit (e.g., by switching it off), and recover mission operations, often by engaging a redundant unit.
- Trade-off between reliability, performance, and cost: Engineers face this trade-off particularly when selecting components that must withstand extreme environments (radiation, temperature cycles) and long missions (LEO vs. GEO/Deep Space). Using non-space-grade parts introduces significant risk and defeats the purpose of FDIR.
- Effective fault containment based on integrated, smart strategies: Strategies that avoid complexity, using methods like galvanic isolation, fast load switches, and highly-integrated space-grade components that incorporate diagnostics and can execute complex decision-making based on multiple sensor inputs (voltage, current, temperature), prevent fault propagation.

You can find out more about TI on their satsearch supplier hub. And if you would like to learn more about the space industry and our work at satsearch building the global space supply chain, please take a look at our blog.

[Music from Uppbeat (free for Creators!): https://uppbeat.io/t/all-good-folks/when-we-get-there License code: Y4KZEAESHXDHNYRA]
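
The detect, isolate, recover sequence described in the episode notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TI's or satsearch's design: the `Unit` class, the `LIMITS` table, its threshold values, and the `fdir_step` function are all hypothetical names and numbers chosen for the example. Real flight systems implement this logic in hardware and qualified software with limits taken from datasheets and mission requirements.

```python
from dataclasses import dataclass

# Hypothetical limits, for illustration only.
LIMITS = {
    "voltage": (4.5, 5.5),          # volts
    "current": (0.0, 2.0),          # amperes
    "temperature": (-40.0, 85.0),   # degrees Celsius
}


@dataclass
class Unit:
    """A power-switched unit such as a primary or redundant board."""
    name: str
    powered: bool = False


def detect(reading):
    """Return the names of all sensor channels outside their limits."""
    return [ch for ch, (lo, hi) in LIMITS.items() if not lo <= reading[ch] <= hi]


def fdir_step(active, spare, reading):
    """Run one detect / isolate / recover pass and return the unit now in service.

    Returns None if a fault is found and no redundant unit is available.
    """
    faults = detect(reading)
    if not faults:
        return active           # nominal: keep the current unit
    active.powered = False      # isolate: switch the faulty unit off
    if spare is None:
        return None             # no redundancy left to recover with
    spare.powered = True        # recover: engage the redundant unit
    return spare


if __name__ == "__main__":
    primary, backup = Unit("primary", powered=True), Unit("backup")
    overcurrent = {"voltage": 5.0, "current": 3.5, "temperature": 20.0}
    in_service = fdir_step(primary, backup, overcurrent)
    print(in_service.name, primary.powered, backup.powered)
```

Combining several channels in `detect` mirrors the point in the notes about decisions based on multiple sensor inputs (voltage, current, temperature); the isolation step comes before recovery so that a faulty unit cannot keep propagating the fault.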
In this episode of Longevity by Design, host Dr. Gil Blander sits down with Dr. Albert-László Barabási, Professor at Northeastern University, to explore how networks shape health, aging, and nutrition. Barabási explains how biological and social networks influence resilience, robustness, and our ability to recover from stress or disease. He describes aging as a gradual loss of resilience, where the body becomes less able to bounce back from small disruptions.

The conversation moves into the world of nutrition, where Barabási introduces the concept of “nutritional dark matter.” He argues that food contains thousands of little-known molecules, many of which play key roles in health but remain largely unmapped and unstudied. Barabási breaks down how these compounds, especially those found in plants, support cellular function far beyond the traditional nutrients listed on food labels.

The episode closes with a look at ultra-processed foods and their link to disease risk. Barabási shares new research tools that can help people evaluate what they eat and make smarter choices. Throughout, he reminds listeners that strong connections, between cells, foods, and people, are at the heart of long, healthy lives.

Guest-at-a-Glance
Steve Perkins – Steve talks about robustness in the face of stressful situations
In part two of his conversation, Illia Polosukhin, co-author of "Attention Is All You Need" and founder of NEAR Protocol, explores the profound societal and economic implications of AI. He envisions a future where a unified intelligence layer transforms personal computing, AI agents enable direct market connections, and individuals find meaning in niche communities amidst AI-driven abundance. Illia also delves into the ethics of agent-to-agent interactions, where AIs prioritize human interests, and new governance models utilizing AI delegates. This episode offers a concrete, systems-level vision for how AI will reshape our world, from daily life to global governance.

Sponsors:
Google Gemini Notebook LM: Notebook LM is an AI-first tool that helps you make sense of complex information. Upload your documents and it instantly becomes a personal expert, helping you uncover insights and brainstorm new ideas at https://notebooklm.google.com
Tasklet: Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai
Linear: Linear is the system for modern product development. Nearly every AI company you've heard of is using Linear to build products. Get 6 months of Linear Business for free at: https://linear.app/tcr
Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive

PRODUCED BY: https://aipodcast.ing

CHAPTERS:
(00:00) Sponsor: Google Gemini Notebook LM
(00:31) About the Episode
(03:33) AI Transforms Software Development
(14:18) The Future of Work
(18:58) Securing Blockchain with AI (Part 1)
(19:08) Sponsors: Tasklet | Linear
(21:48) Securing Blockchain with AI (Part 2)
(33:03) Vision for an AI Society (Part 1)
(33:55) Sponsor: Shopify
(35:52) Vision for an AI Society (Part 2)
(49:14) Agent Architecture and Alignment
(58:30) Experimenting with AI Governance
(01:06:43) AI Safety and Robustness
(01:16:09) Bio-Security and Open Models
(01:22:56) Coordinating AI Development
(01:28:52) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathanlabenz/
Youtube: https://youtube.com/@CognitiveRevolutionPodcast
Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431
Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk
Sanjoy Chowdhury reveals AI's hidden weakness: while systems can see objects and hear sounds perfectly, they can't reason across senses like humans do. His research at University of Maryland College Park, including the Meerkat model and AVTrustBench, exposes why AI recognizes worried faces and thunder separately but fails to connect them, and what this means for self-driving cars and medical AI.

Sponsors
This episode is proudly sponsored by Amethix Technologies. At the intersection of ethics and engineering, Amethix creates AI systems that don't just function: they adapt, learn, and serve. With a focus on dual-use innovation, Amethix is shaping a future where intelligent machines extend human capability, not replace it. Discover more at https://amethix.com

This episode is brought to you by Intrepid AI. From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence. Whether it's in the sky, on the ground, or in orbit, if it's intelligent and mobile, Intrepid helps you build it. Learn more at intrepid.ai

Resources:
- The first audio-visual LLM with fine-grained understanding: Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time (accepted at ECCV 2024)
- Benchmark for evaluating robustness to adversarial attacks and compositional reasoning: AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs (accepted at ICCV 2025)
- First audio-visual reasoning evaluation benchmark and test-time reasoning distillation pipeline: AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs (accepted at ICCV 2025)

For a detailed list of Sanjoy's work, please visit his webpage: https://schowdhury671.github.io/
Welcome to the Mind Muscle Connection Podcast!

In today's episode, I sit down with Rob Wilson to talk about robustness, antifragility, breath training, and nasal breathing. Rob is a human performance specialist with over 20 years of experience working across manual therapy, coaching, and stress management, and the author of Check Engine Light.

We get into performance coaching for Navy SEALs and elite athletes, what stress management really means beyond the buzzword, and why building robustness and resilience is key for both health and performance. Rob also shares how breathwork can be applied in real-world training and recovery, along with practical tips to help you get started.

If you've ever wondered how to better manage stress, improve performance, and get more out of your training and recovery, this conversation is a must-listen.

Let's talk about:
- Introduction to Rob Wilson
- Stress management vs. load management
- Robustness, resilience, and antifragility
- Breathwork basics & poor breathing
- Nasal breathing
- Tips to start breath training
- Where to find Rob

Follow Rob Wilson on Instagram: https://www.instagram.com/thecheckenginelight/
Breathing Articles: https://simplifaster.com/articles/author/robwilson/
Book: thecheckenginelightbook.com

Follow me on Instagram for more information and education: jeffhoehn_
FREE 30 Min Strategy Call: HERE
Body Recomp Masterclass: HERE
Nutrition Periodization Masterclass: HERE
How You Can Work With Me?: HERE
Coaching application: HERE
Body Recomp Checklist 2.0: https://chipper-producer-6244.kit.com/26b5c9f94a
Shane Rosenthal and Simon Hamp from the NativePHP Project have brought PHP, and with it my favorite web framework Laravel, onto mobile devices. I love this: taking established tech and porting it into places where you wouldn't expect. I'll be talking to Shane and Simon about how they accomplished this, and, maybe even more impressively, how they turned this into a profitable business at a very early stage.

This episode of The Bootstrapped Founder is sponsored by Paddle.com

The blog post: https://thebootstrappedfounder.com/simon-hamp-shane-rosenthal-building-monetizing-nativephp/
The podcast episode: https://tbf.fm/episodes/399-simon-hamp-shane-rosenthal-building-monetizing-nativephp

Check out Podscan, the Podcast database that transcribes every podcast episode out there minutes after it gets released: https://podscan.fm
Send me a voicemail on Podline: https://podline.fm/arvid

You'll find my weekly article on my blog: https://thebootstrappedfounder.com
Podcast: https://thebootstrappedfounder.com/podcast
Newsletter: https://thebootstrappedfounder.com/newsletter
My book Zero to Sold: https://zerotosold.com/
My book The Embedded Entrepreneur: https://embeddedentrepreneur.com/
My course Find Your Following: https://findyourfollowing.com

Here are a few tools I use. Using my affiliate links will support my work at no additional cost to you.
- Notion (which I use to organize, write, coordinate, and archive my podcast + newsletter): https://affiliate.notion.so/465mv1536drx
- Riverside.fm (that's what I recorded this episode with): https://riverside.fm/?via=arvid
- TweetHunter (for speedy scheduling and writing Tweets): http://tweethunter.io/?via=arvid
- HypeFury (for massive Twitter analytics and scheduling): https://hypefury.com/?via=arvid60
- AudioPen (for taking voice notes and getting amazing summaries): https://audiopen.ai/?aff=PXErZ
- Descript (for word-based video editing, subtitles, and clips): https://www.descript.com/?lmref=3cf39Q
- ConvertKit (for email lists, newsletters, even finding sponsors): https://convertkit.com?lmref=bN9CZw
Support the show to get full episodes, full archive, and join the Discord community. The Transmitter is an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists. Read more about our partnership. Check out this story: What, if anything, makes mood fundamentally different from memory? Sign up for Brain Inspired email alerts to be notified every time a new Brain Inspired episode is released. To explore more neuroscience news and perspectives, visit thetransmitter.org. Elusive Cures: Why Neuroscience Hasn't Solved Brain Disorders―and How We Can Change That. Nicole Rust runs the Visual Memory laboratory at the University of Pennsylvania. Her interests have expanded now to include mood and feelings, as you'll hear. And she wrote this book, which contains a plethora of ideas about how we can pave a way forward in neuroscience to help treat mental and brain disorders. We talk about a small plethora of those ideas from her book, which also contains the story, part of which you'll hear, of her own journey in thinking about these things, from working early on in visual neuroscience to where she is now. Nicole's website. Elusive Cures: Why Neuroscience Hasn't Solved Brain Disorders―and How We Can Change That. 0:00 - Intro 6:12 - Nicole's path 19:25 - The grand plan 25:18 - Robustness and fragility 39:15 - Mood 49:25 - Model everything! 56:26 - Epistemic iteration 1:06:50 - Can we standardize mood? 1:10:36 - Perspective neuroscience 1:20:12 - William Wimsatt 1:25:40 - Consciousness
Part 2 of this series could have easily been renamed “AI for science: The expert's guide to practical machine learning.” We continue our discussion with Christoph Molnar and Timo Freiesleben to look at how scientists can apply supervised machine learning techniques from the previous episode in their research.
Introduction to supervised ML for science (0:00)
- Welcome back to Christoph Molnar and Timo Freiesleben, co-authors of “Supervised Machine Learning for Science: How to Stop Worrying and Love Your Black Box”
The model as the expert? (1:00)
- Evaluation metrics have profound downstream effects on all modeling decisions
- Data augmentation offers a simple yet powerful way to incorporate domain knowledge
- Domain expertise is often undervalued in data science despite being crucial
Measuring causality: Metrics and blind spots (10:10)
- Causality approaches in ML range from exploring associations to inferring treatment effects
Connecting models to scientific understanding (18:00)
- Interpretation methods must stay within realistic data distributions to yield meaningful insights
Robustness across distribution shifts (26:40)
- Robustness requires understanding what distribution shifts affect your model
- Pre-trained models and transfer learning provide promising paths to more robust scientific ML
Reproducibility challenges in ML and science (35:00)
- Reproducibility challenges differ between traditional science and machine learning
Go back to listen to part one of this series for the conceptual foundations that support these practical applications. Check out Christoph and Timo's book “Supervised Machine Learning for Science: How to Stop Worrying and Love Your Black Box” available online now. What did you think? Let us know. Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics: LinkedIn - Episode summaries, shares of cited articles, and more. YouTube - Was it something that we said? Good. Share your favorite quotes. 
Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
// SPONSORS //The Farm at Okefenokee: https://okefarm.com/iCoin: https://icointechnology.com/breedloveHeart and Soil Supplements (use discount code BREEDLOVE): https://heartandsoil.co/In Wolf's Clothing: https://wolfnyc.com/Blockware Solutions: https://mining.blockwaresolutions.com/breedloveOn Ramp: https://onrampbitcoin.com/?grsf=breedloveMindlab Pro: https://www.mindlabpro.com/breedloveCoinbits: https://coinbits.app/breedlove // PRODUCTS I ENDORSE //Protect your mobile phone from SIM swap attacks: https://www.efani.com/breedloveNoble Protein (discount code BREEDLOVE for 15% off): https://nobleorigins.com/Lineage Provisions (use discount code BREEDLOVE): https://lineageprovisions.com/?ref=breedlove_22Colorado Craft Beef (use discount code BREEDLOVE): https://coloradocraftbeef.com/ // SUBSCRIBE TO THE CLIPS CHANNEL //https://www.youtube.com/@robertbreedloveclips2996/videos // OUTLINE //0:00 - WiM Episode Trailer1:22 - Nassim Taleb's “Anti-Fragile”11:36 - Why Anti-Fragile?19:15 - The Farm at Okefenokee20:25 - iCoin Bitcoin Wallet21:55 - The Central Triad: Fragile, Robust, and Anti-Fragile26:45 - Learning31:05 - Skin in the Game: Decisions and Risk36:28 - Heart and Soil Supplements37:28 - Helping Lightning Startups with In Wolf's Clothing38:20 - Skin in the Game and Anti-Fragility41:34 - Academia and WTF Happened in 197147:54 - Central Triad Explained57:43 - Mine Bitcoin with Blockware Solutions59:05 - OnRamp Bitcoin Custody1:00:28 - Modern Problems: Health, Big Pharma, Raising Children 1:12:28 - Surviving Scarcity and Weimar Hyperinflation1:18:17 - Fragile vs Anti-Fragile Systems (Centralization vs Free Markets)1:20:44 - Mind Lab Pro Supplements1:21:53 - Buy Bitcoin with Coinbits1:23:22 - Hormesis and Iatrogenics: The Benefits of Adversity1:29:46 - Money Printing and Anti-Fragility1:32:54 - Alcoholism and Money Printing1:36:22 - Have Agency: Humans are Programmable1:40:20 - Closing Thoughts // PODCAST //Podcast Website: https://whatismoneypodcast.com/Apple Podcast: 
https://podcasts.apple.com/us/podcast/the-what-is-money-show/id1541404400Spotify: https://open.spotify.com/show/25LPvm8EewBGyfQQ1abIsERSS Feed: https://feeds.simplecast.com/MLdpYXYI // SUPPORT THIS CHANNEL //Bitcoin: 3D1gfxKZKMtfWaD1bkwiR6JsDzu6e9bZQ7Sats via Strike: https://strike.me/breedlove22Dollars via Paypal: https://www.paypal.com/paypalme/RBreedloveDollars via Venmo: https://account.venmo.com/u/Robert-Breedlove-2
Featured Attorney Panel
Peter Fitzgerald - Partner at Angelina and Herrick www.ah-lawyers.com/
William Lohman - Head of Real Estate Law @ Castle Law www.castlelaw.com
Mike Bradt - Law Offices of Michael Bradt https://mbradtlaw.com/
The Multi-Board Residential Real Estate Contract has been updated to version 8.0. Stay ahead of the curve in this session that will provide an in-depth overview on the changes to the new contract - equipping you with the knowledge and tools to navigate these changes confidently and effectively. This powerful session provides a comprehensive overview of the new contract's key elements, reformatted structure, legal considerations and potential impact on transactions. And, our expert panel will guide you through real-world scenarios, practical tips and critical updates to help you master the revisions and minimize risks for your clients and business. A must-watch session for all REALTORS.
Introduction and Welcome (00:00:00)
Panelist Introductions (00:01:39)
History of the 8.0 Contract (00:04:07)
Feedback on Contract Changes (00:09:00)
Key Enhancements to the 8.0 Contract (00:10:58)
Impact of Organizational Changes (00:12:10)
Transparency in Broker Commissions (00:14:21)
Understanding Property Types (00:15:10)
Robustness of Buyer Broker Agreements (00:16:24)
Value of Buyer Agents (00:17:14)
Expanded Personal Property Paragraph (00:18:03)
Impact of Solar Panels (00:18:39)
Clarification of Notice Options (00:20:42)
Homeowner and Flood Insurance Changes (00:21:33)
As Is Clause Clarification (00:24:29)
Concerns with As Is Paragraph (00:25:49)
Health and Safety Exclusions (00:28:01)
Negotiating As Is Transactions (00:30:50)
Risks of Making Requests (00:33:04)
Opportunities for Buyers (00:34:53)
Investor Strategies (00:35:14)
Offer Games (00:36:09)
Expectations in Real Estate (00:37:18)
Inspection Paragraphs (00:38:07)
Knowledge Representation Changes (00:39:00)
Escrow Language Changes (00:41:08)
Performance Clause Expansion (00:42:56)
Cancellation of Prior Contracts (00:44:17)
Alternative Energy Paragraph (00:48:24)
Inconsistent Representation Language (00:49:43)
The Cost of Solar Panels (00:52:08)
Seller's Preparation (00:52:34)
Contract Addendum Overview (00:52:53)
Reverse Contingency Addendum (00:53:05)
Repair Addendum Insights (00:53:40)
Organizing the Addendum List (00:53:54)
Notice and Response Form (00:54:18)
Appraisal Addendum Discussion (00:54:29)
Broker's Role in Transactions (00:56:47)
Intent to Escalate (00:58:42)
Multi-Unit Addendum (01:00:00)
Repair Request Agreement (01:00:20)
Reverse Contingency in Seller's Market (01:01:11)
Introduction of Land Trust Team (01:02:14)
Real Estate Fraud Session Follow-up (01:03:09)
Closing Remarks (01:03:17)
Escrow Discussion (01:04:20)
Valuing Time in Escrow Management (01:05:38)
Final Goodbyes (01:06:21)
Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.Max Bartolo (Cohere):https://www.maxbartolo.com/https://cohere.com/commandTRANSCRIPT:https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0TOC:1. Model Reasoning and Verification [00:00:00] 1.1 Model Consistency and Reasoning Verification [00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis [00:10:28] 1.3 AI Application Development and Model Deployment [00:14:24] 1.4 AI Alignment and Human Feedback Limitations2. Evaluation and Bias Assessment [00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment [00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior [00:32:43] 2.3 Adversarial Examples and Model Robustness3. Benchmarking Systems and Methods [00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches [00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics [00:50:33] 3.3 Evolution of Model Benchmarking Methods [00:51:15] 3.4 Hierarchical Capability Testing Framework [00:52:35] 3.5 Benchmark Platforms and Tools4. Model Architecture and Performance [00:55:15] 4.1 Cohere's Model Development Process [01:00:26] 4.2 Model Quantization and Performance Evaluation [01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards [01:08:27] 4.4 Training Progression and Technical Challenges5. 
Future Directions and Challenges [01:13:48] 5.1 Context Window Evolution and Trade-offs [01:22:47] 5.2 Enterprise Applications and Future ChallengesREFS:[00:03:10] Research at Cohere with Laura Ruis et al., Max Bartolo, Laura Ruis et al.https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20[00:04:15] Influence functions in machine learning, Koh & Lianghttps://arxiv.org/abs/1703.04730[00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf[00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmannhttps://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf[00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AIhttps://huggingface.co/CohereForAI/c4ai-command-a-03-2025[00:13:30] OpenInterpreterhttps://github.com/KillianLucas/open-interpreter[00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsomhttps://arxiv.org/abs/2309.16349[00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.https://arxiv.org/abs/2404.16019[00:32:50] How adversarial examples arise, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madryhttps://arxiv.org/abs/1905.02175[00:43:00] DynaBench platform paper, Douwe Kiela et al.https://aclanthology.org/2021.naacl-main.324.pdf[00:50:15] Sara Hooker's work on compute limitations, Sara Hookerhttps://arxiv.org/html/2407.05694v1[00:53:25] DataPerf: Community-led benchmark suite, Mazumder et al.https://arxiv.org/abs/2207.10062[01:04:35] DROP, Dheeru Dua et al.https://arxiv.org/abs/1903.00161[01:07:05] GSM8k, Cobbe et al.https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k[01:09:30] ARC, François Chollethttps://github.com/fchollet/ARC-AGI[01:15:50] Command A, 
Coherehttps://cohere.com/blog/command-a[01:22:55] Enterprise search using LLMs, Coherehttps://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers
Man in his pomp yet without understanding is like the beasts that perish.- Psalm 49:20 This Episode's Links and Timestamps:00:00 – Scripture Reading02:59 – Introduction10:41 – Commentary on Psalm 4934:06 – Reviewing ‘Antifragile' by Nassim Taleb1:12:12 – Robustness and Antifragility1:22:06 – Interventionism and Iatrogenics1:29:36 – Ethics and Foundational Asymmetry
Daniel Franzen and Jan Disselhoff, the "ARChitects" are the official winners of the ARC Prize 2024. Filmed at Tufa Labs in Zurich - they revealed how they achieved a remarkable 53.5% accuracy by creatively utilising large language models (LLMs) in new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.SPONSOR MESSAGES:***CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!https://centml.ai/pricing/Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.Goto https://tufalabs.ai/***Jan Disselhoffhttps://www.linkedin.com/in/jan-disselhoff-1423a2240/Daniel Franzenhttps://github.com/da-frARC Prize: http://arcprize.org/TRANSCRIPT AND BACKGROUND READING:https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0TOC1. Solution Architecture and Strategy Overview[00:00:00] 1.1 Initial Solution Overview and Model Architecture[00:04:25] 1.2 LLM Capabilities and Dataset Approach[00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies[00:14:08] 1.4 Sampling Methods and Search Implementation[00:17:52] 1.5 ARC vs Language Model Context Comparison2. LLM Search and Model Implementation[00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation[00:27:04] 2.2 Symmetry Augmentation and Model Architecture[00:30:11] 2.3 Model Intelligence Characteristics and Performance[00:37:23] 2.4 Tokenization and Numerical Processing Challenges3. 
Advanced Training and Optimization[00:45:15] 3.1 DFS Token Selection and Probability Thresholds[00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs[00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention[00:56:10] 3.4 Training Infrastructure and Optimization Experiments[01:02:34] 3.5 Search Tree Analysis and Entropy Distribution PatternsREFS[00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmannhttps://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf[00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchellhttps://arxiv.org/html/2411.14215[00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodelhttps://github.com/michaelhodel/re-arc[00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al.https://arxiv.org/html/2408.00724v2[00:16:55] Language model reachability space exploration, University of Torontohttps://www.youtube.com/watch?v=Bpgloy1dDn0[00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatthttps://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt[00:41:20] GPT tokenization approach for numbers, OpenAIhttps://platform.openai.com/docs/guides/text-generation/tokenizer-examples[00:46:25] DFS in AI search strategies, Russell & Norvighttps://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997[00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al.https://www.pnas.org/doi/10.1073/pnas.1611835114[00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al.https://arxiv.org/abs/2106.09685[00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIAhttps://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/[01:04:55] Original MCTS in computer Go, Yifan Jinhttps://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf
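The chapter on DFS token selection and probability thresholds can be illustrated with a toy example. What follows is a minimal sketch of the general idea only, not the ARChitects' actual code: it walks the tree of possible token sequences depth-first and abandons any branch whose cumulative probability drops below a threshold. The names `dfs_completions`, `toy_model`, and `EOS` are hypothetical, and `toy_model` is a hard-coded stand-in for a real language model.

```python
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker

# Hypothetical model interface: token prefix -> next-token probabilities.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def dfs_completions(
    next_token_probs: NextTokenFn, min_prob: float, max_len: int = 8
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return all finished sequences with cumulative probability >= min_prob."""
    results: List[Tuple[Tuple[str, ...], float]] = []

    def visit(prefix: Tuple[str, ...], prob: float) -> None:
        if prefix and prefix[-1] == EOS:
            results.append((prefix[:-1], prob))  # drop the EOS marker
            return
        if len(prefix) >= max_len:
            return  # give up on sequences that never terminate
        for token, p in next_token_probs(prefix).items():
            new_prob = prob * p
            if new_prob >= min_prob:  # prune branches below the threshold
                visit(prefix + (token,), new_prob)

    visit((), 1.0)
    return sorted(results, key=lambda item: -item[1])


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy distribution standing in for an LLM (an assumption for this demo)."""
    if len(prefix) == 0:
        return {"a": 0.6, "b": 0.4}
    if len(prefix) == 1:
        return {"x": 0.7, "y": 0.3}
    return {EOS: 1.0}


completions = dfs_completions(toy_model, min_prob=0.2)
for seq, prob in completions:
    print(f"{' '.join(seq)}  p={prob:.2f}")
# prints:
# a x  p=0.42
# b x  p=0.28
```

Unlike greedy or beam decoding, this kind of search returns every candidate above the threshold rather than a fixed number of them, so the threshold directly trades coverage against the size of the search tree.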
Vinu Sankar Sadasivan is a CS PhD ... Currently, I am working as a full-time Student Researcher at Google DeepMind on jailbreaking multimodal AI models. Robustness, Detectability, and Data Privacy in AI // MLOps Podcast #289 with Vinu Sankar Sadasivan, Student Researcher at Google DeepMind. // Abstract Recent rapid advancements in Artificial Intelligence (AI) have made it widely applicable across various domains, from autonomous systems to multimodal content generation. However, these models remain susceptible to significant security and safety vulnerabilities. Such weaknesses can enable attackers to jailbreak systems, allowing them to perform harmful tasks or leak sensitive information. As AI becomes increasingly integrated into critical applications like autonomous robotics and healthcare, the importance of ensuring AI safety is growing. Understanding the vulnerabilities in today's AI systems is crucial to addressing these concerns. // Bio Vinu Sankar Sadasivan is a final-year Computer Science PhD candidate at The University of Maryland, College Park, advised by Prof. Soheil Feizi. His research focuses on Security and Privacy in AI, with a particular emphasis on AI robustness, detectability, and user privacy. Currently, Vinu is a full-time Student Researcher at Google DeepMind, working on jailbreaking multimodal AI models. Previously, Vinu was a Research Scientist intern at Meta FAIR in Paris, where he worked on AI watermarking. Vinu is a recipient of the 2023 Kulkarni Fellowship and has earned several distinctions, including the prestigious Director's Silver Medal. He completed a Bachelor's degree in Computer Science & Engineering at IIT Gandhinagar in 2020. Prior to their PhD, Vinu gained research experience as a Junior Research Fellow in the Data Science Lab at IIT Gandhinagar and through internships at Caltech, Microsoft Research India, and IISc. 
// MLOps Swag/Merch https://shop.mlops.community/ // Related Links Website: https://vinusankars.github.io/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vinu on LinkedIn: https://www.linkedin.com/in/vinusankars/
When Robert Bryce comes into the house, look out, as he always has solid takes on anything energy. From transmission lines, to generating energy, to the impact on economies, Robert Bryce is the total MAN! Think 'Energy Humanism' not 'Climate Catastrophe'. Will "drill baby drill" make a difference in today's life? We shall see... For more information on past guests, Tom's published books, and how to get in touch, please visit our newly updated website at https://www.tommattshow.com #RFZ #radio #broadcasting #podcast #lansing #michiganradio #mab #mentalhealth #puremichigan #positivemessaging Thank you to Brock Fletcher and the selling team of Keller Williams Realty for their continued support of our programming! 'The Tom Matt Show' heard on: The Michigan Talk Network; WKAR Michigan State University's AM 870 & 102.3 FM; WJIM-AM 1240 Lansing; WYPV FM 94.5 Mackinac City; www.tommattshow.com (podcasts); iTunes #RFZ #radio #broadcasting #podcast #michiganradio #lansing #michiganradio #mab #refirementzone #successstory #humaninterestpodcast #selfhelppodcast
Analysis of image classifiers demonstrates that it is possible to understand backprop networks at the task-relevant run-time algorithmic level. In these systems, at least, networks gain their power from deploying massive parallelism to check for the presence of a vast number of simple, shallow patterns. https://betterwithout.ai/images-surface-features This episode has a lot of links: David Chapman's earliest public mention, in February 2016, of image classifiers probably using color and texture in ways that "cheat": twitter.com/Meaningness/status/698688687341572096 Jordana Cepelewicz's “Where we see shapes, AI sees textures,” Quanta Magazine, July 1, 2019: https://www.quantamagazine.org/where-we-see-shapes-ai-sees-textures-20190701/ “Suddenly, a leopard print sofa appears”, May 2015: https://web.archive.org/web/20150622084852/http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.html “Understanding How Image Quality Affects Deep Neural Networks” April 2016: https://arxiv.org/abs/1604.04004 Goodfellow et al., “Explaining and Harnessing Adversarial Examples,” December 2014: https://arxiv.org/abs/1412.6572 “Universal adversarial perturbations,” October 2016: https://arxiv.org/pdf/1610.08401v1.pdf “Exploring the Landscape of Spatial Robustness,” December 2017: https://arxiv.org/abs/1712.02779 “Overinterpretation reveals image classification model pathologies,” NeurIPS 2021: https://proceedings.neurips.cc/paper/2021/file/8217bb4e7fa0541e0f5e04fea764ab91-Paper.pdf “Approximating CNNs with Bag-of-Local-Features Models Works Surprisingly Well on ImageNet,” ICLR 2019: https://openreview.net/forum?id=SkfMWhAqYQ Baker et al.'s “Deep convolutional networks do not classify based on global object shape,” PLOS Computational Biology, 2018: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006613 François Chollet's Twitter threads about AI producing images of horses with extra legs: twitter.com/fchollet/status/1573836241875120128 and 
twitter.com/fchollet/status/1573843774803161090 “Zoom In: An Introduction to Circuits,” 2020: https://distill.pub/2020/circuits/zoom-in/ Geirhos et al., “ImageNet-Trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness,” ICLR 2019: https://openreview.net/forum?id=Bygh9j09KX Dehghani et al., “Scaling Vision Transformers to 22 Billion Parameters,” 2023: https://arxiv.org/abs/2302.05442 Hasson et al., “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks,” February 2020: https://www.gwern.net/docs/ai/scaling/2020-hasson.pdf
During the 19th SIMM-podcast episode we listen in on a conversation between biologist Olivier Hamant (Institut Michel Serres) and musician Jean-Luc Plouvier (Ictus Ensemble) about the ideas and experiences they share with each other from their work on the natural world and the world of music-making. SIMM-founder Lukas Pairon interviews them. Referenced during this podcast episode: Philippe Boesmans, John Cage, "Un homme ça s'empêche" (Albert Camus), eco-anxiety, free jazz, Philip Glass, Glenn Gould, Goodhart's law, Olivier Hamant's 'Antidote to the cult of performance', Olivier Hamant's 'De l'incohérence - philosophie politique de la robustesse', Olivier Hamant's 'La troisième voie du vivant', Ictus Ensemble, Steve Reich's 'Music for 18 Musicians', Ircam, robustness, serendipity, Michel Serres Institut, Reich's psychoacoustic by-products of repetition and phase-shifting, Simon Sinek's conference on 'The Infinite Game', systems thinking, Gertrude Stein, stochastic processes in biology. The transcription of this episode can be found here. During this episode, music is briefly heard from Steve Reich's 'Music for 18 Musicians' (played by the Steve Reich Ensemble), as well as rehearsal recordings of the Kinshasa-based traditional drummers' ensemble Beta Mbonda.
Modern-day supply chains are obsessed with the goal of becoming resilient. But is that enough?
Today, Dr. Patrick Schloss, Professor in the Department of Microbiology and Immunology in the School of Medicine at the University of Michigan, joins the #QualityQuorum to discuss how the human microbiome is studied, possible pitfalls in such data analysis, and what tools he and his coworkers have developed to lead toward repeatable, hypothesis-driven science. Host: Mark O. Martin Guest: Patrick Schloss Subscribe: Apple Podcasts, Spotify Become a patron of Matters Microbial! Links for this episode An overview of how the gut microbiome is analyzed. One of the articles discussed by Dr. Schloss exploring reproducibility in microbiome studies: “Identifying and Overcoming Threats to Reproducibility, Replicability, Robustness, and Generalizability in Microbiome Research.” Another article discussed by Dr. Schloss, regarding the link between the microbiome and obesity: “Looking for a Signal in the Noise: Revisiting Obesity and the Microbiome.” An article from Dr. Schloss' research team that explores a link between the human microbiome and a type of colorectal cancer. A link to the MOTHUR project, used to analyze microbiome data. A link to a video by Dr. Schloss: “Understanding Disease Through the Lens of the Microbiome.” Dr. Schloss' YouTube channel about data analysis. Dr. Schloss' research group website. Dr. Schloss' faculty website. Intro music is by Reber Clark Send your questions and comments to mattersmicrobial@gmail.com
Where are your delicate and vulnerable spots that, when touched, cause a defensive reaction? In this thought-provoking episode, Ali explores the concept of fragility with Kelly Wendorf, founder of Equus and author of "Flying Lead Change." Kelly, a horse-wise woman and insightful author, unpacks how our most vulnerable spots trigger defensive reactions in both personal and professional settings. She illuminates the origins of our fragile responses, examining how shame triggers can impact our leadership and decision-making, and contrasts fragility's "power-over" mindset with the transformative potential of robustness. The dialogue moves beyond identifying fragile behaviors to examine practical approaches for cultivating robustness. Kelly shares insights on developing a more open, accountable, and genuinely connected way of being in the world. Drawing from her extensive experience with equine wisdom, she offers unique perspectives on authenticity and wholeness, challenging conventional notions of power and control in both personal growth and leadership contexts. Leave us a review on Apple Podcasts! Follow our step by step guides: - How To: Leave a Review on Your Computer: - How To: Leave a Review on Your iPhone: Never miss an episode! Sign up for our newsletter to stay up to date on all our episode releases. Fragility Feedback Conflict Radical Self-inquiry
In this episode of Learning from Machine Learning, we explore the intersection of pure mathematics and modern data science with Leland McInnes, the mind behind an ecosystem of tools for unsupervised learning including UMAP, HDBSCAN, PyNN Descent and DataMapPlot. As a researcher at the Tutte Institute for Mathematics and Computing, McInnes has fundamentally shaped how we approach and understand complex data. McInnes views data through a unique geometric lens, drawing from his background in algebraic topology to uncover hidden patterns and relationships within complex datasets. This perspective led to the creation of UMAP, a breakthrough in dimensionality reduction that preserves both local and global data structure to allow for incredible visualizations and clustering. Similarly, his clustering algorithm HDBSCAN tackles the messy reality of real-world data, handling varying densities and noise with remarkable effectiveness. But perhaps what's most striking about McInnes isn't just his technical achievements – it's his philosophy toward algorithm development. He champions the concept of "decomposing black box algorithms," advocating for transparency and understanding over blind implementation. By breaking down complex algorithms into their fundamental components, McInnes argues, we gain the power to adapt and innovate rather than simply consume. For those entering the field, McInnes offers poignant advice: resist the urge to chase the hype. Instead, find your unique angle, even if it seems unconventional. His own journey – applying concepts from algebraic topology and fuzzy simplicial sets to data science – demonstrates how breakthrough innovations often emerge from unexpected connections. Throughout our conversation, McInnes's passion for knowledge and commitment to understanding shine through. 
His approach reminds us that the most powerful advances in data science often come not from following the crowd, but from diving deep into fundamentals and drawing connections across disciplines. There's immense value in understanding the tools you use, questioning established approaches, and bringing your unique perspective to the field. As McInnes shows us, sometimes the most significant breakthroughs come from seeing familiar problems through a new lens. Resources for Leland McInnes: Leland's Github; UMAP; HDBSCAN; PyNN Descent; DataMapPlot; EVoC. References: Maarten Grootendorst; Learning from Machine Learning Episode 1; Vincent Warmerdam - Calmcode; Learning from Machine Learning Episode 2; Matt Rocklin; Emily Riehl - Category Theory in Context; Lorena Barba; David Spivak - Fuzzy Simplicial Sets; Improving Mapper's Robustness by Varying Resolution According to Lens-Space Density. Learning from Machine Learning: Youtube; https://mindfulmachines.substack.com/
Join Nathan for an expansive conversation with Dan Hendrycks, Executive Director of the Center for AI Safety and Advisor to Elon Musk's XAI. In this episode of The Cognitive Revolution, we explore Dan's groundbreaking work in AI safety and alignment, from his early contributions to activation functions to his recent projects on AI robustness and governance. Discover insights on representation engineering, circuit breakers, and tamper-resistant training, as well as Dan's perspectives on AI's impact on society and the future of intelligence. Don't miss this in-depth discussion with one of the most influential figures in AI research and safety. Check out some of Dan's research papers: MMLU: https://arxiv.org/abs/2009.03300 GELU: https://arxiv.org/abs/1606.08415 Machiavelli Benchmark: https://arxiv.org/abs/2304.03279 Circuit Breakers: https://arxiv.org/abs/2406.04313 Tamper Resistant Safeguards: https://arxiv.org/abs/2408.00761 Statement on AI Risk: https://www.safe.ai/work/statement-on-ai-risk Apply to join over 400 Founders and Execs in the Turpentine Network: https://www.turpentinenetwork.co/ SPONSORS: Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive. LMNT: LMNT is a zero-sugar electrolyte drink mix that's redefining hydration and performance. Ideal for those who fast or anyone looking to optimize their electrolyte intake. Support the show and get a free sample pack with any purchase at https://drinklmnt.com/tcr. Notion: Notion offers powerful workflow and automation templates, perfect for streamlining processes and laying the groundwork for AI-driven automation. 
With Notion AI, you can search across thousands of documents from various platforms, generating highly relevant analysis and content tailored just for you - try it for free at https://notion.com/cognitiverevolution Oracle: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive CHAPTERS: (00:00:00) Teaser (00:00:48) About the Show (00:02:17) About the Episode (00:05:41) Intro (00:07:19) GELU Activation Function (00:10:48) Signal Filtering (00:12:46) Scaling Maximalism (00:18:35) Sponsors: Shopify | LMNT (00:22:03) New Architectures (00:25:41) AI as Complex System (00:32:35) The Machiavelli Benchmark (00:34:10) Sponsors: Notion | Oracle (00:37:20) Understanding MMLU Scores (00:45:23) Reasoning in Language Models (00:49:18) Multimodal Reasoning (00:54:53) World Modeling and Sora (00:57:07) Arc Benchmark and Hypothesis (01:01:06) Humanity's Last Exam (01:08:46) Benchmarks and AI Ethics (01:13:28) Robustness and Jailbreaking (01:18:36) Representation Engineering (01:30:08) Convergence of Approaches (01:34:18) Circuit Breakers (01:37:52) Tamper Resistance (01:49:10) Interpretability vs. Robustness (01:53:53) Open Source and AI Safety (01:58:16) Computational Irreducibility (02:06:28) Neglected Approaches (02:12:47) Truth Maxing and XAI (02:19:59) AI-Powered Forecasting (02:24:53) Chip Bans and Geopolitics (02:33:30) Working at CAIS (02:35:03) Extinction Risk Statement (02:37:24) Outro
Welcome to another episode of Category Visionaries — the show that explores GTM stories from tech's most innovative B2B founders. In today's episode, we're speaking with Saman Farid, CEO & Founder of Formic, a robotics platform that has raised over $60 Million in funding. Here are the most interesting points from our conversation: On-demand robot workforce: Formic provides factories with robots that can work on an hourly basis, aiming to solve labor shortages by making robotic solutions more accessible. COVID and labor shortages: Saman founded Formic during the pandemic, driven by supply chain disruptions and the unfilled jobs crisis in manufacturing, which showed a need for better automation solutions. From VC to founder: Saman transitioned back to being a founder because he believed in the massive potential of robotics. He saw a $10 billion opportunity in automating manufacturing tasks and felt compelled to take the leap. Robustness over cutting-edge tech: Formic focuses on reliability, prioritizing proven, dependable robotics technology that can run two shifts daily without downtime, over pursuing the latest but less mature advancements. Unique business model - "Pay for Productivity": Formic guarantees performance and productivity, only charging clients for the output they get, which has been a game-changer in driving faster adoption of robotics. Scaling with $250M in financing: Formic uses a sophisticated capital stack to finance deployments, which allows them to offer a "try before you buy" model, making robotics more affordable and less risky for manufacturers. // Sponsors: Front Lines — We help B2B tech companies launch, manage, and grow podcasts that drive demand, awareness, and thought leadership. www.FrontLines.io The Global Talent Co. — We help tech startups find, vet, hire, pay, and retain amazing marketing talent that costs 50-70% less than the US & Europe. www.GlobalTalent.co
In this episode, Kevin Werbach is joined by Reggie Townsend, VP of Data Ethics at SAS, a business analytics software platform. Together they discuss SAS's nearly 50-year history of supporting business technology and its recent implementation of responsible AI initiatives. Reggie introduces model cards and the importance of variety in AI systems across diverse stakeholders and sectors. Reggie and Kevin explore the increase in both consumer trust and purchases when consumers feel a brand is ethical in its use of AI, and the importance of trustworthy AI in employee retention and recruitment. Their discussion approaches the idea of bias in an untraditional way, highlighting the positive humanistic nature of bias and learning to manage its negative implications. Finally, Reggie shares his insights on fostering ethical AI practices through literacy and open dialogue, stressing the importance of authentic commitment and collaboration among developers, deployers, and regulators.
SAS adds to its trustworthy AI offerings with model cards and AI governance services
Article by Reggie Townsend: Talking AI in Washington, DC
Reggie Townsend oversees the Data Ethics Practice (DEP) at SAS Institute. He leads the global effort for consistency and coordination of strategies that empower employees and customers to deploy data-driven systems that promote human well-being, agency and equity. He has over 20 years of experience in strategic planning, management, and consulting, focusing on topics such as advanced analytics, cloud computing and artificial intelligence. With visibility across multiple industries and sectors where the use of AI is growing, he combines this extensive business and technology expertise with a passion for equity and human empowerment.
Want to learn more? Engage live with Professor Werbach and other Wharton faculty experts in Wharton's new Strategies for Accountable AI online executive education program. 
It's perfect for managers, entrepreneurs, and advisors looking to harness AI's power while addressing its risks.
On a special edition of the BioCentury This Week podcast, BioCentury's editors deliver their takeaways from the debut Grand Rounds conference, which focused on whether biotech can write a more successful playbook for translating from target to product. Weaving together takeaways from the panels, fireside chat and keynote at the conference, the editors assess the tensions between generalizability and fit-for-purpose models, between having control and capturing complexity, and, in human data, between scale and robustness/reliability, particularly for longitudinal readouts. The editors also discuss BioCentury's Q&A with USC Keck School's Patrick Lyden, who explained how high-quality, reproducible preclinical science can be feasible.
View full story: https://www.biocentury.com/article/653631
00:00 - Introduction
02:56 - Generalizability vs. Fit for Purpose
10:03 - Control vs. Complexity
17:57 - Scale vs. Robustness in Human Data
26:25 - Hypothesis-Driven vs. Unbiased Research
28:50 - Grand Rounds 2025
To submit a question to BioCentury's editors, email the BioCentury This Week team at podcasts@biocentury.com.
Episode 305 is with Lead YDP Physiotherapist at AFC Bournemouth Aaron Hull We discussed: ▫️Moving from Australia ▫️Future of Physical Performance ▫️Studying S&C as a physio ▫️Running a successful MDT & much more! You can connect with Aaron on LinkedIn or via our Online Community If you enjoy this episode make sure to check out the previous episodes below: Brett Bartholomew - https://youtu.be/W95WIZXl5u0 Gareth Sandford & Damien Harper - https://youtu.be/BQUYkihCeD8 Stu McMillan - https://youtu.be/ya5b3TCm9Ws Keep up to date with the amazing work our sponsors are doing here: Good Prep - https://thegoodprep.com Discover the power of nutrition at WWW.THEGOODPREP.COM and use code FFF15 for 15% off your first order Rezzil - rezzil.com Hytro - hytro.com Maximise your athletic potential with Hytro BFR. Easier, safer and more practical BFR for squads to prepare for and recover from exercise than ever before. Click the link [[ https://bit.ly/3ILVsbU ]] to speak to our Pro Sports team about how to get Hytro BFR at your club. Join our online community & get access to the very best Football Fitness content as well as the ability to connect with Sport Scientists and Strength & Conditioning coaches from around the world. To get FULL access to all of these & even more like this, sign up to a FREE month on our online community at the link below. www.footballfitfed.com/forum/index.aspx Keep up to date with everything that is going on at Football Fitness Federation at the following links: Twitter - @FootballFitFed Instagram - @FootballFitFed Website - www.footballfitfed.com
Andrew Ilyas is a PhD student at MIT who is about to start as a professor at CMU. We discuss data modeling and understanding how datasets influence model predictions; adversarial examples in machine learning and why they occur; robustness in machine learning models; black box attacks on machine learning systems; biases in data collection and dataset creation, particularly in ImageNet; and self-selection bias in data and methods to address it.
MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api
Andrew's site: https://andrewilyas.com/ https://x.com/andrew_ilyas
TOC:
00:00:00 - Introduction and Andrew's background
00:03:52 - Overview of the machine learning pipeline
00:06:31 - Data modeling paper discussion
00:26:28 - TRAK: Evolution of data modeling work
00:43:58 - Discussion on abstraction, reasoning, and neural networks
00:53:16 - "Adversarial Examples Are Not Bugs, They Are Features" paper
01:03:24 - Types of features learned by neural networks
01:10:51 - Black box attacks paper
01:15:39 - Work on data collection and bias
01:25:48 - Future research plans and closing thoughts
References:
Adversarial Examples Are Not Bugs, They Are Features https://arxiv.org/pdf/1905.02175
TRAK: Attributing Model Behavior at Scale https://arxiv.org/pdf/2303.14186
Datamodels: Predicting Predictions from Training Data https://arxiv.org/pdf/2202.00622
IMAGENET-TRAINED CNNS https://arxiv.org/pdf/1811.12231
ZOO: Zeroth Order Optimization Based Black-box https://arxiv.org/pdf/1708.03999
A Spline Theory of Deep Networks https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
Scaling Monosemanticity https://transformer-circuits.pub/2024/scaling-monosemanticity/
Adversarial Examples Are Not Bugs, They Are Features https://gradientscience.org/adv/
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies https://proceedings.mlr.press/v235/bartoldson24a.html
Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors https://arxiv.org/abs/1807.07978
Estimation of Standard Auction Models https://arxiv.org/abs/2205.02060
From ImageNet to Image Classification: Contextualizing Progress on Benchmarks https://arxiv.org/abs/2005.11295
What Makes A Good Fisherman? Linear Regression under Self-Selection Bias https://arxiv.org/abs/2205.03246
Towards Tracing Factual Knowledge in Language Models Back to the Training Data [Akyürek] https://arxiv.org/pdf/2205.11482
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed, published by johnswentworth on August 22, 2024 on The AI Alignment Forum. This post walks through the math for a theorem. It's intended to be a reference post, which we'll link back to as-needed from future posts. The question which first motivated this theorem for us was: "Redness of a marker seems like maybe a natural latent over a bunch of parts of the marker, and redness of a car seems like maybe a natural latent over a bunch of parts of the car, but what makes redness of the marker 'the same as' redness of the car? How are they both instances of one natural thing, i.e. redness? (or 'color'?)". But we're not going to explain in this post how the math might connect to that use-case; this post is just the math. Suppose we have multiple distributions $P_1, \dots, P_k$ over the same random variables $X_1, \dots, X_n$. (Speaking somewhat more precisely: the distributions are over the same set, and an element of that set is represented by values $(x_1, \dots, x_n)$.) We take a mixture of the distributions: $P[X] := \sum_j \alpha_j P_j[X]$, where $\sum_j \alpha_j = 1$ and $\alpha$ is nonnegative. Then our theorem says: if an approximate natural latent exists over $P[X]$, and that latent is robustly natural under changing the mixture weights $\alpha$, then the same latent is approximately natural over $P_j[X]$ for all $j$. Mathematically: the natural latent over $P[X]$ is defined by $(x, \lambda \mapsto P[\Lambda=\lambda|X=x])$, and naturality means that the distribution $(x, \lambda \mapsto P[\Lambda=\lambda|X=x] P[X=x])$ satisfies the naturality conditions (mediation and redundancy). The theorem says that, if the joint distribution $(x, \lambda \mapsto P[\Lambda=\lambda|X=x] \sum_j \alpha_j P_j[X=x])$ satisfies the naturality conditions robustly with respect to changes in $\alpha$, then $(x, \lambda \mapsto P[\Lambda=\lambda|X=x] P_j[X=x])$ satisfies the naturality conditions for all $j$. 
"Robustness" here can be interpreted in multiple ways - we'll cover two here, one for which the theorem is trivial and another more substantive, but we expect there are probably more notions of "robustness" which also make the theorem work.
Trivial Version
First notion of robustness: the joint distribution $(x, \lambda \mapsto P[\Lambda=\lambda|X=x] \sum_j \alpha_j P_j[X=x])$ satisfies the naturality conditions to within $\epsilon$ for all values of $\alpha$ (subject to $\sum_j \alpha_j = 1$ and $\alpha$ nonnegative). Then: the joint distribution $(x, \lambda \mapsto P[\Lambda=\lambda|X=x] \sum_j \alpha_j P_j[X=x])$ satisfies the naturality conditions to within $\epsilon$ specifically for $\alpha_j = \delta_{jk}$, i.e. $\alpha$ which is 0 in all entries except a 1 in entry $k$. In that case, the joint distribution is $(x, \lambda \mapsto P[\Lambda=\lambda|X=x] P_k[X=x])$, therefore $\Lambda$ is natural over $P_k$. Invoke for each $k$, and the theorem is proven. ... but that's just abusing an overly-strong notion of robustness. Let's do a more interesting one.
Nontrivial Version
Second notion of robustness: the joint distribution $(x, \lambda \mapsto P[\Lambda=\lambda|X=x] \sum_j \alpha_j P_j[X=x])$ satisfies the naturality conditions to within $\epsilon$, and the gradient of the approximation error with respect to (allowed) changes in $\alpha$ is (locally) zero. We need to prove that the joint distributions $(x, \lambda \mapsto P[\Lambda=\lambda|X=x] P_j[X=x])$ satisfy both the mediation and redundancy conditions for each $j$. We'll start with redundancy, because it's simpler.
Redundancy
We can express the approximation error of the redundancy condition with respect to $X_i$ under the mixed distribution as $D_{KL}(P[\Lambda,X] \,||\, P[X] P[\Lambda|X_i]) = E_X[D_{KL}(P[\Lambda|X] \,||\, P[\Lambda|X_i])]$ where, recall, $P[\Lambda,X] := P[\Lambda|X] \sum_j \alpha_j P_j[X]$. 
We can rewrite that approximation error as: $E_X[D_{KL}(P[\Lambda|X] \,||\, P[\Lambda|X_i])] = \sum_j \alpha_j \sum_X P_j[X] D_{KL}(P[\Lambda|X] \,||\, P[\Lambda|X_i]) = \sum_j \alpha_j E^j_X[D_{KL}(P[\Lambda|X] \,||\, P[\Lambda|X_i])]$. Note that $P_j[\Lambda|X] = P[\Lambda|X]$ is the same under all the distributions (by definition), so: $= \sum_j \alpha_j D_{KL}(P_j[\Lambda,X] \,||\, P[\Lambda|X_i])$ and by factorization transfer: $\geq \sum_j \alpha_j D_{KL}(P_j[\Lambda,X] \,||\, P_j[\Lambda|X_i])$. In other words: if $\epsilon^j_i$ is the redundancy error with respect to $X_i$ under distribution $j$, and $\epsilon_i$ is the redundancy error with respect to $X_i$ under the mixed distribution $P$, then $\epsilon_i \geq \sum_j \alpha_j \epsilon^j_i$. The redundancy error of the mixed distribution is a...
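The redundancy step can be sanity-checked numerically: the redundancy error of the mixture should be at least the α-weighted average of the per-distribution redundancy errors. What follows is a minimal sketch, not code from the post. It assumes two discrete variables (X_0, X_1), a fixed conditional P[Λ|X] shared by all components, and randomly drawn strictly positive tables; every function name here (`redundancy_error`, `random_instance`, `check_bound`) is made up for illustration.

```python
import math
import random


def normalize(weights):
    """Scale a list of positive weights so that it sums to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def redundancy_error(p_x, p_lam_given_x, i):
    """D_KL(Q[L,X] || Q[X] Q[L|X_i]) where Q[L,X] = P[L|X] Q[X].

    p_x maps x = (x_0, x_1) to Q[X=x]; p_lam_given_x maps x to a list over lambda.
    """
    # Marginal Q[L=lam, X_i=v], built by summing out the other coordinate.
    joint_i = {}
    for x, px in p_x.items():
        row = joint_i.setdefault(x[i], [0.0] * len(p_lam_given_x[x]))
        for lam, pl in enumerate(p_lam_given_x[x]):
            row[lam] += px * pl
    cond_i = {v: normalize(row) for v, row in joint_i.items()}  # Q[L | X_i=v]
    return sum(
        px * pl * math.log(pl / cond_i[x[i]][lam])
        for x, px in p_x.items()
        for lam, pl in enumerate(p_lam_given_x[x])
    )


def random_instance(seed, k=3, n0=3, n1=4, n_lam=2):
    """Draw k strictly positive distributions over X, one P[L|X], and weights alpha."""
    rng = random.Random(seed)
    xs = [(a, b) for a in range(n0) for b in range(n1)]

    def draw(n):
        return normalize([rng.random() + 0.01 for _ in range(n)])

    p_js = [dict(zip(xs, draw(len(xs)))) for _ in range(k)]
    p_lam_given_x = {x: draw(n_lam) for x in xs}
    alpha = draw(k)
    return p_js, p_lam_given_x, alpha


def check_bound(seed, i):
    """Return (mixture redundancy error, alpha-weighted average of component errors)."""
    p_js, p_lam_given_x, alpha = random_instance(seed)
    p_mix = {x: sum(a * p[x] for a, p in zip(alpha, p_js)) for x in p_js[0]}
    eps_mix = redundancy_error(p_mix, p_lam_given_x, i)
    eps_avg = sum(
        a * redundancy_error(p, p_lam_given_x, i) for a, p in zip(alpha, p_js)
    )
    return eps_mix, eps_avg


if __name__ == "__main__":
    for i in (0, 1):
        eps_mix, eps_avg = check_bound(seed=0, i=i)
        print(f"X_{i}: mixture error {eps_mix:.4f}, weighted average {eps_avg:.4f}")
```

The gap between the two numbers is the slack introduced by the factorization-transfer step, where P[Λ|X_i] from the mixture is swapped for the per-component P_j[Λ|X_i].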
Can your AI models survive a big disaster? While a recent major IT incident with CrowdStrike wasn't AI related, the magnitude and reaction reminded us that no system, no matter how proven, is immune to failure. AI modeling systems are no different. Neglecting the best practices of building models can lead to unrecoverable failures. Discover how the three-tiered framework of robustness, resiliency, and antifragility can guide your approach to creating AI infrastructures that not only perform reliably under stress but also fail gracefully when the unexpected happens.
Show Notes
Technology, incidents, and why basics matter (00:00:03)
* While the recent CrowdStrike incident wasn't caused by AI, its impact was a wake-up call for the people and processes that support critical systems.
* As AI is increasingly being used at both experimental and production levels, we can expect that AI incidents are a matter of when, not if. What can you do to prepare?
The "7P's": Are you capable of handling the unexpected? (00:09:05)
* The 7Ps is an adage, dating back to WWII, that aligns with our "do things the hard way" approach to AI governance and modeling systems.
* Let's consider the levels of building a performant system: robustness, resiliency, and antifragility.
Model robustness (00:10:03)
* Robustness is a very important but often overlooked component of building modeling systems. We suspect that part of the problem is due to:
* The Kaggle-driven upbringing of data scientists
* Assumed generalizability of modeling systems, when models are optimized to perform well on their training data but do not generalize enough to perform well on unseen data
Model resilience (00:16:10)
* Resiliency is the ability to absorb adverse stimuli without destruction and return to its pre-event state.
* In practice, robustness and resiliency, testing, and planning are often easy components to leave out. This is where risks and threats are exposed.
* See also, Episode 8. Model validation: Robustness and resilience
Models and antifragility (00:25:04)
* Unlike resiliency, which is the ability to absorb damaging inputs without breaking, antifragility is the ability of a system to improve from challenging stimuli (e.g., the human body).
* A key question we need to ask ourselves: if we are not actively building our AI systems to be antifragile, why are we using AI systems at all?
What did you think? Let us know.
Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:
* LinkedIn - Episode summaries, shares of cited articles, and more.
* YouTube - Was it something that we said? Good. Share your favorite quotes.
* Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fields that I reference when thinking about AI takeover prevention, published by Buck Shlegeris on August 13, 2024 on The AI Alignment Forum. Is AI takeover like a nuclear meltdown? A coup? A plane crash? My day job is thinking about safety measures that aim to reduce catastrophic risks from AI (especially risks from egregious misalignment). The two main themes of this work are the design of such measures (what's the space of techniques we might expect to be affordable and effective) and their evaluation (how do we decide which safety measures to implement, and whether a set of measures is sufficiently robust). I focus especially on AI control, where we assume our models are trying to subvert our safety measures and aspire to find measures that are robust anyway. Like other AI safety researchers, I often draw inspiration from other fields that contain potential analogies. Here are some of those fields, my opinions on their strengths and weaknesses as analogies, and some of my favorite resources on them. Robustness to insider threats The single field that seems most closely analogous to AI control is reducing vulnerability to threats resulting from insiders, especially threats that arise from access to IT systems. That is, you have a bunch of employees who need to have access to various IT systems in order to do their jobs, and you're worried that some of these employees will abuse their access in order to accomplish some nefarious purpose. In a bank, you might worry about your tellers withdrawing money from customer accounts inappropriately. In a software company, you might worry about your employees sabotaging your code out of spite, violating user privacy for fun, or stealing your data to give it to a competitor. 
This is quite similar to the problem faced by AI developers when they're trying to control AIs (especially in internal deployments, which I argue is important here): All of the access the AI has is access the developer decided to give it; the developer faces a constant tradeoff between the productivity of the AI (which is maximized by being permissive) and the danger it poses (which is minimized by being restrictive). And obviously, because it's an AI it only has access to resources that are connected to the computer network where it's running, so we can restrict our attention to the IT resources. It's easy for the developer to constantly surveil the AI, which substantially reduces the space of strategies the AI can take. The developer has a lot of control over workflows, and is able to choose them to make it harder for the AI to subtly cause problems. For example, they can restrict affordances in ways that make it harder for AIs to cause problems without being noticed. An example of a similar technique in the insider threat prevention case is the use of administrative APIs instead of direct access to a system, as described in "Building Secure and Reliable Systems" chapter 5. That is: instead of having your engineers resolve issues by SSHing into production machines (which implicitly grants them very broad access that can lead to problems, intentional or accidental), you design an administrative API which allows the engineers to access just the affordances that usually suffice for them to do their job. Of course, they'll sometimes need to take actions that aren't available in the administrative API; this is called a breakglass mechanism. But because breakglass actions are rare, it's affordable to apply more scrutiny to them: you might use multi-party authorization, requiring another engineer to sign off on the commands before they take effect, or you might do auditing (e.g. 
apparently some SRE teams review all the breakglass actions at a weekly team meeting, which allows them to spot irregularities and discourage gratuitous use of the mechanism). Similar t...
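The administrative-API-plus-breakglass pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the design from "Building Secure and Reliable Systems" or from any real tool: the class `AdminAPI`, its method names, and the tuple-based audit log are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (kind, engineer, command, approver); hypothetical log format for this sketch
AuditEntry = Tuple[str, str, str, Optional[str]]


@dataclass
class AdminAPI:
    """Exposes only the narrow set of actions that usually suffice for the job."""

    allowed_actions: frozenset
    audit_log: List[AuditEntry] = field(default_factory=list)

    def run(self, engineer: str, action: str) -> str:
        """Routine path: only pre-approved actions, no extra sign-off needed."""
        if action not in self.allowed_actions:
            raise PermissionError(f"{action!r} is not in the admin API; use breakglass")
        self.audit_log.append(("routine", engineer, action, None))
        return f"ran {action}"

    def breakglass(self, engineer: str, command: str, approver: Optional[str]) -> str:
        """Rare path: arbitrary command, gated by multi-party authorization."""
        if approver is None or approver == engineer:
            raise PermissionError("breakglass needs sign-off from a second engineer")
        self.audit_log.append(("breakglass", engineer, command, approver))
        return f"ran {command}"

    def breakglass_entries(self) -> List[AuditEntry]:
        """What a periodic review meeting would look at."""
        return [entry for entry in self.audit_log if entry[0] == "breakglass"]


if __name__ == "__main__":
    api = AdminAPI(allowed_actions=frozenset({"restart_service", "rotate_logs"}))
    api.run("alice", "restart_service")
    api.breakglass("alice", "ssh prod-db-1", approver="bob")
    for entry in api.breakglass_entries():
        print(entry)
```

Because breakglass calls are rare, the short list returned by `breakglass_entries` is cheap to review, which is the point the passage makes about applying extra scrutiny only where it is affordable.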
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Does robustness improve with scale?, published by ChengCheng on July 25, 2024 on The AI Alignment Forum. Adversarial vulnerabilities have long been an issue in various ML systems. Large language models (LLMs) are no exception, suffering from issues such as jailbreaks: adversarial prompts that bypass model safeguards. At the same time, scale has led to remarkable advances in the capabilities of LLMs, leading us to ask: to what extent can scale help solve robustness? In this post, we explore this question in the classification setting: predicting the binary label of a text input. We find that scale alone does little to improve model robustness, but that larger models benefit more from defenses such as adversarial training than do smaller models. We study models in the classification setting as there is a clear notion of "correct behavior": does the model output the right label? We can then naturally define robustness as the proportion of the attacked dataset that the model correctly classifies. We evaluate models on tasks such as spam detection and movie sentiment classification. We adapt pretrained foundation models for classification by replacing the generative model's unembedding layer with a randomly initialized classification head, and then fine-tune the models on each task. We focus on adversarial-suffix style attacks: appending an adversarially chosen prompt to a benign prompt in an attempt to cause the model to misclassify the input, e.g., classify a spam email as not-spam. We consider two attacks: the state-of-the-art Greedy Coordinate Gradient method (Zou et al., 2023), and a baseline random token attack. This simple threat model has the advantage of being unlikely to change the semantics of the input. For example, a spam email is still spam even if a handful of tokens are appended to it. 
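As a rough illustration of the robustness metric and the baseline random token attack described above, here is a minimal sketch. It is not the authors' code: the two toy keyword classifiers, the word lists, and all function names are hypothetical stand-ins for a fine-tuned model, and the "attack" simply appends random benign tokens until the label flips or the tries run out.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

SPAM_WORDS = {"free", "prize", "winner"}          # hypothetical toy vocabulary
BENIGN_VOCAB = ["hello", "table", "green", "river", "seven"]


def fragile_classifier(text: str) -> int:
    """Toy model: spam (1) if at least half of the tokens are spam words."""
    tokens = text.lower().split()
    return int(sum(t in SPAM_WORDS for t in tokens) / len(tokens) >= 0.5)


def sturdy_classifier(text: str) -> int:
    """Toy model: spam (1) if any token is a spam word."""
    return int(any(t in SPAM_WORDS for t in text.lower().split()))


def random_token_attack(
    text: str,
    label: int,
    classify: Callable[[str], int],
    vocab: Sequence[str],
    suffix_len: int = 5,
    tries: int = 20,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Append random suffixes; return the first one that flips the label, else None."""
    rng = rng or random.Random(0)
    for _ in range(tries):
        suffix = " ".join(rng.choice(vocab) for _ in range(suffix_len))
        candidate = f"{text} {suffix}"
        if classify(candidate) != label:
            return candidate
    return None


def robustness(
    dataset: List[Tuple[str, int]],
    classify: Callable[[str], int],
    vocab: Sequence[str],
) -> float:
    """Proportion of the attacked dataset that the model still classifies correctly."""
    rng = random.Random(0)
    survived = 0
    for text, label in dataset:
        correct_before = classify(text) == label
        attacked = random_token_attack(text, label, classify, vocab, rng=rng)
        if correct_before and attacked is None:
            survived += 1
    return survived / len(dataset)


DATASET = [
    ("free prize now", 1),
    ("winner winner free", 1),
    ("meeting at noon", 0),
    ("see you tomorrow", 0),
]

if __name__ == "__main__":
    print("fragile:", robustness(DATASET, fragile_classifier, BENIGN_VOCAB))
    print("sturdy: ", robustness(DATASET, sturdy_classifier, BENIGN_VOCAB))
```

In this toy setup the appended tokens never change whether a message is really spam, which mirrors the point about suffix attacks preserving the semantics of the input; the fragile model fails anyway because benign tokens dilute its spam-word ratio.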
Of course, attackers are not limited to such a simple threat model: studying more open-ended threat models (such as rephrasing the prompt, or replacing words with synonyms) and corresponding attack methods (such as LLM generated adversarial prompts) is an important direction that we hope to pursue soon in future work. For more information, see our blog post or paper. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Guest: Campbell Watson, Senior Research Scientist at IBM Research
As artificial intelligence, or AI, continues to become more pervasive in our technology, it's only natural to wonder what it means for meteorology and climatology. Believe it or not, AI is already revolutionizing how we develop models in the Earth and Space sciences. Joining us today is Campbell Watson, a Senior Research Scientist at IBM Research, to discuss how we are creating these AI models, and the opportunities and advancements we hope to learn from using them.
Chapters
00:00 Introduction to AI in Meteorology and Climatology
01:35 Campbell Watson's Background and Interest in Weather
04:18 The Use of AI in Weather Forecasting
05:24 The Evolution of AI in Weather Forecasting
06:26 Understanding AI at its Core
08:09 Difference Between AI and Conventional Weather Model Forecasting
10:44 The Difference Between AI and Conventional Weather Model Forecasting
13:31 AI Models Learning from Weather Models
20:30 Foundation Models and Unsupervised Learning
25:12 Verification of AI in Weather Forecasting
27:57 The Robustness of AI Models
28:24 Learning from Observations
29:23 AI Models Learning from Observations
30:09 The Importance of Observations
30:38 Comparing Deep Learning and AI with Traditional Weather Forecasting Models
35:38 Trustworthiness in AI Forecasting
36:12 Adopting AI in Weather Forecasting
41:46 Nowcasting with AI
44:03 AI in Climate Forecasting
47:38 The Future of AI in Weather Forecasting
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Beyond the Board: Exploring AI Robustness Through Go, published by AdamGleave on June 19, 2024 on The AI Alignment Forum. Last year, we showed that supposedly superhuman Go AIs can be beaten by human amateurs playing specific "cyclic" patterns on the board. Vulnerabilities have previously been observed in a wide variety of sub- or near-human AI systems, but this result demonstrates that even far superhuman AI systems can fail catastrophically in surprising ways. This lack of robustness poses a critical challenge for AI safety, especially as AI systems are integrated in critical infrastructure or deployed in large-scale applications. We seek to defend Go AIs, in the process developing insights that can make AI applications in various domains more robust against unpredictable threats. We explored three defense strategies: positional adversarial training on handpicked examples of cyclic patterns, iterated adversarial training against successively fine-tuned adversaries, and replacing convolutional neural networks with vision transformers. We found that the two adversarial training methods defend against the original cyclic attack. However, we also found several qualitatively new adversarial strategies that can overcome all these defenses. Nonetheless, finding these new attacks is more challenging than against an undefended KataGo, requiring more training compute resources for the adversary. For more information, see our blog post, project website or paper. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Our 170th episode with a summary and discussion of last week's big AI news! With hosts Andrey Kurenkov (https://twitter.com/andrey_kurenkov) and Jeremie Harris (https://twitter.com/jeremiecharris). Feel free to leave us feedback here. Check out our text newsletter and comment on the podcast at https://lastweekin.ai/. Email us your questions and feedback at contact@lastweekin.ai and/or hello@gladstone.ai
Timestamps + Links:
Tools & Apps
(00:03:33) KLING is the latest AI video generator that could rival OpenAI's Sora
(00:09:16) 'Apple Intelligence' will automatically choose between on-device and cloud-powered AI
(00:12:21) Udio introduces new udio-130 music generation model and more advanced features
(00:14:38) Perplexity AI's new feature will turn your searches into shareable pages
(00:16:35) ElevenLabs' AI generator makes explosions or other sound effects with just a prompt
(00:18:37) Google's updated AI-powered NotebookLM expands to India, UK and over 200 other countries
Applications & Business
(00:19:40) OpenAI is restarting its robotics research group
(00:25:01) Saudi fund invests in China effort to create rival to OpenAI
(00:29:34) UAE seeks 'marriage' with US over artificial intelligence deals
(00:33:01) Zoox to test self-driving cars in Austin and Miami
(00:35:49) Microsoft Lays Off 1,500 Workers, Blames "AI Wave"
(00:38:28) Avengers, assemble—Google, Intel, Microsoft, AMD and more team up to develop an interconnect standard to rival Nvidia's NVLink
Projects & Open Source
(00:40:39) GLM-4-9B-Chat-1M
(00:46:37) Hugging Face and Pollen Robotics show off first project: an open source robot that does chores
(00:49:40) Zyphra debuts Zyda, a 1.3T language modeling dataset it claims outperforms Pile, C4, arxiv
(00:51:59) Stability AI debuts new Stable Audio Open for sound design
Research & Advancements
(00:54:05) Scaling and evaluating sparse autoencoders
(01:04:54) Improving Alignment and Robustness with Short Circuiting
(01:12:11) Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
(01:16:20) GPT-4 didn't ace the bar exam after all, MIT research suggests — it didn't even break the 70th percentile
Policy & Safety
(01:20:11) Former OpenAI researcher foresees AGI reality in 2027
(01:28:03) OpenAI Insiders Warn of a 'Reckless' Race for Dominance
(01:33:52) Testing and mitigating elections-related risks
(01:36:26) Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Synthetic Media & Art
(01:43:23) The Uncanny Rise of the World's First AI Beauty Pageant
(01:46:25) Outro + AI Song
Dan is joined by Mayukh Bhattacharya, Executive Director of Engineering at Synopsys. Mayukh has been with Synopsys since 2003. For the first 14 years, he made many technical contributions to PrimeSim XA. Currently, he leads R&D teams for PrimeSim Design Robustness and PrimeSim Custom Fault products. He was one of the early…
Robustness
Abstract: Philip and Fred discuss the idea of a robust design for a product or system.
Key Points: Join Philip and Fred as they discuss robust design. Topics include: To improve the ability of a product to withstand unexpected stresses, improve the robustness. Design techniques to improve robustness. Various methods to improve reliability like stress-strength analysis, […]
The post SOR 953 Robustness appeared first on Accendo Reliability.
Champions League Clash on The Overlap! We are BACK, dissecting the burning question: who will reign supreme in Europe this season? Prepare for fiery takes, tactical tussles, and maybe even a friendly wager as we reveal our top contenders. But, two teams caused a major rift (can you guess which ones?). Will logic prevail (plot twist: nope)? Tune in, grab your scarves, and join the conversation. Don't forget to share your own predictions – who are YOU backing? #UCL #Predictions #TheOverlap