Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/
***

Randall Balestriero
https://x.com/randall_balestr
https://randallbalestriero.github.io/

Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0

TOC:
- Introduction
  - 00:00:00: Introduction
- Neural Network Geometry and Spline Theory
  - 00:01:41: Neural Network Geometry and Spline Theory
  - 00:07:41: Deep Networks Always Grok
  - 00:11:39: Grokking and Adversarial Robustness
  - 00:16:09: Double Descent and Catastrophic Forgetting
- Reconstruction Learning
  - 00:18:49: Reconstruction Learning
  - 00:24:15: Frequency Bias in Neural Networks
- Geometric Analysis of Neural Networks
  - 00:29:02: Geometric Analysis of Neural Networks
  - 00:34:41: Adversarial Examples and Region Concentration
- LLM Safety and Geometric Analysis
  - 00:40:05: LLM Safety and Geometric Analysis
  - 00:46:11: Toxicity Detection in LLMs
  - 00:52:24: Intrinsic Dimensionality and Model Control
  - 00:58:07: RLHF and High-Dimensional Spaces
- Conclusion
  - 01:02:13: Neural Tangent Kernel
  - 01:08:07: Conclusion

REFS:
[00:01:35] Humayun – Deep network geometry & input space partitioning
https://arxiv.org/html/2408.04809v1
[00:03:55] Balestriero & Baraniuk – Linking deep networks to adaptive spline operators
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
[00:13:55] Song et al. – Gradient-based white-box adversarial attacks
https://arxiv.org/abs/2012.14965
[00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness
https://arxiv.org/abs/2402.15555
[00:18:25] Humayun – Training dynamics & double descent via linear region evolution
https://arxiv.org/abs/2310.12977
[00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries
https://arxiv.org/abs/1905.08443
[00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning
https://arxiv.org/abs/1803.03635
[00:24:00] Belkin et al. – Double descent phenomenon in modern ML
https://arxiv.org/abs/1812.11118
[00:25:55] Balestriero et al. – Batch normalization's regularization effects
https://arxiv.org/pdf/2209.14778
[00:29:35] EU – EU AI Act 2024 with compute restrictions
https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf
[00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry
https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf
[00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy
https://arxiv.org/pdf/2407.20099
[00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods
https://openreview.net/forum?id=ez7w0Ss4g9
(truncated, see shownotes PDF)
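Background for the spline discussion above: a ReLU network is a continuous piecewise-affine function, so it partitions its input space into linear regions, one per ReLU activation pattern. Here is a rough sketch of counting those regions on a 2D grid with a toy, untrained MLP; the sizes are arbitrary, and this is not SplineCam's method (SplineCam, referenced in the notes, computes the partition exactly rather than sampling a grid).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)   # layer 1 of a toy, untrained MLP
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)  # layer 2

# Sample a 2D grid of inputs and record each input's binary ReLU activation pattern.
xs = np.stack(np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200)), -1).reshape(-1, 2)
h1 = np.maximum(xs @ W1.T + b1, 0)
pattern = np.concatenate([h1 > 0, h1 @ W2.T + b2 > 0], axis=1)  # one code per input
region_id = np.unique(pattern, axis=0, return_inverse=True)[1]
print("distinct linear regions seen on the grid:", region_id.max() + 1)
```

Inputs sharing an activation pattern lie in the same linear region, so coloring the grid by `region_id` visualizes the input-space partition that the episode's geometric analyses study.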
RJJ Software's Software Development Service

This episode of The Modern .NET Show is supported, in part, by RJJ Software's Podcasting Services. Whether your company is looking to elevate its UK operations or reshape its US strategy, we can provide tailored solutions that exceed expectations.

Show Notes

"This should be a mantra for anyone working on web applications or web APIs: you should validate any input of the application as much as possible. When you conceptualize it, all the malicious input into your application comes across in an HTTP request at some point. It's either in the URL or in the body. That's generally speaking, unless you have a supply chain attack, that's generally the vector by which your application will be attacked. Before you even start to kind of think about specific vulnerabilities, if you narrow down what is permissible input right out of the gate, then you kind of close the door on so many different vulnerabilities without even thinking about it."
— Malcolm McDonald

Welcome friends to The Modern .NET Show; the premier .NET podcast, focusing entirely on the knowledge, tools, and frameworks that all .NET developers should have in their toolbox. We are the go-to podcast for .NET developers worldwide, and I am your host: Jamie "GaProgMan" Taylor.

In this episode, Malcolm McDonald joined us to talk about the importance of Application Security, knowing the basics of secure coding, and backing that all up with both "secure by default" designs, and subscribing to CVE notifications.

"So what you find in your career is that if you set down kind of sensible default behaviour in your codebase earlier on, then everyone will follow that. You establish, kind of, best practices. It might take a little bit longer, but if you say, "this is the way we do this over here," then even yourself when you, kind of, come back to it, will, kind of, reproduce that path. So the initial, kind of, like first few check-ins to your source control are key to, like, establishing best practice"
— Malcolm McDonald

One note before we begin: some of Malcolm's audio is a little rough. Mark, our superstar editor, has done what he can, but there are parts that might be a little unclear. If you have trouble with the audio, please remember that there is a full transcription linked in the accompanying show notes.

Anyway, without further ado, let's sit back, open up a terminal, type in `dotnet new podcast` and we'll dive into the core of Modern .NET.

Supporting the Show

If you find this episode useful in any way, please consider supporting the show by either leaving a review (check our review page for ways to do that), sharing the episode with a friend or colleague, buying the host a coffee, or considering becoming a Patron of the show.

Full Show Notes

The full show notes, including links to some of the things we discussed and a full transcription of this episode, can be found at: https://dotnetcore.show/season-7/grokking-application-security-malcolm-mcdonald-on-securing-your-apps/

Useful Links
- Malcolm on LinkedIn
- Grokking Application Security
- Hacksplaining
- Hash function
- Don't Accept The Defaults
- Bcrypt

Supporting the show:
- Leave a rating or review
- Buy the show a coffee
- Become a patron

Getting in Touch:
- Via the contact page
- Joining the Discord

Remember to rate and review the show on Apple Podcasts, Podchaser, or wherever you find your podcasts, this will help the show's audience grow. Or you can just share the show with a friend. And don't forget to reach out via our Contact page.
We're very interested in your opinion of the show, so please get in touch. You can support the show by making a monthly donation on the show's Patreon page at: https://www.patreon.com/TheDotNetCorePodcast. Music created by Mono Memory Music, licensed to RJJ Software for use in The Modern .NET Show
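A small sketch of the allow-list idea from Malcolm's quote above: define the narrowest permissible shape for each input and reject everything else before any other logic runs. This is illustrative only, not from the episode, and it is in Python rather than .NET purely for brevity; the pattern and names are invented.

```python
import re

# Allow-list, not block-list: describe exactly what a valid username looks like.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,20}")

def parse_username(raw: str) -> str:
    """Return the value only if it is exactly a well-formed username."""
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError("invalid username")  # reject early, at the request boundary
    return raw

print(parse_username("jamie_taylor"))          # accepted
# parse_username("jamie'; DROP TABLE users;")  # would raise ValueError
```

Narrowing permissible input at the boundary like this closes the door on whole classes of injection and parsing vulnerabilities before any vulnerability-specific defenses are even considered.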
In this episode of The Cognitive Revolution, Nathan interviews Andrew White, Professor of Chemical Engineering at the University of Rochester and Head of Science at Future House. We explore groundbreaking AI systems for scientific discovery, including PaperQA and Aviary, and discuss how large language models are transforming research. Join us for an insightful conversation about the intersection of AI and scientific advancement with this pioneering researcher in his first-ever podcast appearance. Check out Future House: https://www.futurehouse.org Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse SPONSORS: Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognitive Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive CHAPTERS: (00:00:00) Teaser (00:01:13) About the Episode (00:04:37) Andrew White's Journey (00:10:23) GPT-4 Red Team (00:15:33) GPT-4 & Chemistry (00:17:54) Sponsors: Oracle Cloud Infrastructure (OCI) | SelectQuote (00:20:19) Biology vs Physics (00:23:14) Conceptual Dark Matter (00:26:27) Future House Intro (00:30:42) Semi-Autonomous AI (00:35:39) Sponsors: Shopify (00:37:00) Lab Automation (00:39:46) In Silico Experiments (00:45:22) Cost of Experiments (00:51:30) Multi-Omic Models (00:54:54) Scale and Grokking (01:00:53) Future House Projects (01:10:42) Paper QA Insights (01:16:28) Generalizing to Other Domains (01:17:57) Using Figures Effectively (01:22:01) Need for Specialized Tools (01:24:23) Paper QA Cost & Latency (01:27:37) Aviary: Agents & Environments (01:31:42) Black Box Gradient Estimation (01:36:14) Open vs Closed Models (01:37:52) Improvement with Training (01:40:00) Runtime Choice & Q-Learning (01:43:43) Narrow vs General AI (01:48:22) Future Directions & Needs (01:53:22) Future House: What's Next? (01:55:32) Outro SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://www.linkedin.com/in/nathanlabenz/ Youtube: https://www.youtube.com/@CognitiveRevolutionPodcast Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431 Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk
In this episode of The Cognitive Revolution, Nathan dives deep into the world of state space models with returning co-host Jason Meaux and special guest Quentin Anthony, Head of Model Training at Zyphra. Explore the cutting-edge Zamba2-7B model, which combines selective state space and attention mechanisms. Uncover practical insights on model training, architectural choices, and the challenges of scaling AI. From learning schedules to hybrid architectures, loss metrics to context length extension, this technical discussion covers it all. Don't miss this in-depth conversation on the future of personalized, on-device AI. Check out more about Zyphra and Jason Meaux here: Zyphra's website: https://www.zyphra.com Zamba2-7B Blog: https://www.zyphra.com/post/zamba2-7b Zamba2 GitHub: https://github.com/Zyphra/Zamba2 Tree attention: https://www.zyphra.com/post/tree-attention-topology-aware-decoding-for-long-context-attention-on-gpu-clusters Jason Meaux's Twitter: https://x.com/KamaraiCode Jason Meaux's website: https://www.statespace.info Be notified early when Turpentine drops a new publication: https://www.turpentine.co/exclusiveaccess SPONSORS: Weights & Biases RAG++: Advanced training for building production-ready RAG applications. Learn from experts to overcome LLM challenges, evaluate systematically, and integrate advanced features. Includes free Cohere credits. Visit https://wandb.me/cr to start the RAG++ course today. Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive Notion: Notion offers powerful workflow and automation templates, perfect for streamlining processes and laying the groundwork for AI-driven automation. With Notion AI, you can search across thousands of documents from various platforms, generating highly relevant analysis and content tailored just for you - try it for free at https://notion.com/cognitiverevolution LMNT: LMNT is a zero-sugar electrolyte drink mix that's redefining hydration and performance. Ideal for those who fast or anyone looking to optimize their electrolyte intake. Support the show and get a free sample pack with any purchase at https://drinklmnt.com/tcr CHAPTERS: (00:00:00) Teaser (00:00:42) About the Show (00:01:05) About the Episode (00:03:09) Introducing Zyphra (00:07:28) Personalization in AI (00:12:48) State Space Models & Efficiency (Part 1) (00:19:22) Sponsors: Weights & Biases RAG++ | Shopify (00:21:26) State Space Models & Efficiency (Part 2) (00:22:23) Dense Attention to Shared Attention (00:29:41) Zyphra's Early Bet on Mamba (Part 1) (00:33:18) Sponsors: Notion | LMNT (00:36:00) Zyphra's Early Bet on Mamba (Part 2) (00:37:22) Loss vs. Model Quality (00:44:53) Emergence & Grokking (00:50:06) Loss Landscapes & Convergence (00:56:55) Sophia, Distillation & Secrets (01:09:00) Competing with Big Tech (01:23:50) The Future of Model Training (01:30:02) Deep Dive into Zamba 1 (01:34:24) Zamba 2 and Mamba 2 (01:38:56) Context Extension & Memory (01:44:04) Sequence Parallelism (01:45:44) Zamba 2 Architecture (01:53:57) Mamba Attention Hybrids (02:00:00) Lock-in Effects (02:05:32) Mamba Hybrids in Robotics (02:07:07) Ease of Use & Compatibility (02:12:10) Tree Attention vs.
Ring Attention (02:22:02) Zyphra's Vision & Goals (02:23:57) Outro SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://www.linkedin.com/in/nathanlabenz/
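For context on the "state space" half of the hybrid architecture discussed above, here is a toy linear state-space recurrence. This is my own illustrative sketch with made-up sizes, not Zyphra's code; Zamba's actual blocks are selective, input-dependent, and far more elaborate.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear SSM recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # O(T) time, O(1) state: why SSMs suit long contexts
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.array(ys)

rng = np.random.default_rng(0)
d_state, d_in, T = 8, 4, 32       # illustrative sizes
A = 0.9 * np.eye(d_state)         # stable decay on the hidden state
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(1, d_state))
y = ssm_scan(rng.normal(size=(T, d_in)), A, B, C)
print(y.shape)                    # (32, 1): one output per timestep
```

Selective models such as Mamba make A and B functions of the current input, letting the state decide what to keep or forget, and hybrids like Zamba interleave such blocks with shared attention layers.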
NFTs & crypto, oh my; Google ruled a monopolist in search; you can't run an AI company ethically - with plenty of examples; Grokking without consent; Google's targeting "unknown" minors loophole; MetaAI celebrity voices; watermarking ChatGPT challenges; emotionally hooked on chatbots; Neuralink's second patient implant; Synchron's stent approach; Cybertruck spottings; Deadpool & Wolverine; American Conspiracy: the Octopus Murders; 2024 Olympics; enraging Google, Uber ads; living that iPad life; Humanes being returned; Sonos $30 million app fail; Lego Star Wars; advanced hinges; DarJar; losing power; medical system problems.

Sponsors:
- Mood - Run it back with 20% off your first order and FREE gummies. Go to hellomood.com and use promo code GOG.
- 1Password Extended Access Management - Check it out at 1Password.com/xam. Secure every sign-in for every app on every device.
- DeleteMe - Head over to JoinDeleteMe.com/GOG and use the code "GOG" for 20% off.
- Private Internet Access - Go to GOG.Show/vpn and sign up today. For a limited time only, you can get OUR favorite VPN for as little as $2.03 a month.
- SetApp - With a single monthly subscription you get 240+ apps for your Mac. Go to SetApp and get started today!!!

Show notes at https://gog.show/660

FOLLOW UP
- Illinois changes biometric privacy law to help corporations avoid big payouts
- Trump-themed $DJT token rug-pulls, people blame Martin Shkreli or Barron Trump
- FTX to pay $12.7B to victims of Sam Bankman-Fried's massive scheme

IN THE NEWS
- Google 'is a monopolist' in search, US judge rules in antitrust case
- It's practically impossible to run a big AI company ethically
- Nvidia Used Videos From Netflix and YouTube to Build AI Model
- Elon Musk sues OpenAI, Sam Altman for making a "fool" out of him
- X accused of using EU user data to train Grok without consent
- X agrees to pause EU data processing for training Grok
- Meta and Google secretly targeted minors on YouTube with Instagram ads
- Meta is reportedly offering millions to get Hollywood voices into its AI projects
- OpenAI confirms it's looking into text watermarking for ChatGPT that could expose cheating students
- OpenAI Warns Users Could Become Emotionally Hooked on Its Voice Mode
- Neuralink successfully implants its chip into a second patient's brain
- As Neuralink Implants Second Subject's Brain, First Patient Says His Doesn't Work as Well Anymore
- Gates, Bezos back $75M round for Synchron's drill-free brain-computer interface implant
- What Synchron leaders learned from developing a stent-based BCI
- Fiery Cybertruck Death Spurs Federal Probe

MEDIA CANDY
- Deadpool & Wolverine
- American Conspiracy: The Octopus Murders
- Google will no longer air an Olympics ad that showed a child using AI to write a fan letter
- Google + Team USA — Dear Sydney
- On Our Way - Uber

APPS & DOODADS
- ESR for iPad Pro 12.9 Case (2022/2021/2020, 6th/5th/4th Generation), iPad Air 13 Inch Case 2024 M2, Convenient Magnetic Attachment, 2-Way Stand, Rebound Magnetic Case, Black
- Humane's daily returns are outpacing sales
- Californians Are Getting Apple Wallet Driver's Licenses This Year
- Sonos' $30M app fail is cautionary tale against rushing unnecessary updates
- Home Security Giant ADT Admits It Has Been Hacked
- Thursday, the dating app that you can use only on Thursdays, expands to San Francisco
- Multitasking with ChatGPT on the Mac just got a lot better

THE DARK SIDE WITH DAVE
- Dave Bittner
- The CyberWire
- Hacking Humans
- Caveat
- Control Loop
- Only Malware in the Building
- LEGO® Star Wars: Rebuild the Galaxy
- The Dark Falcon

CLOSING SHOUT-OUTS
- Texas school bans all-black clothing, cites mental health concerns
From building a data platform and Parquet at Twitter to using AI to make biology easier to engineer at Ginkgo Bioworks, Dmitriy joins the show to chat about the early days of big data, the conversation that made him jump into SynBio, LLMs for proteins and more. Segments: (00:03:18) Data engineering roots (00:05:40) Early influences at Lawrence Berkeley Lab (00:09:46) Value of a "gentleman's education in computer science" (00:14:34) The end of junior software engineers (00:20:10) Deciding to go back to school (00:21:36) Early experiments with distributed systems (00:23:33) The early days of big data (00:29:16) "The thing we used to call big data is now ai" (00:31:02) The maturation of data engineering (00:35:05) From consumer tech to biotech (00:37:42) "The 21st century is the century of biology" (00:40:54) The science of lab automation (00:47:22) Software development in biotech vs. consumer tech (00:50:34) Swes make more $$ than scientists? (00:54:27) Llms for language is boring. Llms for proteins? that's cool (01:02:52) Protein engineering 101 (01:06:01) Model explainability in biology Show Notes: The Death of the Junior Developer: https://sourcegraph.com/blog/the-death-of-the-junior-developer Dmitriy on twitter: https://x.com/squarecog?lang=en Tech and Bio slack community: https://www.bitsinbio.org/ Stay in touch: - Make Ronak's day by signing up for our newsletter to get our favorites parts of the convo straight to your inbox every week :D https://softwaremisadventures.com/ Music: Vlad Gluschenko — Forest License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en
Nathan interviews Mike Knoop, co-founder of Zapier and co-creator of the ARC Prize, about the $1 million competition for more efficient AI architectures. They discuss the ARC AGI benchmark, its implications for general intelligence, and the potential impact on AI safety. Nathan reflects on the challenges of intuitive problem-solving in AI and considers hybrid approaches to AGI development. Apply to join over 400 founders and execs in the Turpentine Network: https://hmplogxqz0y.typeform.com/to/JCkphVqj RECOMMENDED PODCAST: Patrick McKenzie (@patio11) talks to experts who understand the complicated but not unknowable systems we rely on. You might be surprised at how quickly Patrick and his guests can put you in the top 1% of understanding for stock trading, tech hiring, and more. Spotify: https://open.spotify.com/show/3Mos4VE3figVXleHDqfXOH Apple: https://podcasts.apple.com/us/podcast/complex-systems-with-patrick-mckenzie-patio11/id1753399812 SPONSORS: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference. All while remaining affordable with developer first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist. CHAPTERS: (00:00:00) About the Show (00:06:06) The ARC Benchmark (00:09:34) Other Benchmarks (00:10:58) Definition of AGI (00:14:38) The rules of the contest (00:18:16) ARC test set (Part 1) (00:18:23) Sponsors: Oracle | Brave (00:20:31) ARC test set (Part 2) (00:22:50) Stair-stepping benchmarks (00:26:17) ARC Prize (00:28:34) The rules of the ARC Prize (00:31:12) Compute costs (Part 1) (00:34:47) Sponsors: Omneky | Squad (00:36:34) Compute costs (Part 2) (00:36:40) Compute Limit (00:41:00) Public Leaderboard (00:42:58) The current AI ecosystem (00:47:23) The four steps of solving a puzzle (00:51:20) Intuition (00:54:32) Human Intelligence (00:56:06) Current Frontier Language Models (00:57:44) Program Synthesis (01:04:10) Is the model learning or memorizing? 
(01:09:51) Improving the ARC dataset (01:11:34) Step 3: Guessing the Rule (01:12:51) Dealing with Ambiguity (01:15:02) Exploring Solutions (01:17:02) Non-backpropagation evolutionary architecture search (01:19:49) Expectations for an AGI world (01:24:11) Reliability and out of domain generalization (01:28:35) What a person would do (01:29:51) What is the right generalization (01:35:32) The ARC AGI Challenge (01:37:01) Postscript (01:38:07) DSPy (01:39:55) State space models (01:43:28) Hybrid models (01:48:32) FunSearch (01:50:41) Kolmogorov-Arnold Networks (01:54:18) Grokking (01:55:42) Outro
Tune in to this special episode of the Cognitive Revolution for an AI scouting report. We cover the state of the art in AI applications for medicine, key AI concepts, and how these technologies could influence society. SPONSORS: Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference. All while remaining affordable with developer first pricing, integrating the Brave search API into your workflow translates to more ethical data sourcing and more human representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/ RECOMMENDED PODCAST - The Riff with Byrne Hobart Byrne Hobart, the writer of The Diff, is revered in Silicon Valley. You can get an hour with him each week. See for yourself how his thinking can upgrade yours. Spotify: https://open.spotify.com/show/6rANlV54GCARLgMOtpkzKt Apple: https://podcasts.apple.com/us/podcast/the-riff-with-byrne-hobart-and-erik-torenberg/id1716646486 CHAPTERS: (00:00:00) Introduction (00:06:13) Current State of the Art (00:08:34) Medicine (00:13:49) AI limitations (00:17:01) AI capabilities (00:17:02) Sponsors: Oracle | Brave (00:19:09) How AI works (00:24:05) Information processing (00:28:51) Curated data sets (00:32:52) Sponsors: Squad | Omneky (00:34:38) Transformer (00:37:11) Scaling (00:39:59) Emergence (00:44:03) Grokking (00:50:30) Best practices for business (00:53:38) Live Players (00:58:33) What to Watch (01:00:35) Final Thoughts
Fredrik is joined by Eric Normand for a discussion of debugging your ideas through domain modeling, using Eric’s concept of lenses to find more good questions to ask. Eric is writing a book about domain modeling and has developed the concept of lenses - ways to look at various aspects of your domain, model, and code in order to better consider various solutions and questions. Why? Because design is needed, but is easily lost in the modern urge to be fast and agile. There’s a lot you can and need to do on the way to a working system. Eric pushes for design which is an integral part, preferably right in the code, rather than a separate one which can become outdated and separated without anyone noticing. Just spend a little more time on it. Tricks for seeing your domain with fresher eyes. Change is not always maximal and unpredictable! But thinking it is can lead to a lot of indirection and abstraction where a single if-statement could have sufficed for years. Refactoring as a way of finding the seams in your model. What is the code actually supposed to do? How does it actually fit with the domain? Recorded during Øredev 2023, where Eric gave two presentations about the topics discussed: Better software design with domain modeling and Stratified design and functional architecture. Thank you Cloudnet for sponsoring our VPS! Comments, questions or tips? We are @kodsnack, @tobiashieta, @oferlund and @bjoreman on Twitter, have a page on Facebook and can be emailed at info@kodsnack.se if you want to write longer. We read everything we receive. If you enjoy Kodsnack we would love a review in iTunes! You can also support the podcast by buying us a coffee (or two!) through Ko-fi. Links Eric Eric’s Øredev 2023 presentations: Better software design with domain modeling Stratified design and functional architecture Eric has his own podcast Grokking simplicity - Eric’s book on functional programming Domain modeling Waterfall UML Clojure REPL - Read-evaluate-print loop Kodsnack 294 - the episode where Dan Lebrero gave Fredrik a feel for REPL-driven development Domain modeling lenses Drawing on the right side of the brain The “keynote yesterday” - Na’Tosha Bard about code outliving you (see also episode 558) Then a miracle occurs Titles I’m really on to something Anti-design trend In a waterfall world On the way to code Experimentation in code Not about moving your hand I don’t want rules Yes, that’s the right question! Take five minutes Spending more time on it Code lets me play with ideas I’m happy working on a whiteboard Debug your ideas Server babysitters
Art's unique background: the creator of Energetic NLP, NLP trainer, medical researcher, healer, intuitive, leadership and team specialist for major corporations; 4 decades studying psychic development, plus energetic and spiritual healing. Art has been blessed to receive two empowerments directly from the Dalai Lama of Tibet (including one to increase his ability to heal others). Art Has Been Featured On: The Love Code, the award-winning Dare to Dream podcast, Happy Insights, Leading Visionaries, Sense of Soul, One World in a New World, Zen Success, The Health & Happiness show, Awakenings, Remember Who You Are, Science & Spirituality Podcast, Change It Up Radio, Superpower Experts, JeffMara Podcast, Ghostman Radio, The Prosperity show, Influencers Channel, 360 Wisdom Speaks, Sunday Soul, the Self-Help Coaching podcast, Lisa Davies, Adam Markel, Loretta Brown Show, Laugh Box, Prosperity Place, Bringing Inspiration To Earth Radio show, Remember Who You Are, and Spiritually Inspired. He's been featured on hundreds of radio, television, podcast shows and in magazines. Art will be featured in an upcoming prominent documentary film, to be released next year. Art will promote the interview to his following. Art is a master at opening up people's abilities: — Learn to use universal energies to transform lives. — Read people's energy fields through the clairs and psychic abilities — Facilitate remote energy work and work with anyone — Accelerating Personal/Spiritual Growth while Activating Your Miraculous Abilities! Art's Talking Points: 1) Reprogramming Your Chakras to Thrive in Turbulent Times 2) Confidently be guided by your Inner Wisdom and Spirit 3) 3 Energy Techniques You Need to Know · See Art Giser's bio, interview questions, talking points, and free gift below. One of Art's areas of mastery is remote energy work. Your listeners will receive transformative energies during the podcast, even if it is a recording! BIO: Meet Art Giser, a visionary powerhouse with a lifetime of diverse experiences that have shaped him into a true trailblazer. With an impressive background spanning 39 years, Art has immersed himself in the realms of NLP, scientific exploration, psychic development, and the profound art of energetic and spiritual healing. His journey has been nothing short of extraordinary. In his earlier years, Art's insatiable curiosity led him to a role as a research lab manager at the esteemed University of California, San Francisco Medical School for eleven years. This foundational experience instilled in him a deep appreciation for the intricacies of human potential and sparked a lifelong passion for understanding the unseen dimensions that shape our reality. As an executive coach, Art has become an invaluable asset to some of the world's largest corporations. His unique approach empowers leaders and fosters the creation of highly collaborative and high-performing teams. Drawing from his rich tapestry of knowledge, he has crafted an innovative coaching style that transcends the boundaries of conventional methods. This has earned him a reputation as a sought-after mentor for those seeking transformative growth and profound shifts in both their personal and professional lives. Some of his clients include Sony, HP, Intel, Pfizer, and Merrill Lynch.
NATS provides incredibly powerful usage patterns and one of the most fundamental is streams and consumers. This pairing is a super power and can solve many challenges in the simplest way. We cover the bare fundamentals of NATS streams and how to consume data from them with some model use cases to help the exploration. Want to feel more confident about how to embrace data streams? Take a listen as we go back to basics. --- If you like this video, be sure to hit like and subscribe! Want to suggest a topic? Find us on Twitter (https://twitter.com/nats_io), LinkedIn (https://www.linkedin.com/showcase/nat...), or Slack (https://slack.nats.io)! Here are some useful links for NATS! - Website - https://nats.io - Examples - https://natsbyexample.com - Docs - https://docs.nats.io The home of the NATS team: https://synadia.com
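A minimal companion sketch of the stream-and-consumer pairing covered in the episode, using the official Python client. Assumptions: `pip install nats-py`, a local server started with JetStream enabled (`nats-server -js`), and invented stream, subject, and consumer names.

```python
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    # A stream captures and stores every message published on matching subjects.
    await js.add_stream(name="EVENTS", subjects=["events.>"])
    await js.publish("events.page_view", b'{"user": 1}')

    # A durable pull consumer remembers its position in the stream across restarts.
    sub = await js.pull_subscribe("events.>", durable="worker")
    for msg in await sub.fetch(1):
        print(msg.data)
        await msg.ack()  # acknowledge so the message is not redelivered

    await nc.close()

asyncio.run(main())
```

The stream decouples producers from consumers: publishers only know the subject, while any number of durable consumers can each replay or resume the stored history at their own pace.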
Join Arize Co-Founder & CEO Jason Lopatecki, and ML Solutions Engineer, Sally-Ann DeLucia, as they discuss “Explaining Grokking Through Circuit Efficiency.” This paper explores novel predictions about grokking, providing significant evidence in favor of the authors' explanation. Most strikingly, the research conducted in this paper demonstrates two novel and surprising behaviors: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalization to partial rather than perfect test accuracy. Find the transcript and more here: https://arize.com/blog/explaining-grokking-through-circuit-efficiency-paper-reading/ To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.
Neel Nanda is a researcher at Google DeepMind working on mechanistic interpretability. He is also known for his YouTube channel where he explains what is going on inside of neural networks to a large audience. In this conversation, we discuss what is mechanistic interpretability, how Neel got into it, his research methodology, his advice for people who want to get started, but also papers around superposition, toy models of universality and grokking, among other things. Youtube: https://youtu.be/cVBGjhN4-1g Transcript: https://theinsideview.ai/neel OUTLINE (00:00) Intro (00:57) Why Neel Started Doing Walkthroughs Of Papers On Youtube (07:59) Induction Heads, Or Why Nanda Comes After Neel (12:19) Detecting Induction Heads In Basically Every Model (14:35) How Neel Got Into Mechanistic Interpretability (16:22) Neel's Journey Into Alignment (22:09) Enjoying Mechanistic Interpretability And Being Good At It Are The Main Multipliers (24:49) How Is AI Alignment Work At DeepMind? (25:46) Scalable Oversight (28:30) Most Ambitious Degree Of Interpretability With Current Transformer Architectures (31:05) To Understand Neel's Methodology, Watch The Research Walkthroughs (32:23) Three Modes Of Research: Confirming, Red Teaming And Gaining Surface Area (34:58) You Can Be Both Hypothesis Driven And Capable Of Being Surprised (36:51) You Need To Be Able To Generate Multiple Hypotheses Before Getting Started (37:55) All the theory is bullshit without empirical evidence and it's overall dignified to make the mechanistic interpretability bet (40:11) Mechanistic interpretability is alien neuroscience for truth seeking biologists in a world of math (42:12) Actually, Othello-GPT Has A Linear Emergent World Representation (45:08) You Need To Use Simple Probes That Don't Do Any Computation To Prove The Model Actually Knows Something (47:29) The Mechanistic Interpretability Researcher Mindset (49:49) The Algorithms Learned By Models Might Or Might Not Be Universal (51:49) On The Importance Of Being Truth Seeking And Skeptical (54:18) The Linear Representation Hypothesis: Linear Representations Are The Right Abstractions (00:57:26) Superposition Is How Models Compress Information (01:00:15) The Polysemanticity Problem: Neurons Are Not Meaningful (01:05:42) Superposition and Interference are at the Frontier of the Field of Mechanistic Interpretability (01:07:33) Finding Neurons in a Haystack: Superposition Through De-Tokenization And Compound Word Detectors (01:09:03) Not Being Able to Be Both Blood Pressure and Social Security Number at the Same Time Is Prime Real Estate for Superposition (01:15:02) The Two Differences Of Superposition: Computational And Representational (01:18:07) Toy Models Of Superposition (01:25:39) How Mentoring Nine People at Once Through SERI MATS Helped Neel's Research (01:31:25) The Backstory Behind Toy Models of Universality (01:35:19) From Modular Addition To Permutation Groups (01:38:52) The Model Needs To Learn Modular Addition On A Finite Number Of Token Inputs (01:41:54) Why Is The Paper Called Toy Model Of Universality (01:46:16) Progress Measures For Grokking Via Mechanistic Interpretability, Circuit Formation (01:52:45) Getting Started In Mechanistic Interpretability And Which Walkthroughs To Start With (01:56:15) Why Does Mechanistic Interpretability Matter From an Alignment Perspective (01:58:41) How Detecting Deception With Mechanistic Interpretability Compares to Collin Burns' Work (02:01:20) Final Words From Neel
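On the "simple probes" point in the outline (45:08): the probe must do no computation of its own, so a single linear layer is fit on frozen activations, and any accuracy it reaches must already be present in the representation. Below is a self-contained sketch of that method; the activations and labels are synthetic stand-ins invented for illustration, not anyone's actual experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 512))       # stand-in for frozen residual-stream activations
w_true = rng.normal(size=512)
labels = (acts @ w_true > 0).astype(int)  # synthetic binary feature (e.g. a board state)

# One linear layer, no hidden computation: logistic regression by gradient descent.
w = np.zeros(512)
for _ in range(500):
    p = 1 / (1 + np.exp(-(acts @ w)))
    w -= 0.1 * acts.T @ (p - labels) / len(labels)

acc = ((acts @ w > 0) == labels).mean()
print(f"probe accuracy: {acc:.2f}")       # high accuracy means linearly decodable
```

Because the probe is linear, success is evidence the model represents the feature as a direction in activation space, which is exactly the linear representation hypothesis discussed at (54:18).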
www.BlockBuster7.com www.ChangeTheWorldFromYourLivingRoom.com Art's unique background: the creator of Energetic NLP, NLP trainer, medical researcher, healer, intuitive, leadership and team specialist for major corporations; 4 decades studying psychic development, plus energetic and spiritual healing. Art has been blessed to receive two empowerments directly from the Dalai Lama of Tibet (including one to increase his ability to heal others). Art Has Been Featured On: The Love Code, the award-winning Dare to Dream podcast, Happy Insights, Leading Visionaries, Sense of Soul, One World in a New World, Zen Success, The Health & Happiness show, Awakenings, Remember Who You Are, Science & Spirituality Podcast, Change It Up Radio, Superpower Experts, JeffMara Podcast, Ghostman Radio, The Prosperity show, Influencers Channel, 360 Wisdom Speaks, Sunday Soul, the Self-Help Coaching podcast, Lisa Davies, Adam Markel, Loretta Brown Show, Laugh Box, Prosperity Place, Bringing Inspiration To Earth Radio show, Remember Who You Are, and Spiritually Inspired. He's been featured on hundreds of radio, television, podcast shows and in magazines. Art will be featured in an upcoming prominent documentary film, to be released next year. To Book Art on your program, email: · Debbi: debontheradio@gmail.com · Art: Art@EnergeticNLP.com Art will promote the interview to his following. Art is a master at opening up people's abilities: — Learn to use universal energies to transform lives. — Read people's energy fields through the clairs and psychic abilities — Facilitate remote energy work and work with anyone — Accelerating Personal/Spiritual Growth while Activating Your Miraculous Abilities! Art's Talking Points: 1) Reprogramming Your Chakras to Thrive in Turbulent Times 2) Confidently be guided by your Inner Wisdom and Spirit 3) 3 Energy Techniques You Need to Know · See Art Giser's bio, interview questions, talking points, and free gift below. One of Art's areas of mastery is remote energy work. Your listeners will receive transformative energies during the podcast, even if it is a recording! BIO: Meet Art Giser, a visionary powerhouse with a lifetime of diverse experiences that have shaped him into a true trailblazer. With an impressive background spanning 39 years, Art has immersed himself in the realms of NLP, scientific exploration, psychic development, and the profound art of energetic and spiritual healing. His journey has been nothing short of extraordinary. In his earlier years, Art's insatiable curiosity led him to a role as a research lab manager at the esteemed University of California, San Francisco Medical School for eleven years. This foundational experience instilled in him a deep appreciation for the intricacies of human potential and sparked a lifelong passion for understanding the unseen dimensions that shape our reality.
The podcast episode features a discussion between the host and Ishan Sharma, an AI practitioner, on various facets of artificial intelligence, particularly focusing on AI safety. The episode covers the following key points: AI Safety Debate: Ishan explains the concept of "Grokking" — an AI's deep understanding of data — and suggests that it contributes to mistrust in AI systems. He outlines two main camps in the AI safety debate: AI accelerationists, who downplay risks and advocate for rapid progress, and AI doomers, who emphasize the potentially catastrophic risks of AI. Primary Concerns: Ishan mentions three major concerns regarding AI safety: unpredictable emergence of capabilities, alignment with human values, and risks from deceptive AI. Safety Interventions: Various safety measures are proposed, ranging from extreme actions like pausing AI development to more moderate ones like better governance and oversight. Current Limitations: Ishan points out that current AI systems, like transformer architectures, are nearing their peak performance and that future advancements might require experiential learning akin to human experiences. Recorded Sept 10th, 2023. Other ways to connect Follow us on X and Instagram Follow Shubham on X Follow Ishan on X
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Explaining grokking through circuit efficiency, published by Vikrant Varma on September 8, 2023 on The AI Alignment Forum. This is a linkpost for our paper "Explaining grokking through circuit efficiency", which provides a general theory explaining when and why grokking (aka delayed generalisation) occurs, and makes several interesting and novel predictions which we experimentally confirm (introduction copied below). You might also enjoy our explainer on X/Twitter.

Abstract
One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when the task admits a generalising solution and a memorising solution, where the generalising solution is slower to learn but more efficient, producing larger logits with the same parameter norm. We hypothesise that memorising circuits become more inefficient with larger training datasets while generalising circuits do not, suggesting there is a critical dataset size at which memorisation and generalisation are equally efficient. We make and confirm four novel predictions about grokking, providing significant evidence in favour of our explanation. Most strikingly, we demonstrate two novel and surprising behaviours: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalisation to partial rather than perfect test accuracy.

Introduction
When training a neural network, we expect that once training loss converges to a low value, the network will no longer change much. Power et al. (2021) discovered a phenomenon dubbed grokking that drastically violates this expectation. The network first "memorises" the data, achieving low and stable training loss with poor generalisation, but with further training transitions to perfect generalisation. We are left with the question: why does the network's test performance improve dramatically upon continued training, having already achieved nearly perfect training performance? Recent answers to this question vary widely, including the difficulty of representation learning (Liu et al., 2022), the scale of parameters at initialisation (Liu et al., 2023), spikes in loss ("slingshots") (Thilak et al., 2022), random walks among optimal solutions (Millidge et al., 2022), and the simplicity of the generalising solution (Nanda et al., 2023, Appendix E). In this paper, we argue that the last explanation is correct, by stating a specific theory in this genre, deriving novel predictions from the theory, and confirming the predictions empirically. We analyse the interplay between the internal mechanisms that the neural network uses to calculate the outputs, which we loosely call "circuits" (Olah et al., 2020). We hypothesise that there are two families of circuits that both achieve good training performance: one which generalises well (Cgen) and one which memorises the training dataset (Cmem). The key insight is that when there are multiple circuits that achieve strong training performance, weight decay prefers circuits with high "efficiency", that is, circuits that require less parameter norm to produce a given logit value.
Efficiency answers our question above: if Cgen is more efficient than Cmem, gradient descent can reduce nearly perfect training loss even further by strengthening Cgen while weakening Cmem, which then leads to a transition in test performance. With this understanding, we demonstrate in Section 3 that three key properties are sufficient for grokking: (1) Cgen generalises well while Cmem does not, (2) Cgen is more efficient than Cmem, and (3) Cgen is learned more slowly than Cmem. Since Cgen generalises well, it automatically works ...
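To make the setup concrete, here is a minimal sketch of the kind of experiment the paper analyses: a small network trained on modular addition with weight decay, logging test accuracy and parameter norm. This is an illustrative reconstruction, not the authors' code; the hyperparameters are guesses, and grokking is sensitive to the training fraction and the amount of weight decay.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
P = 97                                             # modulus for the a + b (mod P) task
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
n_train = len(pairs) // 3                          # small training fraction encourages grokking
train_idx, test_idx = perm[:n_train], perm[n_train:]

model = nn.Sequential(
    nn.Embedding(P, 64),                           # shared embedding for both operands
    nn.Flatten(start_dim=1),                       # concatenate the two operand embeddings
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, P),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):                         # long training; shrink for a quick smoke test
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 5_000 == 0:
        with torch.no_grad():
            test_acc = (model(pairs[test_idx]).argmax(-1) == labels[test_idx]).float().mean().item()
            norm = torch.sqrt(sum(p.pow(2).sum() for p in model.parameters())).item()
        print(f"step {step}: train_loss {loss.item():.3f}  test_acc {test_acc:.3f}  param_norm {norm:.1f}")
```

If grokking occurs, training loss falls early while test accuracy sits near chance for a long stretch and then climbs; the parameter norm dropping as test accuracy rises is a crude view of the efficiency story above, with weight decay trading the high-norm memorising circuit for the more efficient generalising one.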
What is effective population size (Ne), and why is it important? Robin Waples takes us back to the basics of this important evolutionary concept and discusses his new paper, using simulations to demonstrate that Bill Hill's 1972 equation for calculating Ne still works for populations with extreme reproductive patterns.
In this episode, Nathan sits down with Paige Bailey, Lead Product Manager of Generative Models at Google Deepmind. In this conversation, they discuss what it's like to be a PM for an LLM as opposed to an app, defining ideal LLM behaviour, and reasoning - how do you distinguish real abilities vs pattern matching? If you're looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive RECOMMENDED PODCAST: The HR industry is at a crossroads. What will it take to construct the next generation of incredible businesses – and where can people leaders have the most business impact? Hosts Nolan Church and Kelli Dragovich have been through it all, the highs and the lows – IPOs, layoffs, executive turnover, board meetings, culture changes, and more. With a lineup of industry vets and experts, Nolan and Kelli break down the nitty-gritty details, trade offs, and dynamics of constructing high performing companies. Through unfiltered conversations that can only happen between seasoned practitioners, Kelli and Nolan dive deep into the kind of leadership-level strategy that often happens behind closed doors. Check out the first episode with the architect of Netflix's culture deck Patty McCord. https://link.chtbl.com/hrheretics TIMESTAMPS: (00:00) Episode Preview (00:01:15) Introducing Paige Bailey (00:04:21) Paige's background at Google Brain and the Deepmind merger (00:07:00) PM for an LLM vs being a PM for an app (00:11:21) The development timeline and compute budget of PaLM-2 (00:14:30) Paige's role in the PaLM 2 project (00:15:30) Sponsors: Netsuite | Omneky (00:17:26) Defining desired capabilities for PaLM-2 (00:19:17) The amount of work that went into elevating PaLM 2 from PaLM 1 (00:20:28) Has Google lost its ability to ship? (00:24:24) Paige's "eureka" moment seeing GitHub Copilot capabilities (00:27:47) Comparing PaLM 2 with other models (00:32:20) Grokking and the predictability of emergent capabilities (00:37:30) Citizen scientists and the multilingual capabilities of PaLM 2 (00:39:29) Distinguishing real reasoning vs pattern matching (00:45:51) Products using PaLM-2 that people should try (00:50:35) Most exciting AI projects that you can try out (00:52:29) Curriculum learning and successor to the transformer LINKS: PaLM 2 Duet AI for developers Avenging Polanyi's Revenge X/TWITTER: @DynamicWebPaige (Paige) @labenz (Nathan) @eriktorenberg @CogRev_Podcast SPONSORS: NetSuite | Omneky NetSuite has 25 years of providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off. Music Credit: GoogleLM
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An interactive introduction to grokking and mechanistic interpretability, published by Adam Pearce on August 7, 2023 on The AI Alignment Forum. Our write up largely agrees with @Quintin Pope's summary, with the addition of training trajectory visualizations and an explanation of the MLP construction that solves modular addition. A meta note that didn't make it into the article - with so many people looking into this problem over the last 18 months, I'm surprised this construction took so long to find. The modular addition task with a 1-layer MLP is about as simple as you can get! Scaling mechanistic interpretability up to more complex tasks/models seems worth continuing to try, but I'm less sure extracting crisp explanations will be possible. Even if we "solve" superposition, figuring out the construction here - where there's no superposition in the generalizing model - wasn't trivial. gif/twitter summary If we train an MLP to solve modular addition, the generalizing phase has suggestive periodic patterns. To figure out why the model generalizes, we first look at a task where we know the generalizing solution - sparse parity. You can see the model generalizing as weight decay prunes spurious connections. One point from the Omnigrok paper I hadn't internalized before training lots of models: grokking only happens when hyper-parameters are just right. We can make other weird things happen too, like AdamW oscillating between low train loss and low weights. To understand how an MLP solves modular addition, we train a much smaller model with a circular input embedding baked in. Following @Neel Nanda and applying a discrete Fourier transform, we see larger models trained from scratch use the same star trick! Finally, we show what the stars are doing and prove that they work: Our ReLU activation has a small error, but it's close enough to the exact solution - an x² activation suggested in Grokking modular arithmetic - for the model to patch everything up w/ constructive interference. And there are still open questions: why are the frequencies with >5 neurons lopsided? Why does factoring Winput not do the same thing as factoring Woutput? Also see The Hydra Effect: Emergent Self-repair in Language Model Computations Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
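The Fourier check mentioned above is easy to reproduce in outline: take the input embedding matrix of a modular-arithmetic model and transform it along the token (residue) axis. The sketch below uses a random matrix as a stand-in for trained weights, so it only shows the mechanics of the check; in a grokked model, nearly all the energy concentrates on a handful of frequencies, the periodic "star" structure described in the post.

```python
import numpy as np

P, d = 97, 64
W_embed = np.random.default_rng(0).normal(size=(P, d))  # stand-in for a trained embedding
spectrum = np.abs(np.fft.rfft(W_embed, axis=0))          # DFT along the token (residue) axis
energy = spectrum.sum(axis=1)                            # total magnitude per frequency
top = np.argsort(energy)[::-1][:5]
print("dominant frequencies:", top)                      # random weights: flat; grokked: sharp peaks
```

Sharp peaks mean each embedding dimension is approximately a sinusoid in the input residue, which is what lets the network implement modular addition through trigonometric identities.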
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Visible loss landscape basins don't correspond to distinct algorithms, published by Mikhail Samin on July 28, 2023 on LessWrong. Thanks to Justis, Arthur Conmy, Neel Nanda, Joseph Miller, and Tilman Räuker for their feedback on a draft. I feel like many people haven't noticed an important result of mechanistic interpretability analysis of grokking, and so haven't updated how they think about loss landscapes and algorithms that neural networks end up implementing. I think this has implications for alignment research. When thinking about grokking, people often imagine something like this: the neural network implements Algorithm 1 (e.g., memorizes the training data), achieves ~ the lowest loss available via memorization, then moves around the bottom of the Algorithm 1 basin and after a while, stumbles across a path to Algorithm 2 (e.g., the general algorithm for modular addition). But the mechanistic interpretability of grokking analysis has shown that this is not true! Approximately from the start of the training, Algorithm 1 is most of what the circuits are doing and what almost entirely determines the neural network's output; but at the same time, the entire time the neural network's parameters visibly move down the wider basin, they don't just become better at memorization; they increasingly implement the circuits for Algorithm 1 and the circuits for Algorithm 2, in superposition. (Neel Nanda et al. have shown that the circuits that at the end implement the general algorithm for modular addition start forming approximately at the start of the training: the gradient was mostly an arrow towards memorization, but also, immediately from the initialization of the weights, a bit of an arrow pointing towards the general algorithm. The circuits were gradually tuned throughout the training. The noticeable change in the test loss starts occurring when the circuits are already almost right.) A path through the loss landscape visible in 3D doesn't correspond to how and what the neural network is actually learning. Almost all of the changes to the loss are due to the increasingly good implementation of Algorithm 1; but apparently, the entire time, the gradient also points towards some faraway implementation of Algorithm 2. Somehow, the direction in which Algorithm 2 lies is also visible to the derivative, and moving the parameters in the direction the gradient points means mostly increasingly implementing Algorithm 1, and also increasingly implementing the faraway Algorithm 2. "Grokking", visible in the test loss, is due to the change that happens when the parameters already implement Algorithm 2 accurately enough for the switch from mostly outputting the results of an implementation of Algorithm 1 to the results of an improving implementation of Algorithm 2 not to hurt the performance. Once it's the case, the neural network puts more weight into Algorithm 2 and at the same time quickly tunes it to be even more accurate (which is increasingly easy as the output is increasingly determined by the implementation of Algorithm 2). This is something many people seem to have missed. I did not expect it to be the case, was surprised, and updated how I think about loss landscapes. Does this generalize? Maybe. 
I'm not sure whether it's correct to generalize from the mechanistic interpretability of grokking analysis to neural networks in general; real LLMs are under-parameterised while the grokking model is very over-parameterised. But I guess it might be reasonable to expect that this is how deep learning generally works. People seem to think that multi-dimensional loss landscapes of neural networks have basins for specific algorithms, and neural networks get into these depending on how relatively large these basins are, which might be caused by how simple the algorithms are, how path-depe...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: QAPR 5: grokking is maybe not that big a deal?, published by Quintin Pope on July 23, 2023 on The AI Alignment Forum. [Thanks to support from Cavendish Labs and a Lightspeed grant, I've been able to restart the Quintin's Alignment Papers Roundup sequence.] Introduction Grokking refers to an observation by Power et al. (below) that models trained on simple modular arithmetic tasks would first overfit to their training data and achieve nearly perfect training loss, but that training well past the point of overfitting would eventually cause the models to generalize to unseen test data. The rest of this post discusses a number of recent papers on grokking. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of "grokking" a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. We also study generalization as a function of dataset size and find that smaller datasets require increasing amounts of optimization for generalization. We argue that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset. My opinion: When I first read this paper, I was very excited. It seemed like a pared-down / "minimal" example that could let us study the underlying mechanism behind neural network generalization. You can read more of my initial opinion on grokking in the post Hypothesis: gradient descent prefers general circuits. I now think I was way too excited about this paper, that grokking is probably a not-particularly-important optimization artifact, and that grokking is no more connected to the "core" of deep learning generalization than, say, the fact that it's possible for deep learning to generalize from an MNIST training set to the testing set. I also think that using the word "grokking" was anthropomorphizing and potentially misleading (like calling the adaptive information routing component of a transformer model its "attention"). Evocative names risk letting the connotations of the name filter into the analysis of the object being named. E.g., "grokking" brings connotations of sudden realization, despite the fact that the grokking phase in the above plot starts within the first ~5%-20% of the training process, though it appears much more abrupt due to the use of a base-10 logarithmic scale on the x-axis (see the quick check after this entry). "Grokking" also brings connotations of insight, realization, or improvement relative to some previously confused baseline. This leads to the impression that things which grok are better than things which don't. Humans often use the word "grokking" to mean deeply understanding complex domains that actually matter in the real world. Using the same word in an ML context suggests that ML grokking is relevant to whatever mechanisms might let an ML system deeply understand complex domains that actually matter in the real world.
I've heard several people say things like: Studying grokking could significantly advance ML capabilities, if doing so were to lead to a deeper understanding of the mechanisms underlying generalization in ML. Training long enough could eventually result in grokking occurring in ML domains of actual relevance, such as language, and thereby lead to sudden capabilities gains or break alignment properties. Grokking is an example of how thinking l...
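The log-axis point above is easy to check with arithmetic. A quick sketch (my own, with made-up step counts):

```python
# If training runs for 1e5 steps and test performance improves between steps
# 5e3 and 2e4 (the first 5%-20% of training, measured linearly), a log10
# x-axis pushes that window to the far right of the plot, which is why the
# transition reads as late and abrupt.
import math

total = 1e5
start, end = 5e3, 2e4

def lin_pos(step):
    return step / total                          # position on a linear axis

def log_pos(step):
    return math.log10(step) / math.log10(total)  # position on a log10 axis

print(f"linear axis: {lin_pos(start):.0%} -> {lin_pos(end):.0%} of the plot")
print(f"log10 axis:  {log_pos(start):.0%} -> {log_pos(end):.0%} of the plot")
```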
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Eric Michaud on the Quantization Model of Neural Scaling, Interpretability and Grokking, published by Michaël Trazzi on July 12, 2023 on The AI Alignment Forum. Eric is a PhD student in the Department of Physics at MIT working with Max Tegmark on improving our scientific/theoretical understanding of deep learning -- understanding what deep neural networks do internally and why they work so well. We mostly talk about Eric's paper, The Quantization Model of Neural Scaling, but also two papers he recently published on grokking: Towards Understanding Grokking: an effective theory of representation learning, and Omnigrok: Grokking Beyond Algorithmic Data. Below are some highlighted quotes from our conversation (available on Youtube, Spotify, Google Podcast, Apple Podcast). For the full context for each of these quotes, you can find the accompanying transcript. On The Quantization Of Neural Scaling "The name of the paper is the quantization model of neural scaling. And the one-tweet summary is that it's possible for smooth loss curves on average to average over lots of small, discrete phase changes in the network performance. What if there were a bunch of things that you need to learn to do prediction well in something like language? And so these things could be pieces of knowledge or different abilities to perform certain types of specific computations. We can imagine enumerating this set of things that you need to learn to do prediction well. And we call these the quanta of the prediction problem. And then what if the frequency in natural data that these were useful, each of these quanta, each of these pieces of knowledge or computational ability, what if the frequency that they were useful for prediction followed a power law?" (context) Quanta are the smallest clusters for simple subtasks "In order to predict the new line, [it] has to count line lengths for the previous lines in the document. And then it's able to use that to accurately predict when a new line should be present. And you can find just a large number of clusters where the thing that is common between the clusters just seems to be that it's the same type of problem, or doing prediction on those samples requires the same piece of knowledge. And so you might call these the quanta, or evidence of there being quanta, although it's a little bit tricky, because we, in doing the clustering, enforce this discreteness, where everything is a member of a cluster, a particular cluster, and not another cluster. Anyway, it's complicated and weird. Who knows whether this is even the right model for thinking about the networks." (context) What the existence of quanta would mean for interpretability "It would be very exciting if it was the true model, because it would maybe tell you that there were these set of things where, if you enumerated them, you could understand the network's performance and understood what it has learned. It's just like, ah, there's this set of pieces of knowledge or pieces of computation that are needed. And you could describe what these are. You could find them in the network and maybe hope to mechanistically understand the whole network by decomposing it into how it implements each one of these things, how it learns each piece of knowledge or each piece of computation."
How Quantization of Neural Scaling relates to other lines of research like Grokking, or interpretability "With both the quanta scaling stuff and with the grokking stuff, we sort of hope to identify these maybe mechanisms in the model that are responsible for certain behaviors or for the model generalizing. And in the case of grokking, there's sort of multiple circuits or multiple mechanisms that are going on in the model or something where there's a memorizing mechanism and a generalizing mechanism. [...] And maybe just in general beyond ...
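The power-law story above lends itself to a toy simulation. A sketch (my own, with assumed numbers, not an experiment from the paper): if a model has learned the k most frequently useful quanta, its expected loss is the total frequency of the quanta it is still missing, and that falls smoothly as a power law even though each quantum is learned discretely.

```python
# Quantization-hypothesis toy model: quanta have power-law usefulness
# frequencies; a model that knows the top-k quanta errs exactly on samples
# that need an unlearned quantum, so the loss curve is smooth on average.
import numpy as np

n_quanta = 10_000
alpha = 1.5                                # assumed power-law exponent
freq = np.arange(1, n_quanta + 1) ** -alpha
freq /= freq.sum()                         # P(a sample needs quantum i)

for k in (10, 100, 1_000, 10_000):
    loss = freq[k:].sum()                  # mass of still-unlearned quanta
    print(f"quanta learned: {k:>6}   expected loss ~ {loss:.4f}")
```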
Eric is a PhD student in the Department of Physics at MIT working with Max Tegmark on improving our scientific/theoretical understanding of deep learning -- understanding what deep neural networks do internally and why they work so well. This is part of a broader interest in the nature of intelligent systems, which previously led him to work with SETI astronomers, with Stuart Russell's AI alignment group (CHAI), and with Erik Hoel on a project related to integrated information theory. Transcript: https://theinsideview.ai/eric Youtube: https://youtu.be/BtHMIQs_5Nw The Quantization Model of Neural Scaling: https://arxiv.org/abs/2303.13506 An Effective Theory of Representation Learning: https://arxiv.org/abs/2205.10343 Omnigrok: Grokking Beyond Algorithmic Data: https://arxiv.org/abs/2210.01117
The Abide Project reported on Classis Grand Rapids East's Town Hall Meeting held at Calvin CRC in Grand Rapids. Although Paul DeVries afforded them a reprieve at Synod 2023, some are coming to the realization that they can't simply stonewall denominational efforts to resist their decisions with respect to Same Sex Marriage. This is some analysis of Abide's YouTube presentation. @abideproject CRCNA Synod Update 7 - Classis Grand Rapids East https://youtu.be/3FQu7ynbJx8 @crcna Synod 2023 https://www.youtube.com/live/XCbC6hmymC8?feature=share July Washington DC Event with Michael Martin and Spencer Klavan https://www.eventbrite.com/e/christ-and-community-in-the-modern-age-tickets-608591853617 Breakwater Festival Mannheim Germany October 27-29 2023 Event Details and Tickets: https://buytickets.at/breakwater/935800 T-shirts: https://buytickets.at/breakwater/store Discord: tinyurl.com/BreakwaterDiscord Festival Email: contact.breakwater@gmail.com Flyer https://bit.ly/breakwaterfestival2023 Paul Vander Klay clips channel https://www.youtube.com/channel/UCX0jIcadtoxELSwehCh5QTg Bridges of Meaning Discord https://discord.gg/rpf3zatJ https://www.meetup.com/sacramento-estuary/ My Substack https://paulvanderklay.substack.com/ Estuary Hub Link https://www.estuaryhub.com/ If you want to schedule a one-on-one conversation check here. https://paulvanderklay.me/2019/08/06/converzations-with-pvk/ There is a video version of this podcast on YouTube at http://www.youtube.com/paulvanderklay To listen to this on ITunes https://itunes.apple.com/us/podcast/paul-vanderklays-podcast/id1394314333 If you need the RSS feed for your podcast player https://paulvanderklay.podbean.com/feed/ All Amazon links here are part of the Amazon Affiliate Program. Amazon pays me a small commission at no additional cost to you if you buy through one of the product links here. This is one (free to you) way to support my videos. https://paypal.me/paulvanderklay Blockchain backup on Lbry https://odysee.com/@paulvanderklay https://www.patreon.com/paulvanderklay Paul's Church Content at Living Stones Channel https://www.youtube.com/channel/UCh7bdktIALZ9Nq41oVCvW-A To support Paul's work by supporting his church give here. https://tithe.ly/give?c=2160640
In this wide-ranging conversation, Tim Scarfe interviews Neel Nanda, a researcher at DeepMind working on mechanistic interpretability, which aims to understand the algorithms and representations learned by machine learning models. Neel discusses how models can represent their thoughts using motifs, circuits, and linear directional features which are often communicated via a "residual stream", an information highway models use to pass information between layers. Neel argues that "superposition", the ability for models to represent more features than they have neurons, is one of the biggest open problems in interpretability. This is because superposition thwarts our ability to understand models by decomposing them into individual units of analysis. Despite this, Neel remains optimistic that ambitious interpretability is possible, citing examples like his work reverse engineering how models do modular addition. However, Neel notes we must start small, build rigorous foundations, and not assume our theoretical frameworks perfectly match reality. The conversation turns to whether models can have goals or agency, with Neel arguing they likely can based on heuristics like models executing long term plans towards some objective. However, we currently lack techniques to build models with specific goals, meaning any goals would likely be learned or emergent. Neel highlights how induction heads, circuits models use to track long range dependencies, seem crucial for phenomena like in-context learning to emerge. On the existential risks from AI, Neel believes we should avoid overly confident claims that models will or will not be dangerous, as we do not understand them enough to make confident theoretical assertions. However, models could pose risks through being misused, having undesirable emergent properties, or being imperfectly aligned. Neel argues we must pursue rigorous empirical work to better understand and ensure model safety, avoid "philosophizing" about definitions of intelligence, and focus on ensuring researchers have standards for what it means to decide a system is "safe" before deploying it. Overall, a thoughtful conversation on one of the most important issues of our time. Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 Twitter: https://twitter.com/MLStreetTalk Neel Nanda: https://www.neelnanda.io/ TOC [00:00:00] Introduction and Neel Nanda's Interests (walk and talk) [00:03:15] Mechanistic Interpretability: Reverse Engineering Neural Networks [00:13:23] Discord questions [00:21:16] Main interview kick-off in studio [00:49:26] Grokking and Sudden Generalization [00:53:18] The Debate on Systematicity and Compositionality [01:19:16] How do ML models represent their thoughts [01:25:51] Do Large Language Models Learn World Models? [01:53:36] Superposition and Interference in Language Models [02:43:15] Transformers discussion [02:49:49] Emergence and In-Context Learning [03:20:02] Superintelligence/XRisk discussion Transcript: https://docs.google.com/document/d/1FK1OepdJMrqpFK-_1Q3LQN6QLyLBvBwWW_5z8WrS1RI/edit?usp=sharing Refs: https://docs.google.com/document/d/115dAroX0PzSduKr5F1V4CWggYcqIoSXYBhcxYktCnqY/edit?usp=sharing
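Since superposition is flagged as a central open problem, here is a minimal sketch of the idea (my own illustration, not Neel's code): store more features than neurons by assigning each feature a random direction in neuron space, and observe the interference that makes such models hard to decompose into individual units.

```python
# Superposition in miniature: 10 features packed into 4 neurons. The feature
# directions cannot all be orthogonal, so reading one feature back picks up
# interference from the others.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 10, 4

W = rng.normal(size=(n_features, n_neurons))     # one direction per feature
W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit length

x = np.zeros(n_features)
x[3] = 1.0                      # one active (sparse) feature
hidden = W.T @ x                # compress 10 features into 4 neurons
readout = W @ hidden            # project back onto every feature direction

print("feature 3 readout:    ", round(readout[3], 3))   # ~1.0
print("largest interference: ", round(np.max(np.abs(np.delete(readout, 3))), 3))
```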
Robby has a chat with software engineer, trainer, and author of the book Grokking Simplicity, Eric Normand (he/him/his). As Eric reflects on his experience, the first thought he has of well-maintained software is that it's like a unicorn (something you can't find in real life), but on a more serious note, he emphasizes that smaller code bases are easier to maintain and that the maintainability of a code base is also highly dependent on the team of engineers working on it. Robby and Eric dive into the pros and cons of microservices in small organizations, why teams need to ensure they know where they're going with the future of their codebase, Eric's book Grokking Simplicity, the differences within a functional programming language, higher-order functions (a small example follows this entry), Eric's journey toward authoring a technical book, and the value of engineers being able to get comfortable moving code around in a project without needing to ask for permission first. Tune in and enjoy! Book Recommendation: The Timeless Way of Building by Christopher W. Alexander. Helpful Links: Grokking Simplicity by Eric Normand (Manning); Eric's Website; Eric on LinkedIn; Eric on Twitter. Subscribe to Maintainable on: Apple Podcasts, Overcast, Spotify, or search "Maintainable" wherever you stream your podcasts. Keep up to date with the Maintainable Podcast by joining the newsletter.
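For readers who haven't met the term, a minimal illustration (mine, not from the book) of a higher-order function: a function that takes or returns another function, which is much of what makes the functional style composable.

```python
# A higher-order function: it takes a function and returns a new function.
def twice(f):
    """Return a function that applies f two times."""
    return lambda x: f(f(x))

def add3(x):
    return x + 3

add6 = twice(add3)    # composing add3 with itself
print(add6(10))       # -> 16
```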
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Paper Replication Walkthrough: Reverse-Engineering Modular Addition, published by Neel Nanda on March 12, 2023 on The AI Alignment Forum. I'm excited about trying different formats for mechanistic interpretability education! I've made a video walkthrough where we replicate my paper, Progress Measures for Grokking via Mechanistic Interpretability. With Jess Smith, one of my co-authors, we recorded ourselves coding a replication and discussed what we did at each step. This is a three-part walkthrough and you can see the accompanying code for the walkthrough here: In part 1, we train a model to perform modular addition, and see that it does grok! In part 2, we take this model and reverse-engineer the trig-based circuit it has learned for modular addition. We show that you can both read out intermediate steps of the circuit from the activations, and that you can just read off some of the algorithm's steps from the model weights. In part 3, we define some progress measures that let us distinguish progress towards the generalising and the memorising algorithms. We then look at the model during training and watch how the circuits develop, and use this to understand why it groks. This is an experiment with a new format, and I'd love to hear how useful you find it! Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
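For concreteness, here is a rough sketch (assumptions mine; the paper trains on modular addition with p = 113 and roughly a 30% training fraction, which I mirror) of the data setup that part 1 walks through:

```python
# Build every (a, b) pair for addition mod p, label it with (a + b) % p, and
# hold out a fixed random fraction as training data; the rest becomes the
# test set on which grokking eventually shows up.
import numpy as np

p, train_frac = 113, 0.3
rng = np.random.default_rng(0)

pairs = np.array([(a, b) for a in range(p) for b in range(p)])
labels = (pairs[:, 0] + pairs[:, 1]) % p

idx = rng.permutation(len(pairs))
n_train = int(train_frac * len(pairs))
train_idx, test_idx = idx[:n_train], idx[n_train:]
print(len(train_idx), "train examples,", len(test_idx), "test examples")
```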
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 200 COP in MI: Interpreting Algorithmic Problems, published by Neel Nanda on December 31, 2022 on The AI Alignment Forum. This is the fourth post in a sequence called 200 Concrete Open Problems in Mechanistic Interpretability. Start here, then read in any order. If you want to learn the basics before you think about open problems, check out my post on getting started. I'll make another post every 1-2 days, giving a new category of open problems. If you want to read ahead, check out the draft sequence here! Motivation Motivating paper: A Mechanistic Interpretability Analysis of Grokking When models are trained on synthetic, algorithmic tasks, they often learn to do some clean, interpretable computation inside. Choosing a suitable task and trying to reverse engineer a model can be a rich area of interesting circuits to interpret! In some sense, this is interpretability on easy mode - the model is normally trained on a single task (unlike language models, which need to learn everything about language!), we know the exact ground truth about the data and optimal solution, and the models are tiny. So why care? I consider my work on grokking to be an interesting case study of this work going well. Grokking (shown below) is a mysterious phenomenon where, when small models are trained on algorithmic tasks (eg modular addition or modular division), they initially memorise the training data. But when they keep being trained on that data for a really long time, the model suddenly(ish) figures out how to generalise! In my work, I simplified their setup even further, by training a 1-layer transformer (with no LayerNorm or biases) to do modular addition and reverse-engineering the weights to understand what was going on. And it turned out to be doing a funky trig-based algorithm (shown below; see the numeric check after this entry), where the numbers are converted to frequencies with a memorised Discrete Fourier Transform, added using trig identities, and converted back to the answer! Using this, we looked inside the model and identified that despite seeming to have plateaued, in the period between memorising and "grokking", the model is actually slowly forming the circuit that does generalise. But so long as the model still has the memorising circuit, this adds too much noise to have good test loss. Grokking occurs when the generalising circuit is so strong that the model decides to "clean up" the memorising circuit, "uncovering" the mature generalising circuit beneath, and suddenly gets good test performance. OK, so I just took this as an excuse to explain my paper to you. Why should you care? I think the general lesson from this, which I'm excited to see applied elsewhere, is using toy algorithmic models to analyse a phenomenon we're confused about. Concretely, given a confusing phenomenon like grokking, I'd advocate the following strategy: Simplify to the minimal setting that exhibits the phenomenon, yet is complex enough to be interesting Reverse-engineer the resulting model, in as much detail as you can Extrapolate the insights you've learned from the reverse-engineered model - what are the broad insights you've learned? What do you expect to generalise? Can you form any automated tests to detect the circuits you've found, or any of their motifs?
Verify by looking at other examples of the phenomenon and seeing whether these insights actually hold (larger models, different tasks, even just earlier checkpoints of the model or different random seeds). Grokking is an example in a science-of-deep-learning context - trying to uncover mysteries about how models learn and behave. But this same philosophy also applies to understanding confusing phenomena in language models, and building toy algorithmic problems to study those! Anthropic's Toy Models of Superposition is an excellent example of this done well, for the case of ...
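Here is the numeric check promised above: a sketch (my own code) of the trig identity the reverse-engineered circuit exploits. One frequency suffices here because the modulus is prime; the actual model spreads the computation over several frequencies.

```python
# Represent a and b at frequency w = 2*pi*k/p, combine them with the angle
# addition identities, then score each candidate answer c by cos(w*(a+b-c)),
# which peaks exactly at c = (a + b) mod p.
import numpy as np

p, k = 113, 7                    # modulus and an arbitrary frequency index
w = 2 * np.pi * k / p
a, b = 45, 98

cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)

c = np.arange(p)
logits = cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)  # = cos(w*(a+b-c))
print("argmax:", logits.argmax(), "| expected:", (a + b) % p)   # both 30
```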
In today's episode, Jan-Paul, Chris, Martin, and Calso talk with Kalle Rosenbaum, author of this season's book club pick, "Bitcoin begreifen" ("Grokking Bitcoin"). With this English-language interview we close out the book club for this season. By and with Jan-Paul, Chris, Martin, Calso, and Kalle Rosenbaum. You can leave us a tip via Lightning here: ⚡️ tip@tip.nodesignal.space If you like our work, you can also listen to our episodes on Podcasting 2.0 platforms such as Breez, Fountain, or Castamatic and leave us a small token of appreciation there. Thanks to everyone who supports the Bitcoin community with their donations! Among other things, these donations fund our bounty program, through which you can earn a reward for contributing to a project. -------------- Block height: 767260 Nodesignal book club series Kalle on Twitter Kalle on Mastodon -------------- For feedback and further discussion, feel free to join the Nodesignal Telegram group and rate us on Spotify and Apple Podcasts. You can also follow us on Twitter or Instagram.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Mesa-Optimizers via Grokking, published by orthonormal on December 6, 2022 on The AI Alignment Forum. Summary: Recent interpretability work on "grokking" suggests a mechanism for a powerful mesa-optimizer to emerge suddenly from an ML model. Inspired By: A Mechanistic Interpretability Analysis of Grokking Overview of Grokking In January 2022, a team from OpenAI posted an article about a phenomenon they dubbed "grokking", where they trained a deep neural network on a mathematical task (e.g. modular division) to the point of overfitting (it performed near-perfectly on the training data but generalized poorly to test data), and then continued training it. After a long time where seemingly nothing changed, suddenly the model began to generalize correctly and perform much better on test data. A team at Anthropic analyzed grokking within large language models and formulated the idea of "induction heads". These are particular circuits (small sub-networks) that emerge over the course of training, which serve clearly generalizable functional roles for in-context learning. In particular, for GPTs (multi-layer transformer networks doing text prediction), the model eventually generates circuits which hold on to past tokens from the current context, such that when token A appears, they direct attention to every token that followed A earlier in the context. (To reiterate, this circuit does not start a session with those associations between tokens; it is instead a circuit which learns patterns as it reads the in-context prompt.) The emergence of these induction heads coincides with the drop in test error, which the Anthropic team called a "phase change". Neel Nanda and Tom Lieberum followed this with a post I highly recommend, the aforementioned A Mechanistic Interpretability Analysis of Grokking. They looked more closely at grokking for mathematical problems, and were impressively able to reverse-engineer the post-grokking algorithm: it had cleanly implemented the Discrete Fourier Transform. (To be clear, it is not as if the neural network had abstractly reasoned its way through higher mathematics; it just found the solution with the simplest structure, which is the DFT.) They also gave a fascinating account of what might be happening behind the curtain as a neural network groks a pattern. In short, a network starts out by memorizing the training data rather than finding a general solution, because the former can easily be implemented with one modification at a time, while the latter requires coordinated circuits. However, once the model reaches diminishing returns on memorization, a bit of regularization will encourage it to reinforce simple circuits that cover many cases: And the natural way to do this is by picking up on regularities in the data - eg, you can memorise modular addition twice as efficiently by recognising that x+y=y+x. Once these circuits emerge, gradient descent quickly replaces the memorized solution with them. The training error decreases ever-so-slightly as the regularization penalty gets lower, while the test error plummets because the circuit generalizes there. To make an analogy: What Grokking Feels Like From the Inside You're an AI being trained for astronomy. Your trainers have collected observations of the night sky, for a millennium, from planet surfaces of ten thousand star systems.
Over and over, they're picking a system and feeding you the skies one decade at a time and asking you to predict the next decade of skies. They also regularize you a bit (giving you tiny rewards for being a simpler AI). Eventually, you've seen each star system enough times that you've memorized a compressed version of their history, and all you do is identify which system you're in and then replay your memories of it. For instance, within the first few visits to our system, ...
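Returning to the induction-head mechanism described above, here is a minimal sketch (my own, not from the Anthropic paper) of the behavior as plain code: when the current token occurred earlier in the context, predict whatever followed it then.

```python
# Induction-head-style prediction: find the most recent earlier occurrence of
# the current token and copy the token that followed it.
def induction_predict(tokens):
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan backwards through context
        if tokens[i] == current:
            return tokens[i + 1]               # what followed it last time
    return None                                # token never seen before

context = ["the", "cat", "sat", "on", "the", "mat", ";", "the"]
print(induction_predict(context))              # -> "mat"
```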
Today our guest postponed, so the gang briefly groks Keurig Dr Pepper's $50m investment in non-alc brewer Athletic Brewing. Note: Video feeds on Spotify and at www.youtube.com/beernetradio About: Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests once a week as they grok the industry issues of the day. -Podcast Hotline: Text or call Harry in confidence at: (262) 345-2501 -Audio/Video feeds at https://linktr.ee/beernet -Articles referenced at beernet.com -Twitter / IG / TikTok: @beerbizdaily --- Send in a voice message: https://anchor.fm/beernetradio/message
What is Grokking? On today's episode we'll answer the question: "What is Grokking?" You may have heard the word "grok" before but may not have known what it meant. Grokking is used by different people and in different contexts. Usually, people use the term to convey thorough understanding of and interaction with a game or system. But the origins of the word, how it's used, and its application to games and games-based learning are diverse and fascinating. If you liked this episode please consider commenting, sharing, and subscribing. Subscribing is absolutely free and ensures that you'll get the next episode of Experience Points delivered directly to you. I'd also love it if you took some time to rate the show! I live to lift others with learning. So, if you found this episode useful, consider sharing it with someone who could benefit. Also make sure to visit University XP online at www.universityxp.com. University XP is also on Twitter @University_XP and on Facebook and LinkedIn as University XP. Also, feel free to email me anytime at dave@universityxp.com. Game on! Get the full transcript and references for this episode here: https://www.universityxp.com/podcast/71 Support the show
Today the gang groks Boston Beer's recent pivot to lighter-tasting products -- zigging where others zag -- and the nature of malt/sugar seltzers versus spirits-based RTDs and how brands are jockeying for position between the two, plus a quick discussion on what the TTB/IRS raid(s) on Southern Glazer's means. Note: Video feeds on Spotify and YouTube at: https://www.youtube.com/watch?v=LGL_lMAGZ5s Beer Summit info: https://beernet.com/beer-summit/ About: Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests once a week as they grok the industry issues of the day. -Podcast Hotline: Text or call Harry in confidence at: (262) 345-2501 -Audio/Video feeds at https://linktr.ee/beernet -Articles referenced at beernet.com -Twitter / IG / TikTok: @beerbizdaily --- Send in a voice message: https://anchor.fm/beernetradio/message
Today Jordan and Harry sit down right after the Mark Anthony Brands 2023 distributor meeting in the lobby of the Sheraton Grand in Chicago to talk through our impressions of their new innovations. Note: The audio is poor, but we've included subtitles in the video. Watch the video on YouTube: https://www.youtube.com/watch?v=LlB0dArHJmA ---About--- Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests once a week as they grok the industry issues of the day. -Podcast Hotline: Text or call Harry in confidence at: (262) 345-2501 -Audio/Video feeds at https://linktr.ee/beernet -Articles referenced at beernet.com -Twitter / IG / TikTok: @beerbizdaily --- Send in a voice message: https://anchor.fm/beernetradio/message
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA, published by Marius Hobbhahn on October 4, 2022 on LessWrong. There is a new paper by Liu et al. that claims to have understood the key mechanism underlying grokking (potentially even generalization more broadly). They argue: 1. Grokking can be explained via the norm of the weights. They claim that there is a constant level of the weight norm that is optimal for generalization. 2. If there is an optimal level of the weight norm, the weight norm of your model after initialization can be either too low, too high, or optimal. They claim that grokking is a phenomenon where we initialize the model with a large weight norm, which then slowly walks toward the optimal weight norm, at which point the model generalizes. 3. They also claim that you can get the same results as grokking, but much faster, if you set the weight norm correctly at every step. 4. They set the norm "correctly" by rescaling the weights after each unconstrained optimization step (so after every weight update, loss.backward()?!). Implications: I think they have found a very important insight into grokking and into finding generalizing circuits more broadly. I'm still a bit skeptical of some of the claims and results. On some level "just fix your weight norm and the model generalizes" sounds too simple to be true for all tasks. I think this result could have big implications but I'm not yet sure whether they are positive or negative. On the one hand, finding generalizing circuits seems to solve some of the problems associated with bad out-of-distribution generalization. On the other hand, it likely speeds up capabilities. I'm very unsure about this paper but intuitively it feels important. Thoughts? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
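A sketch of what "rescaling the weights after each unconstrained optimization step" could look like in PyTorch (my reading of the summary, not the paper's code; the natural place for the rescaling is after optimizer.step(), i.e. after the weights have actually moved, rather than after loss.backward()):

```python
# Constrained optimization by projection: take an ordinary optimizer step,
# then rescale all trainable weights so their overall L2 norm stays at a
# chosen target value.
import torch

def project_to_norm(model, target_norm):
    params = [p for p in model.parameters() if p.requires_grad]
    total = torch.sqrt(sum((p ** 2).sum() for p in params))
    with torch.no_grad():
        for p in params:
            p.mul_(target_norm / (total + 1e-12))

model = torch.nn.Linear(10, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 10)

loss = torch.nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
project_to_norm(model, target_norm=3.0)   # after every unconstrained step
```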
Yeah, I know that all of the so-called news shows here in the US are all in on the death of the Queen, but she died, she'll remain dead, Charles is the King, and that's about all I have to say about that. I could say a few other things, but it would be counter-productive, and we've got more important things to deal with. Like WTF is with this so-called judge, installed by the former guy, who acceded to his call for a "special master" to derail the DOJ's investigation into TFG for stealing classified information on his way out of the White House. Today we'll meet a new guest of the show. Teri Kanefield is an author, teacher, appellate defense attorney, and someone who's taken the facts of the Stolen Documents caper, organized them, and made them easy to read, understand, and digest on her blog at https://terikanefield.com and on Twitter (@Teri_Kanefield). Perhaps we'll be able to cut through some of the noise trying to drown out the truth.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Grokking "Semi-informative priors over AI timelines", published by anson.ho on June 12, 2022 on The Effective Altruism Forum. Notes: I give visual explanations for Tom Davidson's report, Semi-informative priors over AI timelines, and summarise the key assumptions and intuitions. The diagrams can be found here – you can click on the boxes to get linked to the part of the report that you're interested in. Thanks to the Epoch team for feedback and support! Thanks especially to Jaime Sevilla and Tom Davidson for providing detailed feedback. Executive Summary The framework in Semi-informative priors over AI timelines assumes a model of AGI development which consists of a sequence of Bernoulli trials, i.e. it treats each calendar year as a "trial" at building AGI with constant probability p of succeeding. However, we don't know what this value of p is, so we use a generalisation of Laplace's rule of succession to estimate P(AGI next year | no AGI yet). This is done by specifying a first-trial probability, the probability of successfully building AGI in the first year of AI research, together with the number of virtual successes, which tells us how quickly we should update our estimate for P(AGI next year | no AGI yet) based on evidence. The framework leans very heavily on the first-trial probability, which is determined using a subjective selection of reference classes (more here). How much evidence we get depends on the number of trials that we see, which depends on the regime start-time – you can think of this as the time before which failure to develop AGI doesn't tell us anything useful about the probability of success in later trials. For instance, we might think that 1956 (the year of the Dartmouth Conference) was the first year where people seriously started trying to build AGI, so the absence of AGI before 1956 isn't very informative. If we think of each trial as a calendar year, then there have been 2021-1956 = 65 trials since the regime start-time, and we still haven't developed AGI, so that's 65 failed trials which we use to update P(AGI next year | no AGI yet), where "next year" now corresponds to 2022 rather than 1957. But why should a trial correspond to a calendar year? The answer is that it doesn't have to! In total, Davidson considers three candidate trial definitions: Calendar-year trials: 1 trial = 1 calendar year Compute trials: 1 trial = a 1% increase in the largest amount of compute used to develop an AI system to date Researcher-year trials: 1 trial = a 1% increase in the total researcher-years so far If we extend this reasoning, then we can predict the probability that AGI is built X years into the future. Davidson does this to predict P(AGI by 2036 | no AGI yet). The idea is that this framework only incorporates a small amount of information based on observational evidence, giving "semi-informative priors" over AI timelines. This framework is shown in more detail in the linked diagrams. Since Davidson uses three different trial definitions, we actually get three of these diagrams!
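To make the update rule concrete, a back-of-the-envelope sketch (the formula follows the Beta-prior reading of the summary above: with first-trial probability f and N_v virtual successes, P(AGI next year | no AGI yet) after t failed trials is N_v / (N_v / f + t); the parameter values here are illustrative, not Davidson's):

```python
# Calendar-year trials with a generalized Laplace's rule of succession.
f, n_virtual = 1 / 300, 1            # illustrative first-trial prob. and N_v
regime_start, now, horizon = 1956, 2021, 2036

p_no_agi = 1.0
for year in range(now, horizon):
    t = year - regime_start                    # failed trials so far
    p_next = n_virtual / (n_virtual / f + t)   # P(AGI next year | no AGI yet)
    p_no_agi *= 1 - p_next

print(f"P(AGI by {horizon} | no AGI yet) ~ {1 - p_no_agi:.1%}")
```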
All in all, Davidson uses this to get a central estimate of P(AGI by 2036 | no AGI yet) = 8%; the report plots the full cumulative probability function. Motivation One way of forecasting AI Timelines is to consider the inner workings of AI, guess what kinds of developments are the most important, and then generate a probability distribution over when Artificial General Intelligence (AGI) will be developed. This is the approach taken by Ajeya Cotra in Forecasting TAI with biological anchors, a really detailed draft report that draws an analogy to the human brain to forecast when Transformative AI (TAI) will first be developed. Tom Davidson's report, Semi-informative priors over AI timelines, is also a detailed report forecasting AI timelines, but it takes a different approach to Cotra's repor...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Grokking "Forecasting TAI with biological anchors", published by Anson Ho on June 6, 2022 on The AI Alignment Forum. Notes: I give a visual explanation of Ajeya Cotra's draft report, Forecasting TAI with biological anchors (Cotra, 2020), summarising the key assumptions, intuitions, and conclusions. The diagrams can be found here – you can click on some of the boxes to get linked to the part of the report that you're interested in. Thanks to Michael Aird, Ashwin Acharya, and the Epoch team for suggestions and feedback! Special thanks to Jaime Sevilla and Ajeya Cotra for detailed feedback. Executive Summary Click here to skip the summary Ajeya Cotra's biological anchors framework attempts to forecast the development of Transformative AI (TAI) by treating compute as a key bottleneck to AI progress. This lets us focus on a concrete measure (compute, measured in FLOP) as a proxy for the question "when will TAI be developed?" Given this, we can decompose the question into two main questions: 2020 training compute requirements: How much compute will we need to train TAI, using 2020 Machine Learning architectures and algorithms? Affordability of compute: How likely is it that we'll be able to afford the compute required to train TAI in a particular year? The second question can be tackled by turning to existing trends in three main factors: (1) algorithmic progress, e.g. improved algorithmic efficiency, (2) decreasing computation prices, e.g. due to hardware improvements, and (3) increased willingness to spend on compute. The first question is significantly trickier. Cotra attempts to answer it by treating the brain as a "proof of concept" that the "amount of compute" used to "train" the brain can train a general intelligence. This lets her relate the question "how much compute will we need to train TAI?" to the question "how much 'compute' was used to 'train' the human brain?". However, there's no obvious single interpretation for the latter question, so Cotra comes up with six hypotheses for what this corresponds to, referring to these hypotheses as "biological anchors" or "bioanchors": Evolution anchor: Compute to train TAI = Compute performed over evolution since the first neurons Lifetime anchor: Compute to train TAI = Compute performed by the human brain when maturing to an adult (0 to 32 years old) Three neural network anchors: Anchor to the processing power of the human brain, and to empirical parameter scaling laws. Technically there are three of these, corresponding to short, medium, and long "effective horizon lengths" – the amount of data required to determine whether or not a perturbation to the AI system improves or worsens performance Genome anchor: Anchor to the processing power of the human brain, set the number of parameters = number of bytes in the human genome, and extrapolate the amount of training data required using the same empirical scaling laws mentioned above and assuming a long horizon length (one "data point" = multiple years) In calculating the training compute requirements distribution, Cotra places 90% weight collectively across these bioanchor hypotheses, leaving 10% to account for the possibility that all of the anchors significantly underestimate the required compute.
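A toy sketch (my own simplification, with made-up numbers rather than Cotra's estimates) of the framework's two-part structure: put a wide distribution over the log10 training FLOP required for TAI, model affordable FLOP as growing over time, and read off P(TAI by year Y) as the probability that the requirement falls below what is affordable in Y.

```python
# Log-FLOP requirement ~ Normal(34, 3) in orders of magnitude (illustrative);
# affordable compute starts at ~1e24 FLOP in 2020 and grows 0.5 OOM per year.
from statistics import NormalDist

norm = NormalDist()
median_oom, sigma_oom = 34, 3
flop_2020_oom, growth_per_year = 24, 0.5

for year in (2030, 2040, 2050):
    affordable_oom = flop_2020_oom + growth_per_year * (year - 2020)
    p = norm.cdf((affordable_oom - median_oom) / sigma_oom)
    print(f"P(TAI by {year}) ~ {p:.0%}")
```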
Here's a visual representation of how Cotra breaks down the question "How likely is the development of TAI by a given year?": The above was essentially a summary of Cotra's factorization of the question of AI timelines; for a summary of her key findings, see here. Motivation One of the biggest unresolved debates in AI Safety is the question of AI Timelines – when will Transformative AI (TAI) be developed? In 2020, Ajeya Cotra released a draft report, Forecasting TAI with biological anchors (Cotra, 2020), that aims to answer this question. It's over 200 pages long including the a...
In today's episode Mohamad Lawand helps us try and grok gRPC. gRPC is a modern open source high performance Remote Procedure Call (RPC) framework that can run in any environment. That description doesn't really demystify what gRPC can do, so Mohamad helps us fill in some of the gaps. We find out how gRPC works in .NET, how it works with HTTP, how it is different from REST, and some good use cases for using it in your applications. gRPC is used by companies like Netflix and Square and is supported by the Cloud Native Computing Foundation. Have you used gRPC in your applications? Let us know on Twitter at @dotnet_Podcast. (https://twitter.com/dotNET_Podcast) Sponsors Top End Devs (https://topenddevs.com/) Coaching | Top End Devs (https://topenddevs.com/coaching) Links Getting Started with gRPC and .NET (https://www.infoq.com/articles/getting-started-grpc-dotnet/?topicPageSponsorship=e4f7a6e7-0225-492c-adca-2bdf6f0012f3) gRPC (https://grpc.io/) HTTP/2 protocol (https://caniuse.com/http2) Envoy Proxy (https://www.envoyproxy.io/) Mohamad Lawand (https://mohamadlawand.com/#home) Picks Caleb- Lost Ark - Free to Play MMO Action RPG (https://www.playlostark.com/en-us) Shawn- Reacher (https://www.imdb.com/title/tt9288030/) Wai- Pragmatic Works (https://www.youtube.com/channel/UC5CugyvTdOloiuTc9nN09TA) Special Guest: Mohamad Lawand.
Want a practical guide to implementing and using continuous delivery in your software projects? In this episode, Christie Wilson, a software engineer at Google and author of Grokking Continuous Delivery, shares insight from her soon-to-be-released book. Discover a complete overview of all the pieces of a CD pipeline (automation, testing, best practices) and learn how to fit them together for new and legacy codebases. Listen up!
Dr. Svoboda shares the cosmic and internal lessons to be learned from the layered Indian myth – The Churning of the Ocean of Milk. In this special episode of Living With Reality, Dr. Svoboda, through the blessing of his mentor, the Aghori Vimalananda, invites us into a lively telling of the richly layered Indian myth: The Churning of the Ocean of Milk. Grokking this sweet and useful story from both the external cosmic vantage, as well as the microcosmic internal perspective, Dr. Svoboda lifts the veil to a holographic understanding of this astronomy-steeped foundational Vedic story. Join Dr. Svoboda's Online Courses & Satsangha for teachings on Ayurveda, Kundalini, Sadhana, The Ramayana, & The Greatness of Saturn. Dive into Paula Crossfield's new podcast and ongoing offerings in Astrology, Ayurveda, Business, & more, at WeaveYourBliss.com
- Travel in the Covid era - Aegean Airlines impressions and business class - Civil registry certificates at consulates, and the e-registry at last - [Wordle](https://www.powerlanguage.co.uk/wordle/) - Tips for the system design interview - [Grokking the Advanced System Design Interview](https://www.educative.io/courses/grokking-adv-system-design-intvw) - [Microservices Patterns - Book](https://www.amazon.co.uk/Microservice-Patterns-examples-Chris-Richardson/dp/1617294543) - [Monolith to Microservices - Book](https://www.oreilly.com/library/view/monolith-to-microservices/9781492047834/) - [Fundamentals of Software Architecture - Book](https://www.oreilly.com/library/view/fundamentals-of-software/9781492043447/) - Unfortunately, all too often the answer to everything is - Kafka ... #not
In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of “grokking” a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. 2022: Alethea Power, Yuri Burda, Harrison Edwards, I. Babuschkin, Vedant Misra https://arxiv.org/pdf/2201.02177v1.pdf
#grokking #openai #deeplearning Grokking is a phenomenon in which a neural network suddenly learns a pattern in the dataset, jumping from random-chance generalization to perfect generalization. This paper demonstrates grokking on small algorithmic datasets where a network has to fill in binary tables. Interestingly, the learned latent spaces show an emergence of the underlying binary operations that the data were created with. OUTLINE: 0:00 - Intro & Overview 1:40 - The Grokking Phenomenon 3:50 - Related: Double Descent 7:50 - Binary Operations Datasets 11:45 - What quantities influence grokking? 15:40 - Learned Emerging Structure 17:35 - The role of smoothness 21:30 - Simple explanations win 24:30 - Why does weight decay encourage simplicity? 26:40 - Appendix 28:55 - Conclusion & Comments Paper: https://mathai-iclr.github.io/papers/... Abstract: In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of “grokking” a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. We also study generalization as a function of dataset size and find that smaller datasets require increasing amounts of optimization for generalization. We argue that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset. Authors: Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin & Vedant Misra Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/1824646584
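To see the phenomenon these two entries describe, here is a minimal PyTorch sketch of the paper's setting (not the authors' code; the paper uses a small transformer and different hyperparameters, so everything below is illustrative): train a small network on half of a modular-addition table with weight decay, and watch validation accuracy jump long after training accuracy saturates.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97  # modulus for the binary operation (a + b) mod p

# The full operation table: all (a, b) pairs and their labels.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

perm = torch.randperm(len(pairs))
train_idx, val_idx = perm[: len(perm) // 2], perm[len(perm) // 2 :]

# A small MLP over embedded operands (the paper uses a small transformer).
model = nn.Sequential(
    nn.Embedding(p, 64),
    nn.Flatten(),
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, p),
)
# Weight decay is the knob the video dwells on: it pushes the network
# from the memorizing solution toward the simpler, generalizing one.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):  # grokking can appear long after overfitting
    opt.zero_grad()
    logits = model(pairs[train_idx])
    loss = loss_fn(logits, labels[train_idx])
    loss.backward()
    opt.step()
    if step % 5_000 == 0:
        with torch.no_grad():
            tr = (logits.argmax(-1) == labels[train_idx]).float().mean()
            va = (model(pairs[val_idx]).argmax(-1) == labels[val_idx]).float().mean()
        print(f"step {step:6d}  train acc {tr:.2f}  val acc {va:.2f}")
```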
”This book takes an impossibly broad area of computer science and communicates what working developers need to understand in a clear and thorough way.” - David Jacobs, Product Advance Local Key Features Master the core algorithms of deep learning and AI Build an intuitive understanding of AI problems and solutions Written in simple language, with lots of illustrations and hands-on examples Creative coding exercises, including building a maze puzzle game and exploring drone optimization About The Book “Artificial intelligence” requires teaching a computer how to approach different types of problems in a systematic way. The core of AI is the algorithms that the system uses to do things like identifying objects in an image, interpreting the meaning of text, or looking for patterns in data to spot fraud and other anomalies. Mastering the core algorithms for search, image recognition, and other common tasks is essential to building good AI applications. Grokking Artificial Intelligence Algorithms uses illustrations, exercises, and jargon-free explanations to teach fundamental AI concepts. You'll explore coding challenges like detecting bank fraud, creating artistic masterpieces, and setting a self-driving car in motion. All you need is the algebra you remember from high school math class and beginning programming skills. What You Will Learn Use cases for different AI algorithms Intelligent search for decision making Biologically inspired algorithms Machine learning and neural networks Reinforcement learning to build a better robot This Book Is Written For Software developers with high school–level math skills. About the Author Rishal Hurbans is a technologist, startup and AI group founder, and international speaker. Table of Contents 1 Intuition of artificial intelligence 2 Search fundamentals 3 Intelligent search 4 Evolutionary algorithms 5 Advanced evolutionary approaches 6 Swarm intelligence: Ants 7 Swarm intelligence: Particles 8 Machine learning 9 Artificial neural networks 10 Reinforcement learning with Q-learning
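For a taste of the book's evolutionary-algorithms chapters, here is a toy genetic algorithm (illustrative only, not code from the book): it evolves random bitstrings toward an all-ones target through selection, crossover, and mutation.

```python
import random

random.seed(0)
TARGET_LEN, POP, GENERATIONS = 32, 50, 200

def fitness(bits):
    return sum(bits)  # count of ones; the all-ones string is optimal

def crossover(a, b):
    cut = random.randrange(1, TARGET_LEN)  # single-point crossover
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.02):
    return [1 - b if random.random() < rate else b for b in bits]

population = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(POP)]
for gen in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == TARGET_LEN:
        break
    parents = population[: POP // 2]  # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

print(f"best fitness after {gen} generations:", fitness(population[0]))
```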
--- Panel ---Harry is joined by Jenn Litz-Kirk and Jordan Driggers to decipher what's going on with the top ten malt seltzers.---Time-Stamped Contents ---00:00 Intro - harry is dramatically in emo clothes and wearing blue blockers for no reason. 02:30 jenn recreates the plot of the Jerk. 02:50 harry sings song, nacho libre, etc. 04:20 Top ten seltzer pop quiz. 10:45 hot prices for labor day. 12:50 highlights from our fall distrib survey. 14:20 most salespeople's time spent dealing with out-of-stocks. 15:30 jordan and rib going to Spokane. 16:45 jenn hanging at home, no more frisbee golf. makes herself known on the course. 18:00 jenn thinks harry is jealous because he doesn't work out. 19:00 wrap-up.---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests (almost) every day as they grok the industry issues of the day. -Text or call Harry in confidence at: (262) 345-2501-Audio/Video feeds at HarrysPod.com-Articles referenced at beernet.com-Twitter / IG / TikTok: @beerbizdaily
--- Panel ---Harry is joined by Wine & Spirits Daily editors Sarah Barrett and Hana Kruger, along with beer editor Jordan Driggers, to discuss the latest in the W&S distribution world. ---Time-Stamped Contents ---00:00 needless banter about tech difficulties, Biscuit (the dog harry obsesses over) had cbd oil and is feeling the vibes. 04:45 industry trying to define these new bev-alc rtd's. 08:00 do rtd's play on-premise? harry says no, sarah says yes in short term. 09:25 jordan says seltzers don't mix well with food. so true. 09:60 grokking seltzer's fall, but hana says still popular. 11:00 seltzers all about latest hot flavor or variety pack. 11:40 jordan doubles down on wc variety pack 3. 12:40 sarah: rtd's expanding outlets. 13:30 harry goes on rant about trucks and delivery. 19:20 RNDC going into all control states, direct threat to S-G. 21:25 Montgomery county and boss hogg.---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text or call Harry in confidence at: (262) 345-2501-Audio/Video feeds at HarrysPod.com-Articles referenced at beernet.com-Twitter / IG / TikTok: @beerbizdaily
--- Panel ---Harry is joined by bbd editors Jenn and Jordan.---Time-Stamped Contents ---00:00 Intro - bts talking canna-beverages coverage. 01:40 harry is dressed like a rodeo clown, and we don't know why. 02:30 bts: jenn interviews bill newlands of constellation. 03:13 upside for modelo especial. 03:40 jenn describes modelo as top brand in many non-california markets. 04:10 modelo is so big now, how much runway could it have? 05:30 modelo 50/50 hispanic vs general market, jim sabia predicted it. 06:20 pacifico is starting to break out and shine. 06:40 constellation is its own biggest competitor in mex beer. 08:00 corona is their go-to brand with line extensions/innovations. 08:25 beer the big performer compared to wine/spirits at stz. 09:40 jordan simulates the modelo ad theme song, risking us being cited for copyright infringement. 10:15 talking about pod intro music. 13:00 jenn gives us a sneak peek at the rest of the interview with stz's bill newlands. 15:00 bts with bbd. 15:15 brewer distrib sales meetings going forward, some hybrid.---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text or call Harry in confidence at: (262) 345-2501-Audio/Video feeds at HarrysPod.com-Articles referenced at beernet.com-Twitter / IG / TikTok: @beerbizdaily
--- Panel ---Harry is joined by CM Profit Group's Tom Fox, along with Jenn and Jordan, to talk about all things chain sales.---Time-Stamped Contents ---00:00 intro - hard mtn dew not very similar to soft version. 00:50 will it compete more with 4 loko or twisted tea? 02:05 why low abv? likely due to issues of crossover from non-alc to alc. 02:50 it's a big deal and small deal: small in that it's one brand, big in that it brings a new big distrib competitor to the cold box. 03:30 jordan: it's the opposite of what coke did with topo. 04:10 pepsi had choice to be producer or distributor, wisely chose to be distributor. 04:50 jenn doesn't think it'll blow the doors off. 05:35 harry thinks it's big deal structurally. 06:30 beer distribs who are also pep/coke bottlers have edge. 07:05 TOM FOX joins to talk chains. 08:45 boston taking marketing, pepsi taking sales. why??? 09:40 why pep chose boston beer? cheaper options. burwick connexion. 10:40 how will pepsi get mouse access in bev-alc cold box? 11:00 tom: likely have lotsa communication between two. 12:05 pepsi serious about learning about bev-alc, bigger than mtn dew. pepsi new bev-alc distrib. 12:40 what about retail pricing? mainstream, so vol accretive to boston (but not margins). 14:00 offering distribs on-premise ain't much of an offer. 14:30 harry: is this like inbev embedding themselves with a-b to learn the business in early oughts? 15:00 boston ranks tops in tom's on and off-premise chain surveys. 15:40 new belgium on the map now with chains. 16:00 constellation hitting hard too. 16:30 a-b still gets top marks, most mentions. molson coors right after. 17:40 chains loving virtual vs in-person. 19:30 distribs loving it too. 21:30 suppliers' performance on vs off-premise chain differences. 22:00 scores haven't improved even though supplier performance has, because retailer expectations also grow. 23:30 buyers are mba's except for a few OG goats. suppliers playing ketchup. 24:10 suppliers and retailers more aligned today. 25:05 on the other hand, retailers want more tailored solutions, to "be understood". 26:05 craft brewers stepping up their game: nbb, others. 28:05 smarter to start with regional chains or go straight to bentonville? 29:20 constellation alternative to a-b and molson coors for cat captaincy. 31:35 tom: seltzers are overspaced. 32:50 assortment has returned as a big priority after pan year. 34:40 jordan asks about mc sku reduction and retailers' reaction. 37:55 tom describes their digital e-sales training platform at fox sales coaching.---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text or call Harry in confidence at: (262) 345-2501-Audio/Video feeds at HarrysPod.com-Articles referenced at beernet.com-Twitter / IG / TikTok: @beerbizdaily
--- Panel ---Harry is joined by Jenn Litz-Kirk, Sarah Barrett, and Jordan Driggers---Time-Stamped Contents ---00:00 Intro - Gopuff model. 02:00 They have the capital to win the prize. 02:15 Harry takes us down milkman and diaper delivery lane. 03:35 So many models: tip of the iceberg. 04:45 It will all end up being Amazon in the end. 05:45 Favor was Harry's first delivery app experience, at Jenn's wedding in 2014. 07:00 Jordan doesn't think tobacco is healthy. 07:40 Harry disagrees. 08:20 Sarah groks Gallo's High Noon. 12:15 Jordan wonders what the demo is for HN at that big price point. 13:10 Making flavors unique is the game. 14:00 A-B's Neutral is their answer to High Noon. 16:10 Shortage of sixtel kegs. 18:20 Hot girl summer has breached America's airports. 19:40 For the rest of the pod Jenn speaks into a megaphone.---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text or call Harry in confidence at: (262) 345-2501-Audio/Video feeds at HarrysPod.com-Articles referenced at beernet.com-Twitter / IG / TikTok: @beerbizdaily
Harry is joined by Jenn and Jordan.---Time-Stamped Contents ---00:00 Intro - Biscuit be patrollin', they be hatin'. 01:00 Grokking A-B's plans to sell its Hawaii distribution branch to non-AB house Odom. 11:45 Former MolsonCoors CMO Andy England showing his British moxie using the same slogan for vodka brand.
--- Panel ---Harry is joined by Jenn Litz-Kirk, Jordan Driggers, and Chanel Gulf Coast---Time-Stamped Contents ---00:00 Intro - Jessica won our company seltzer cannonball contest (yes we are all over 21). 02:20 Grokking cyber-security and ransomware issues in the beer business. 06:00 Grokking the future of cannabis in the bev-alc space. 07:50 How California screwed the pooch on cannabis taxation and regulation. 10:10 Conquering the last mile is crucial, 50% of canna trade. 12:40 Does FL have a commerce clause issue with their cannabis structure? 13:20 Harry tries to convince everybody he's a legal expert. 14:45 Bezos, Mars, and earth. 18:00 Athletic building another brewery on the East Coast. 20:10 Hot Girl Summer update. 21:30 Introducing Uncle Paul Fall. 22:50 Harry reveals he likes "Call Her Daddy" podcast. 23:00 Save the date: Beer Summit at the Hotel Del in San Diego January 30, 2022. Sponsorships available, contact millicent@beernet.com (Chanel Gulf Coast).---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text or call Harry in confidence at: (262) 345-2501-Audio/Video feeds at HarrysPod.com-Articles referenced at beernet.com-Twitter / IG / TikTok: @beerbizdaily
--- Panel ---Harry is joined poolside by editors Jenn Litz-Kirk and Jordan Driggers.---Time-Stamped Contents ---00:00 Intro - Pod'ing by the pool. 01:00 Introducing the SchuhPub Crime Wagon (tm). 02:10 Dog initially ejected, but now has the hang of it. 02:25 It is a "Crime Wagon", to prevent crimes. 02:50 Black Rifle Coffee gets fair write-up in New York Times Magazine. Harry's family ranch Guajolote gets mentioned. 03:20 Jenn talks about Bang Energy's plans for bev-alc seltzer rollout. 04:05 Rolling out new markets, focus on military. 05:50 Harry comments on Biden Administration's targeting of bev-alc for trade practices and antitrust. 08:00 Grokking the Boston Beer / Beam Suntory deal. 09:05 Harry has ideas for Bud Ice. 09:35 JD thinks this deal muddies waters. 11:30 Does consumer care about the base? 11:45 Jenn says this deal is a litmus test to see. 12:00 Are distribs psyched? 12:40 Jenn points out Cayman Jack is already huge. 14:20 BBC beer distribs sitting prettier than Beam/W&S distribs on this deal. 15:20 Hard to make good whiskey RTD. 16:00 This deal feels like a big test. 17:20 Starting to plan for Beer Summit 2021 at the Hotel Del Coronado. 18:00 Jenn falls for Harry's Millennial bait. 20:00 Distributor Productivity Summit on Monday, last chance. 20:30 Harry wants to start making money on the pod by selling t-shirts to recoup losses from cargo bike. 21:00 Pods by the pool at Hotel Emma next week.---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text or call Harry in confidence at: (262) 345-2501-Audio/Video feeds at HarrysPod.com-Articles referenced at beernet.com-Twitter / IG / TikTok: @beerbizdaily
Our first remote/mobile pod from bustling Vegas. It's a short 11 minutes. ---Panel---Harry is joined by editors Jenn Litz-Kirk and Jordan Driggers ---Contents---00:00 Intro. 03:13 Grokking Brendan.---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text comments and topic ideas to Harry at: (262) 345-2501-Audio feed at HarrysPod.com
---Panel---Harry is joined by Wine & Spirits Executive Editor Sarah Barrett for a special edition pod.---Contents---00:00 Intro. 01:00 Delicato acquires Coppola, what do it mean? 03:23 Harry tries to flex and compares himself to Bill Murray in Lost in Translation. Sarah checks him. 04:45 Is it believable that Daniel Craig can date 20 year-olds in Bond movies? 05:36 Grokking the challenge to Missouri's in-state vs out-of-state laws. 06:40 Same issue, same players, different decade: Tom Wark and James Tanner still hammering away at three-tier. 08:17 State residency requirements and physical location mandates under attack. 14:46 Will the Supreme Court hear the case? Probably not.---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text comments and topic ideas to Harry at: (262) 345-2501-Audio feed at HarrysPod.com-Video feed at youtube.com/c/beerwriter -Twitter / IG: @beerbizdaily
----PANEL----Jenn Litz-Kirk, Sarah Barrett, Jordan Driggers. 00:00 Intro - Jordan attends Fiesta NIOSA block party. 01:56 Jenn finally got her mic figured out. 03:44 Harry administers an FMB pop quiz to Jordan. 04:40 Ranking FMBs. 09:35 Wine and spirits also having supply problems. 12:37 Jenn groks celebrity-backed FMBs. 16:00 Harry hosting a Molson Coors exec tonight to discuss the charms of San Antonio as a host city for next year's distrib meeting. 18:07 Jenn's ideas for podcast recurring segments. 20:55 More texts from Harry's son. 22:58 More discussion on pod segments. 24:25 Jordan talks about Great Nation Brewing's expansion plans. 26:06 Jordan discusses Reyes' on-premise account issues in SF. 27:35 Truly trends and on-premise seltzer sales. 29:33 Harry picks on Jordan for being a picky seltzer drinker. 30:26 Hot girl summer flavor: watermelon. ---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text comments and topic ideas to Harry at: (262) 345-2501-Audio feed at HarrysPod.com -Twitter / IG: @beerbizdaily
Shipping bev-alc to consumers' doorsteps is the last frontier. We grok the GoPuff model of buying brick-and-mortar retail and pairing it with technology to bridge the last mile.----PANEL----Jenn Litz-Kirk, Sarah Barrett, Jordan Driggers, Gulf Coast----Contents-----00:00 Intro - After Harry makes it needlessly complicated, Sarah offers a simple yet elegant solution for fixing Jenn's mic. 02:14 Victory! Jenn's mic finally works. 04:23 Jenn and Harry bicker about the direction of the podcast at a dinner with a top industry exec. 05:43 Banter vs content: Why can't we have it all? 06:45 A rare apology from Harry. 07:55 GoPuff buys Kentucky's Liquor Barn. 09:13 GoPuff's interesting model of buying liquor retailers instead of partnering with them (like Drizly). 10:46 Jordan brings up the Kentucky Bourbon angle. 12:00 Harry discovers he doesn't understand GoPuff's model as clearly as he thought. Sarah sets him straight. 13:18 First purchase on Favor: Camel Snus. 14:02 Biscuit's eating and sleeping habits. 15:37 Jenn describes a three-tier panel she attended last week, and how w&s suppliers and distribs are in lock-step in supporting tech to solve the DTC issue rather than through the courts. 18:14 Technology solving the decades-long DTC / three-tier fight that the courts couldn't solve. 18:46 The Amazon wildcard. 23:38 The perils of DTC, beer vs wine/spirits: freshness vs provenance. 28:13 The shortages of hot girl summer, and why you don't order raw seafood on Mondays. ---About---Beer Business Daily publisher Harry Schuhmacher joins his editors and other guests every workday as they grok the industry issues of the day. -Text comments and topic ideas to Harry at: (262) 345-2501-Audio feed at HarrysPod.com -Twitter / IG: @beerbizdaily
It's the explanation of capacity utilization that does it for me.
Plus, still working out the logistics of the podcast ..... on the podcast. It's called being efficient. By the way, if you have ideas for the pod, text Harry at (262) 345-2501.
We only had about 15 minutes in a crowded hotel lobby before we were expected at a supplier reception, but Jordan and Harry were able to knock out a quick-and-dirty pod before the weekend regarding the craziness going on in off-premise numbers. We apologize for the poor audio quality (we were in the lobby bar of the Hotel Emma).
2005 was a seminal year. Was 2020 another one?
Science Book Movement. Online review of the book: Grokking Machine Learning. Author: Luis Serrano. Guest: Luis Serrano. Watch the video recording on YouTube: https://youtu.be/LN7-zZGON2o. Join our community on Discord via the following link: https://bookmovement.co/discord. See acast.com/privacy for privacy and opt-out information.
Miguel Morales is a Machine Learning engineer at Lockheed Martin and teaches at Georgia Institute of Technology. This episode starts with a basic explanation of Reinforcement Learning. Miguel then talks through the various methods of implementing and training systems through Reinforcement Learning. We talk algorithms and models and much more… Panel Charles Max Wood Guest Miguel Morales Sponsors Dev Heroes Accelerator Links GitHub | mimoralea/gdrl Manning | Grokking Deep Reinforcement Learning Twitter: mimoralea Email: mimoralea@gmail.com Picks Charles- The Expanse by James S. A. Corey Charles- Dev Heroes Accelerator Miguel- Deep Reinforcement Learning Miguel- YouTube: Dimitri Bertsekas Miguel- RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning Miguel- Thor (2011)
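For a taste of the tabular methods an introduction like this usually starts from, here is a compact Q-learning sketch (illustrative, not from the book): an agent on a six-state corridor learns that walking right reaches the rewarded goal.

```python
import random

random.seed(0)
N_STATES, GOAL = 6, 5   # states 0..5; reward only for reaching state 5
ACTIONS = (-1, +1)      # action 0 = left, action 1 = right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.5, 0.9

for episode in range(300):
    s = 0
    while s != GOAL:
        a = random.randrange(2)  # random behavior policy; Q-learning is
                                 # off-policy, so it still learns greedily
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # The Q-learning update: bootstrap from the best next-state value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

greedy = ["left" if q[0] >= q[1] else "right" for q in Q[:GOAL]]
print("greedy policy per state:", greedy)  # expect "right" everywhere
```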
Author of Grokking Bitcoin, Kalle Rosenbaum joins Ced to discuss: - How does the Bitcoin network prevent cheating - What is a cryptographic function & is it a defensive weapon of war - Will governments punish their people harshly for using Bitcoin & how bloody and violent will it get - How cryptographic tools & encryption are the only ways to maintain privacy and security during the age of big brother - Hard forks vs soft forks - Backwards compatibility vs forward compatibility - & so much more
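To make "cryptographic function" concrete, here is a toy proof-of-work sketch (illustrative only: real Bitcoin hashes an 80-byte binary header against a numeric target, while this toy checks leading zero hex digits). The asymmetry is the point: finding a valid nonce is expensive, verifying one takes a single hash, which is what makes cheating on the network costly.

```python
import hashlib

def mine(header: bytes, difficulty_zeros: int = 4) -> int:
    """Search for a nonce whose double-SHA-256 digest starts with the
    required number of zero hex digits (a simplified difficulty rule)."""
    target = "0" * difficulty_zeros
    nonce = 0
    while True:
        data = header + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

nonce = mine(b"toy block header")
print("found nonce:", nonce)  # anyone can verify it with one double hash
```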
Researcher Huw Lloyd, fluent in numerous mental models, is a good explainer of concepts -- including many I was completely unversed in. A few threads run through the entirety of this chat: development, self-awareness, and construing an active orientation to any given situation. SECTION 1: Our pathways to Vygotsky 0:36 - Reflections on Huw's recent "Vygotsky and Parenting" (http://tiny.cc/ymqpsz) 1:45 - Pros and cons of taking scholarly shortcuts 6:07 - Huw's arrival to Vygotsky, in part through dissatisfaction elsewhere SECTION 2: Huw's ideas about Active Orientation 16:38 - What is Active Orientation? 23:30 - Is Active Orientation a practice? (An exercise in self awareness) 31:58 - Active Orientation can be documented (microgenesis research of Huw's) SECTION 3: What is Developmental Education? 40:37 - A primer on Davydov and Developmental Education 46:07 - Empirical thinking vs. Theoretical thinking 50:23 - Grokking the material and the History of ideas 52:59 - Problems are Good 55:40 - An illustrative lesson of Davydov's 1:03:43 - Some key characteristics of developmental education 1:07:41 - Crises, construals, and neoformations 1:10:20 - The Desert Oak: a developmental TRIZ problem SECTION 4: Imagination and Confidence-building 1:16:07 - Imagination, flow, and problem-solving 1:24:10 - Systems and Design Ideas (TRIZ approach) 1:30:22 - Earned, authoritative confidence: Your tempered ideas are become Real 1:35:20 - The importance of problem-construal or framing 1:48:04 - Problem-creating, -solving, and -construing 1:53:51 - This is rich, highly concentrated material (Foundational, generative, "unfoldable" concepts) 1:55:27 - Notational vs. developmental education (and epistemology) 1:57:10 - Final two questions (adult-development & advice for problem-designers) 2:06:20 - Complex vs. complicated (and self-regulation and distance learning) 2:09:46 - An idea for lunch (as promised: http://tiny.cc/bvqpsz) References: http://tiny.cc/5vqpsz - "A Study of Active Orientation" (brief introduction) http://tiny.cc/2xqpsz - "TRIZ: a Powerful Methodology for Creative Problem Solving" http://tiny.cc/dxqpsz - "Going with the Flow: How to Engage Boys (and Girls) in Their Literacy Learning"
The third episode of Konsensuspodden in its new format, as a live pod in collaboration with Safello. Every other Wednesday at 20:00. In today's episode of Konsensuspodden we talk with Kalle Rosenbaum, author of the book "Grokking Bitcoin". We talk about his book, how he found his way to Bitcoin, and how there is always something new to learn about Bitcoin. Links mentioned in the episode: https://www.manning.com/books/grokking-bitcoin https://github.com/kallerosenbaum/grokkingbitcoin http://rosenbaum.se/book/ https://www.adlibris.com/se/bok/grokking-bitcoin-9781617294648 If you want to watch the episode, you can do so on YouTube: https://www.youtube.com/watch?v=6WPsUT3x2_w Hardware wallets: Trezor: https://shop.trezor.io/product/trezor-model-t?offer_id=15&aff_id=1187&tiny_url=1 Ledger: https://shop.ledger.com/pages/ledger-nano-x?r=aa88 Please use our links to support the podcast! Contact info: www.konsensuspodden.se Ludvig: twitter.com/ludvigoeberg Tristan: twitter.com/t4t5 Dea: twitter.com/deabrunner Kalle: https://twitter.com/kallerosenbaum Keywords: Konsensus Konsensuspodden Grokking Bitcoin Kalle Rosenbaum
EP.22 Bryan Eisenberg (Best-selling Author) on Grokking Baseball One Resource at a Time On this episode, Bryan Eisenberg shares the strategies and resources that he has found to help him support his son, and help him grow each day. Bryan discusses the importance of communication and goal setting, and also how he has used to […]
Talk Python To Me - Python conversations for passionate developers
See the full show notes for this episode on the website at talkpython.fm/82.