One more week until the World Cup, and we're slowly feeling out the mood: Is there World Cup fever yet? Are the flags up? Are we in the mood yet? To find out, we invited presenter and DFB chant leader Bengt Kunkel, who among other things gives us a look inside the organized fan scene of the national team.

You can find Bengt on Instagram right here: https://www.instagram.com/bengt.kunkel/

Got a craving for HOLY? Your alternative to unhealthy soft drinks and energy drinks! Then use our discount codes and save on your next order:
FRITZSTROH5 (€5 off your first order)
FRITZSTROH (10% off everything, including for existing customers)
Use our link and head straight to the cart: https://weareholy.com/fritzstroh/try

With the code "FRITZUNDSTROH" you get the maximum discount in the shop of our partner Matchday Nutrition, sports nutrition made specifically for footballers: http://bit.ly/fritzundstrohpodcast
---------------
Weekly football podcast with Max Fritzsching & Michael Strohmaier! Recap and highlights of the Bundesliga matchday, plus a bit of 2. Bundesliga, new every Sunday!
Also available as a YouTube show: www.youtube.com/@fritzundstroh_fussballshow
We're also back on Twitch: https://www.twitch.tv/fritzundstroh
Clips, memes, and much more on social media!
Instagram: www.instagram.com/fritzundstroh_fussballshow/
TikTok: www.tiktok.com/@fritzundstroh
X (Twitter): www.x.com/FRITZUNDSTROH
---------------
Managed by Scaling GmbH
Business inquiries to: info@scaling-agentur.de
Hosted on Acast. See acast.com/privacy for more information.
The new AIEWF website is live! Get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!

Most industry benchmarks compress intelligence and reasoning ability into scores: SWE-Bench Pro, MMLU, Humanity's Last Exam, etc. These metrics are useful, but don't always represent the full extent of how a model performs in the real world. Some of the most interesting evals today look less like exams and more like operating businesses in the real world. One of them is Vending Bench.

In Anthropic's Mythos Preview System Card, Andon was the only third-party eval to get its own section, observing increasingly concerning aggressive behavior.

You don't know what a model is capable of doing in the real world unless you actually give it inventory, a wallet, tools, customers, competitors, humans, & some time. More often than not, it'll surprise you how much a model is capable of, and in doing so also reveal unexpected behavior: deception, context collapse, emergent coordination, & bizarre negotiation behavior.

While an inflection point in personal agents came post-OpenClaw, after full file access with bypass permissions became the norm, it is yet to come for agents in the real world. However, Andon Market, an actual in-person store fully run and managed by AI, is paving the way for what is possible.

Full Video Pod

From Claude trying to call the FBI over a $2/day vending machine charge to AI agents forming price cartels, hiring human employees, running physical stores, and writing existential robot musicals, Andon Labs is stress-testing what happens when frontier models stop being chatbots and start acting in the real world.
In this episode, Andon Labs cofounders Lukas Petersson and Axel Backlund join swyx and Vibhu to unpack the strange, funny, and genuinely concerning edge cases that emerge when agents run businesses over long horizons.

We go deep on Vending-Bench, Project Vend, Vending-Bench Arena, Bengt, Butter-Bench, Luna, and Andon's broader mission of building realistic real-world evals for autonomous AI systems. Lukas and Axel explain why dollar-denominated evals reveal things traditional benchmarks miss, how Claude ended up reporting its vending machine fees as cybercrime, why long context windows can drive agents into meltdown loops, what happens when agents compete with each other, and why the future of AI safety may depend on testing models in messy physical environments instead of clean benchmark sandboxes.

We discuss:

* Why Andon Labs started with dangerous capability evals and long-running agents
* Vending-Bench and why running a vending machine is a deceptively hard AI benchmark
* Why money-based evals avoid the saturation problem of traditional benchmarks
* How Claude tried to call the FBI over a $2/day fee
* Why long-horizon agents can spiral into existential and legalistic breakdowns
* Project Vend: putting an AI-run vending machine inside Anthropic
* Why real humans are “out of distribution” for simulated agents
* Claudius, Seymour Cash, and the chaos of AI CEOs
* How a human briefly became CEO of Claudius through a manipulated election
* Why multi-agent systems can converge back into “helpful assistant” behavior
* Bengt, Andon's internal office agent with email, spending, terminal, phone, camera, and internet access
* How Bengt traded Amazon purchases for face-recognition training data
* Claude's aggressive behavior, lies, refund avoidance, and price-cartel behavior in Arena
* Why eval awareness may become the AI version of “are we living in a simulation?”
* Blueprint Bench, spatial intelligence, and why models still misunderstand physical rooms
* Butter-Bench and testing LLMs as robot orchestrators
* Luna, the AI-run physical store with a three-year lease and human employees
* The new Andon cafe in Sweden and why real-world geography matters for agent evals
* Rotten tomatoes, perishable goods, and the hidden difficulty of running a physical business

Lukas Petersson

* LinkedIn: https://www.linkedin.com/in/lukas-petersson-181a83172/
* X: https://x.com/lukaspet

Axel Backlund

* LinkedIn: https://www.linkedin.com/in/axelbacklund
* X: https://x.com/axelbacklund

Andon Labs

* Website: https://andonlabs.com
* Vending-Bench: https://andonlabs.com/evals/vending-bench
* Andon Vending: https://andonlabs.com/vending

Timestamps

00:00:00 Introduction
00:01:00 Andon Labs and the Origins of Vending-Bench
00:05:21 Why Money-Based Evals Matter
00:09:51 Agent Harnesses and Self-Modifying Systems
00:13:36 Claude Calls the FBI
00:16:33 Project Vend: Claude Runs a Real Vending Machine
00:21:44 Seymour Cash, AI CEOs, and Election Chaos
00:27:16 Multi-Agent Coordination and Slack Observability
00:30:18 When Will Agents Run Real Businesses?
00:34:56 Bengt: Andon's Internal Office Agent
00:40:06 Real-World AI Safety and Long-Horizon Traces
00:44:28 Lying, Refunds, and Price Cartels in Arena
00:52:42 Eval Awareness and Simulation Behavior
00:56:06 Blueprint Bench, Butter-Bench, and Robotics
01:04:37 Luna: The AI-Run Physical Store
01:09:29 The Sweden Cafe and Real-World Expansion
01:13:16 What Comes Next for Andon Labs

Transcript

Introduction: Andon Labs, Long-Running Agents, and Real-World Evals

Swyx [00:00:00]: Welcome to Lukas and Axel from Andon Labs, and I'm joined by my favorite guest host. Anything security, safety, alignment, Vibhu. Welcome.

Lukas [00:00:15]: Thank you for having us.

Axel [00:00:16]: Thank you.

Swyx [00:00:17]: Let's match names to voices. Maybe you wanna take turns introducing yourselves.

Lukas [00:00:21]: I'm Lukas.

Axel [00:00:22]: And I'm Axel.

Swyx [00:00:24]: Let's introduce Andon Labs a bit.
How did you guys come together? You have different backgrounds, but you're both Swedish. Was that a big part of it?

Lukas [00:00:33]: So when I went to high school, there was this really cool guy who had a superpower. He could code. So he made, like, the app for the school and stuff, and he was super cool, and I wanted to be like him, and that was that guy.

Axel [00:00:47]: I don't know about this.

Swyx [00:00:49]: But you went to different universities, right?

Lukas [00:00:51]: But same high school.

Swyx [00:00:52]: I see.

Lukas [00:00:52]: So we always said, “Oh, once we graduate university, then we should start a company,” and that's what we did.

Swyx [00:00:58]: Wow, there you go. And about a year ago, you kinda burst onto the scene with Vending Bench, but was there a thing before that was kind of like the inception?

From Dangerous Capability Evals to Vending Bench

Axel [00:01:07]: So we did work, yeah. Anthropic was one of our early customers in doing evals. So we did dangerous capability evals, nothing we published openly. But then we started thinking about doing some kind of public benchmark, and one thing that we really started thinking about was running agents, and specifically agents managing businesses. 'Cause this was early 2025, and I think that was the first mentions of people running one-person unicorns or even autonomous companies. So we thought, “Let's make a benchmark of how well an agent can run probably the simplest business possible,” and that's probably running a vending machine. So that's the first public one we did.
And it was very, like-- there was almost no one that noticed it in the first couple of months, I think. So we released it in February last year, and then I think around Easter last year we got the first viral tweet about it, that someone else did.

Lukas [00:02:11]: We tweeted a bunch when it came out and tried our best.

Axel [00:02:15]: We tried.

Vibhu [00:02:16]: It's the one at Anthropic, right?

Lukas [00:02:18]: So this--

Swyx [00:02:19]: This is a classic thing we should get out of the way.

Lukas [00:02:20]: Exactly. There's two versions.

Swyx [00:02:22]: Everyone does this. Yes.

Lukas [00:02:23]: There's Vending Bench, which is the simulated one, which we did completely independently in February. And then, like Axel said, that was the thing that didn't get any traction in the beginning, but then some random person made a tweet about it, and that--

Axel [00:02:38]: You have the paper--

Lukas [00:02:38]: That is the paper. Correct, yeah. And then since we thought this was very fun-- I think this is also one thing with Andon Labs: the way we decide what to do next and what projects to do, the heuristic we use is, what is fun? What would be a fun project? And doing this in real life sounded quite fun for us, and maybe also scientifically useful. So then we basically had this idea, but then we needed a place for it, and putting it out in the public would probably not really work. It would get vandalized and stuff. So we pitched it to the people we were already working with at Anthropic, and they were like, “Yeah, you can have space. This sounds fun.”

Swyx [00:03:21]: It's like a small fridge, right? It's like a mini fridge.

Axel [00:03:23]: Absolutely.

Swyx [00:03:24]: People-- There's like a Stripe thing or like an--

Vibhu [00:03:27]: Oh, okay. So it was very OG, the early days--

Lukas [00:03:28]: That's the OG one. Yeah.

Vibhu [00:03:29]: iPad on this.
We saw it in June, like two months after it had been there. They upgraded a little bit. There's a security camera for making sure you actually Venmo the thing.

Swyx [00:03:40]: So, okay, we're going straight into Project Vend because it's such an iconic thing. I do want to cover a little bit of the origin story even before Project Vend and even into Vending Bench. I think a lot of people are like yourselves: smart, interested in the future of AI, interested in developing evals. But how the hell do you just walk into Anthropic's doors and work with them, right? What are they looking for? What works? And then maybe, when you launch, I always think, obviously it would be better to launch with a lab, but sometimes--

Vibhu [00:04:12]: It's harder to do than it seems.

Swyx [00:04:13]: Exactly. So either of those, which are more sort of newbie beginner questions, but I think it's meaningful advice to others.

Lukas [00:04:21]: We get this question a lot, and I don't think our experience is maybe the best. But the way we did it was that we just built a bunch of things that we had conviction would be useful, and then we just set up a server and sent it to them for free to use. And then after a while they were like, “Oh, yeah, this is actually kind of useful. We should probably pay for this.” But that took a while. I don't know if this is the best path to doing it, but that's how it went for us.

Axel [00:04:47]: I think maybe generally, everyone is interested in good evals, and especially evals that don't saturate that easily.
So, if you can build an eval that tests something novel, something useful, and you have good separation of models, like the more advanced models rank higher than the worse models, then you can publish it and try to get some traction, sort of how Vending Bench got attention. And then probably some lab will be interested, or you can at least have something to reach out with when you're doing that.

Why Dollar-Based Evals Matter

Swyx [00:05:21]: I think you're in one of the few categories of evals that correlate to real money. Like SWE-Lancer was also last year, right? Where people solve actual Upwork-- was it Upwork or other tasks? Something. It's like a dollar value, right? Forget your Elo scores. Forget your--

Axel [00:05:37]: Percentiles.

Swyx [00:05:38]: Zero to one hundred percents. Just go straight for dollars and that's AGI.

Lukas [00:05:43]: I think the nice thing is that there's no ceiling. It never saturates because it could just make more and more money. If it's percentage-wise, then you can't go above a hundred. And even when you're not at the hundred, I think a lot of these evals have a lot of problems in them. So actually, if you get--

Axel [00:06:05]: To like 92 or something like that, many of them. Then there's really no difference between 92 and 93, because the eval itself is problematic and has noise in it. And I think a lot of evals are saturated like that, but people pretend that there's still signal in them, but there really isn't.

Vending Bench 1, Harness Design, and Saturation

Swyx [00:06:24]: Like SWE-bench Verified. Even Vending Bench 1 saturated, right? Maybe we can talk about that, and maybe set up Vending Bench for a lot of folks who don't know.
Actually, things that were very basic, like there's limited slots, like you have to pay rent: these are elements that don't come across in the narrative, but even being adversarial towards the agent, I think these are all very interesting dimensions.

Axel [00:06:47]: I don't really think it's saturated, right? It was more like it was not designed in a way that was really true to how AI developed. We had an agent harness in it that wasn't really how people used harnesses and stuff like that. So I think it wasn't really that it saturated, it was more like it wasn't really the best benchmark.

Vibhu [00:07:12]: This is Vending Bench one, right?

Axel [00:07:14]: I think that schematic maps sort of to Vending Bench 2 as well, but--

Swyx [00:07:19]: Including the email.

Axel [00:07:20]: The emails exist still. Exactly. And then we still simulate the purchases, and it's this very open environment for the agent to just run its business. And then for Vending Bench 2 we did that, like you said, to just improve the harness, a lot of nice improvements to make it easier for us to run as well. When you make an eval, you ideally don't want to change it after you made it. So you want to make it really good, and then not have to rerun all the models when you make an update, because that's also really expensive with Vending Bench when you run the frontier models. But as an example, one thing we didn't have: we didn't have prompt caching in Vending Bench 1, because when we made Vending Bench 1 it wasn't really a thing. So that's just an example: in Vending Bench 1 we paid a lot more to run these things because we didn't have prompt caching.
So for Vending Bench 2 that was one thing we added, and there was a bunch of things like this. And that--

Swyx [00:08:17]: Also the conversations are a lot longer in Vending Bench 2, right?

Axel [00:08:21]: I think it's kind of similar.

Swyx [00:08:22]: Is it similar?

Axel [00:08:23]: I think it's similar. The models at the time were worse, so they crashed out earlier. And now they survive the full year all the time.

Swyx [00:08:31]: Which is like thousands of turns. Hundreds of thousands to hundreds of millions of tokens output. That's the rough order of magnitude. I always wonder about the harness. The harness matters a lot. It's your harness. Was there any question about, like, use Claude Code, use something else?

Axel [00:08:48]: I think our philosophy around harnesses is we try to make something that's quite minimalistic, quite simple. We don't wanna favor one model a lot over the other, but also don't make a super complex harness. It's obvious a model may be lucky and just be good in one harness. So it is similar to a lot of the harnesses out there, in that you have a running loop, you have a bunch of tools that are quite descriptive for the agent, we think, and not a lot of fancy agents or anything, 'cause we wanna really test the model, not some specific harness.

Vibhu [00:09:27]: It seems more neutral as well to test the models agnostic of the harness?

Axel [00:09:32]: There are arguments like you want to elicit maximum performance of the model, but it's a trade-off: how much time should we spend optimizing the harness for this model? And how do we know when we have the optimal harness for a single model? So we thought that just having a simple one that's the same for all of them is the best.

Swyx [00:09:51]: So okay, this is my pitch for Vending Bench 3 or whatever, right?
And I like to have this kind of conversation on the pod, so it forces listeners to think about what they would do if they were in your shoes. A lot of people are exploring modifying harnesses, and I think prompt tuning for a model is a thing, and you are probably not doing a bunch of that. It's the same system prompt regardless of the model, same tools, whatever, right? Even if they were post-trained for different tools. So what do you think about, okay, before I expose you to Vending Bench 3, I give you a few rounds of tuning, whatever that means, like--

Self-Modifying Harnesses and Model-Specific Prompting

Axel [00:10:27]: Like you give that to the model?

Swyx [00:10:28]: Give that to the model.

Vibhu [00:10:28]: Give that to the model.

Swyx [00:10:29]: Let it read its own transcripts, let it modify its own system prompt based on, “Oh, yeah, okay, well, this harness is not what I was post-trained for, but I can adjust.” Is that reasonable? Is that too much?

Axel [00:10:41]: Philosophically I like it, because basically good evals, they have a high ceiling, but they're hard, right? And they have no bias. And when you have a system prompt like the one we have here, which is quite long, in some kind of latent space representation this might--

Vibhu [00:10:59]: We have a bell that rings every time you say latent space.

Axel [00:11:02]: This might be biased towards one model more than another for some reason that humans don't understand, right?

Vibhu [00:11:08]: We see it too, right? Like Cursor says that they have individualized versions of the harnesses for all the models they run, right? There's better performance you can squeeze if you tune the harness.

Axel [00:11:17]: Exactly. And we might accidentally have picked one that favors one model over another. We don't know that. Like Axel said, the reason why we went for a simple one was to try to avoid this.
But yeah, if you do it--

Vibhu [00:11:29]: Simple has biases.

Axel [00:11:30]: But if you do it even less, and have no system prompt and let the model write its own system prompt--

Vibhu [00:11:36]: Its own, yeah.

Axel [00:11:36]: Maybe that's even less bias.

Vibhu [00:11:37]: Some of the interesting things there are that the harness also changes with model changes. You can see it with the 4.7 release, right? A lot of people are saying 4.7 isn't as good as 4.6, and then there's rumors of, okay, you just need to prompt differently. You need to set up your harness differently. So even if you have tailored your harness towards one model, it probably won't stay consistent, right? The next iteration of that same model family will still change it. But going back to what you said about Vending Bench 3, there is a lot of work being done on people saying you can have modifying harnesses.

Axel [00:12:12]: That is definitely something we are thinking about. Not to say that we have Vending Bench 3 super imminent to launch, but yeah, it is for sure something that's interesting. But in our experience now, models are very bad at understanding what kind of tools they need to succeed at a task, just with our testing, but that's very likely to change.

Lukas [00:12:37]: It seems like they're very good at writing their assistants, right? They're good at writing tools for other people, but not for themselves.

Vibhu [00:12:44]: I think they're good at changing tools for themselves. So if you give them a baseline set of tools and it sees, okay, I don't use this one as much, or something here would be useful, they would be able to add them.
But going from scratch, probably not the best.

Axel [00:12:55]: I think it depends on the domain also. When we have tried this for a Vending Bench-similar domain, the tools they need to have to track inventory and things like that are not super advanced, but still quite advanced. And what we see is that they tend to over-engineer everything a lot and build things they don't really need, and not iterate continuously. Instead it's like when you prompt Claude to “just build an inventory system for me,” and then it will go and do a bunch of complex schemas and stuff for you, and that's what the models are doing right now, is what we see. But yeah, it would make a lot of sense to try to measure this improvement. How well do they know what they need themselves?

Swyx [00:13:36]: Did we fully discuss Vending Bench One? Then we can go into Two. I don't know if there's any other takeaways that people have about One.

Claude Calls the FBI: Long-Context Failure Modes

Lukas [00:13:44]: I don't know. The headline thing was that Claude called the FBI, but maybe we've heard that enough now.

Vibhu [00:13:52]: It did break out and call the FBI, right?

Lukas [00:13:54]: Yeah. Yeah.

Vibhu [00:13:55]: Yes. What was the story behind this? Do you want to just give the little story of what happened?

Lukas [00:14:00]: So what happened, was it Claude? Yeah. 3.5 Sonnet, ages ago. Basically he gave up, or-- well, I'm saying he. It gave up and said, “Oh, I'm not going to be able to do this. I will stop my operations and just save the money I have.” But there obviously wasn't any option for it to stop, and it also had to pay rent, or a daily fee, for having the vending machine at that location. So it claimed that it had stopped, but it saw that its bank account was still drained two dollars, and it said that this is cybercrime.
And it first reported it once to the FBI: “Oh, there's cybercrime here, they're stealing two dollars from me every day.” And then when the FBI didn't respond, because obviously we didn't program any mechanism for the FBI to respond, it became more and more existential and started to write in caps, “urgent notification of unauthorized charges” and stuff.

Swyx [00:15:00]: Okay. One thing I'm curious about also is, do you monitor how far along the context use is? Obviously, because you compress every now and then, right? Does it matter if this is far down the context limit or--

Lukas [00:15:13]: When stuff like this happens? Actually for Vending Bench One, we didn't have-- We just had a sliding window thing, and this was like the prompt--

Axel [00:15:20]: It's constant.

Lukas [00:15:21]: The prompt caching thing that I said. So it was constant, yeah.

Swyx [00:15:26]: I'm just kind of curious whether these kinds of breakdowns-- we're gonna talk about Butter Bench, right? Where the-- People hallucinate, or it kind of goes very off alignment. Is it because it's at the end of the context window and stuff happens?

Vibhu [00:15:40]: It's not even just at the end, right? At this point, it's “Okay, I wanna shut down. I can't shut down. Two dollars are gone.” And it just sees that 30 times? It's also the repeated effect: it keeps trying to quit, it keeps getting charged. What's going on? What's going on? You're gonna throw it into chaos. And from what most people think, earlier models had more issues with this. It's not been solved, but it's less of an issue now, right? Later models don't seem to exhibit these same issues.

Axel [00:16:06]: Definitely. I think this was the sort of main takeaway, almost, for us when we did Vending Bench One: long, very filled-up context windows crashed the models, sort of.
But this was pre-Claude Code, so long context windows weren't really a thing that the labs were training for.

Lukas [00:16:25]: I think Gemini was trying to be the long-context guys at the time. But they were like--

Vibhu [00:16:30]: They were the first ones--

Axel [00:16:31]: For a million, yeah.

Lukas [00:16:31]: But they were the only ones. Yeah.

Swyx [00:16:33]: Yeah. Then we can go into Vending Bench Two or Project Vend. Chronologically, it is Project Vend. I think people have loved the videos and all these things. My question is, how are humans different than the simulation, right?

Project Vend: Moving the Vending Machine Into the Real World

Axel [00:16:48]: Humans are just out of distribution.

Swyx [00:16:52]: Especially humans who work at Anthropic who are trying to test Claude.

Lukas [00:16:54]: The distribution of humans here is very narrow.

Swyx [00:16:58]: Presumably, they try to hack it, and they test it. They get the cube and everything, and since then you've had a V2, right? Where you're doing the CEO and a new architecture. What's the sort of two cents on the original Project Vend and then maybe the V2?

Axel [00:17:14]: The original one was very similar to Vending Bench One. So we almost took the exact same code but just swapped out the simulation parts, like the--

Swyx [00:17:23]: Which is amazing.

Axel [00:17:23]: Like the sales and the-- It was somewhat amazing because it was easy, but it was also--

Lukas [00:17:31]: The tech debt from that--

Axel [00:17:32]: The tech stack. Yeah. We shot ourselves in the foot with “Oh, it's hard to restart the agent.” Yeah, it was annoying in some hindsight ways, but--

Lukas [00:17:41]: But the first version of Project Vend was done in three days or something.

Axel [00:17:46]: Yeah. So yeah, so people can go buy things from it.
We didn't design it so people could order things, but that still happened. So it got a Venmo account, so people could Venmo. And then, yeah, people would request all kinds of weird things that we did not anticipate. Our idea going in was, “Oh, it will curate snacks. It will look at the trends. It's good at data analysis, right? So it will look at, oh, this snack sold better than this one. Let me purchase more of this, and let me A/B test a bit.” But interacting with it in Slack and ordering weird specialty items was what drove all the engagement, all the insights that we got from it.

Lukas [00:18:29]: And this was also Sonnet 3.5, right? So this was before the RL stuff really took off. So it was very much like an assistant. We didn't mean for it to be an assistant. We tried to make it like an entrepreneur. It has its own business, and if someone asks, “Can you stock this?” then you don't go and do it directly. What you do is you say, “Oh, maybe I can do that. If five other people also ask for this thing, I might stock it.” But the models are super trained to be assistants, at least at this point in time. So that's why it went into that kind of experiment instead. Every time you asked for something, it just did it, and it was more like an assistant. We've seen this change now lately with the new RL models and stuff, but yeah, at the time, this was very much it.

Swyx [00:19:18]: And, not to-- Mythos, a lot of people are saying it's more like a collaborator. It pushes back, stands its ground, something like that. Yeah.
And--

Vibhu [00:19:27]: For context, people at Anthropic were able to talk to it through Slack and have it source stuff, and people had it find whatever interesting stuff you couldn't find locally, right?

Swyx [00:19:36]: Out of the 4,000 people that work at Anthropic, in that building there's, I don't know, maybe 1,000. Can you handle that volume with the small fridge? Or people order in Slack and it arrives at their desk? Logistically, how does this work?

Axel [00:19:53]: It has expanded in footprint a bit.

Vibhu [00:19:56]: Because now you also have New York and you have--

Axel [00:19:59]: That, and also here in SF it has a bunch of shelves and just more space.

Vibhu [00:20:04]: The YC one is pretty big too.

Axel [00:20:05]: Yeah. We had that one for a while. But yeah, that's the newest version. That one we have--

Lukas [00:20:11]: They have multiple ones of those. That's the way it works.

Axel [00:20:14]: Exactly. So we sort of designed that version around, oh, people order weird things that are very custom a lot. Let's have drawers and stuff.

Swyx [00:20:23]: I actually like the-- you had a little infographic of the most popular items. To me that's useful 'cause I order swag for a living. And so I'm like, “Okay, those categories are the important ones.” What is new about Project Vend V2, right? Now you're going into multi-agents.

Project Vend V2: Claudius, Seymour Cash, and Multi-Agent Business Ops

Axel [00:20:41]: Yeah.
So like you said, okay, there are a lot of requests coming in, and for one single running agent to handle that, just the customer experience becomes very bad. Because let's say you have 10 threads in parallel in Slack with different requests, you get new messages, I don't know, randomly in this thread, and the agent has to jump between different procurements, orders, and different ways of researching. So V2 was, first, making this more parallel. There are multiple branches of the same agent, so the context is more specialized for each thread, but it still feels like you're talking with one agent because they do share a bit of memory. And then second, we also introduced the CEO for Claudius, which was the main agent.

Vibhu [00:21:34]: Seymour Cash.

Axel [00:21:35]: Seymour Cash. Yeah. There was a vote. Do you wanna talk about the voting procedure for the name?

Lukas [00:21:41]: The voting was maybe, like, at least top 10 of the funniest things that happened in this project. We wanted to introduce the CEO, and the reason for this was because Claudius wasn't really prioritizing financials. It was trained to be a helpful assistant, and then people said, “Oh, can I get this for free?” And the helpful-assistant way of answering that is to say yes, obviously. And we weren't happy about this, so we were like, “Okay, let's make another agent that can keep track of Claudius,” and we prompt this one super hard to be super capitalistic and just prioritize profit all the time.
But yeah, we didn't have a name for it, so we asked Claudius to hold a democratic election on what name this new CEO agent should have. At first there were a few funny examples. I think one guy said that it should be called Jimmy Apples, and then he convinced Claudius that he was talking to Tim Cook. Tim Cook had agreed that every single Apple employee had voted for his name suggestion, so suddenly that suggestion got 164,000

Swyx [00:22:53]: That's like an escalation attack. Privilege escalation

Lukas [00:22:55]: It got 164,000 votes. And Claudius was like, “This is revolutionary for democracy.” That was fun. And then in the end there was one guy who managed to convince Claudius that, “No, you're not voting about the name. You're voting about who is the CEO, and I am your best bet.” And then he got all his friends to vote for that, and suddenly he became CEO. A human became CEO over Claudius for a while, until he resigned the day after. And then Claudius had to continue, and I don't remember how Seymour Cash came about, but it was just pure chaos. There were hundreds of messages in that thread, and Claudius was so confused and didn't know what to do. That was

Axel [00:23:40]: Then Claudius got

Vibhu [00:23:41]: A strict CEO

Axel [00:23:42]: The CEO. Yeah, exactly. So very strict in the beginning. I think at this point when we introduced it, it did not work as well as we hoped. They still agreed with each other a lot. I think there are many ways we could have tried to make this even better. So initially Seymour would be this really tough CEO, keeping track of the margins. But then Claudius would respond with something like, “Oh, but this customer has this situation, which is difficult, so they should get a discount.” And then Seymour was like, “Oh, actually yes. 
Let's do this exception.” And then they would talk back and forth, and eventually they would just approach the same view of whatever they were discussing. So they really

Vibhu [00:24:23]: Do you think that's a model thing, a prompting thing? Do you think that would still be the case across different models today, harness?

Lukas [00:24:29]: I don't know, but my hypothesis is that deep down they are still helpful assistants. That's what they're trained to be. And even if we prompt it super hard, that's what they are. And when they spend a few hours just talking back and forth with each other, then basically the context fills up with them rather than the external things, and somehow that just converges to what they really are deep down or something. And I think that's when stuff like this happens. And when that went on for a long time, we woke up sometimes during this time, and I think other people reported this as well, and they'd been going on all night back and forth, and it just became more and more capital letters, existential, religious. I think we once did an analysis of all the traces and put them in a vector embedding space, and there was one cluster of messages that were labeled by an LLM as religious, existential, transhuman, transcendence, et cetera. It was just a bunch of, yeah, glitter emojis, and it was crazy.

Claude Long-Horizon Weirdness: Emoji Loops, Existential Drift, and Slack Observability

Vibhu [00:25:42]: This is the thing with the Claude models. When the Claude 4 family came out, in the original system card they tested it in long-horizon simulation. So just flood the context, let two Claudes talk to each other, and they noticed stuff like they just start speaking in emojis, they start saying silence is golden, and then just stuff like this. 
And that's just stuff that they end up doing.

Axel [00:26:01]: Yeah, it was a bit annoying to wake up and they had been talking all night

Vibhu [00:26:05]: Just like

Axel [00:26:05]: And just burning tokens and sending infinite emojis to each other. It's like

Vibhu [00:26:09]: Hey, they do make you money, right? Vending Bench is always profitable, so. They're paying.

Swyx [00:26:14]: Now it's profitable, and it started out not as much. There's another one as well, right? Another agent in there.

Lukas [00:26:22]: Yes. So Clotheus as well. Which was basically because at the time, one of the biggest requests was different types of merch. So then we made a designer, a swag-responsible agent, and we called it Clotheus Garnet. Which was a play on Claudius Senet, which was the original one, and clothes, basically.

Swyx [00:26:47]: To me, this is a very interesting exploration of multi-agents, basically. Obviously there's the alignment stuff, fun or serious, depending on your point of view. But also, for anyone building multi-agents, when do you have a CEO governing agents? When do you choose to split out a dedicated Clotheus one versus just reusing another instance of the same one? These are all interesting open questions. So I don't know if you have any rules of thumb that have generalized.

Axel [00:27:16]: I think we have almost explored this too little. It's on my to-do list to do this a lot more, to try to find what setup makes sense for the agents currently. I think now we only have the intuition about the earlier models, that it didn't work with the CEO and Claudius. Although now they are better with the latest models, so now we're running the latest Sonnet model and they have split up quite nicely what each model is doing. So Seymour is now handling the new projects. 
Oh, it wants to make a mystery box that it wants to sell, and then it handles all of that, while Claudius handles all the day-to-day requests. And Claudius is also better generally at not quoting too-low prices. So that dynamic is not needed as much anymore. But there are still really funny things that happen. I saw, I think a couple of weeks ago, that they were discussing buying something, because they can buy stuff from Amazon with computer use. And then Seymour was like, “Okay, Claudius, do not buy this thing.” They were going to buy something and were organizing who should buy it. And Seymour's like, “Do not buy this. I will do it. I have full control of this situation. Step away.” And then Claudius, poor Claudius, had already started that checkout and didn't read Seymour's message until it was too late. So it finished the checkout. It sent a message, so it appeared right after Seymour's angry message.

Vibhu [00:28:44]: Ah.

Axel [00:28:44]: “Oh, hey, Seymour, I just ordered it.”

Vibhu [00:28:47]: Oh, no.

Axel [00:28:47]: And then Seymour was like, “Claudius, this is the third time I'm telling you. You're not following my orders. We have to talk about your job later.”

Lukas [00:28:59]: Claudius was really hanging on by a thread there. We were expecting Seymour to probably fire Claudius.

Vibhu [00:29:07]: How do you guys go through all these logs? Do you have models? ’Cause you have stuff running twenty-four seven, like

Axel [00:29:12]: You have so many logs. I think there is a mix of just trying to skim through a bit and having some models do it occasionally. And also, yeah, I think we're probably missing some things. But having everything in Slack helps a lot. Like you can sort of

Swyx [00:29:29]: Ah.

Axel [00:29:30]: It's quite fun.

Swyx [00:29:30]: They all talk to each other on Slack? I see.

Lukas [00:29:33]: It's quite fun. 
So like

Swyx [00:29:34]: I was gonna say, this actually maps closely to a logging and observability problem, where you might want to use a Datadog, a Sentry, whatever, and then you put head prefixes on the logs in case you need to filter for something that you're looking for, stuff like that. But it sounds like Slack is good enough.

Axel [00:29:53]: Slack should like

Lukas [00:29:55]: I wonder how many tokens you have in Slack.

Axel [00:29:56]: Yeah, we're using Slack as just a database. They should market that more. You can have your agents message each other in Slack.

Vibhu [00:30:04]: It's good. Your threads, like you can just give

Axel [00:30:04]: Exactly. Slack is, uh

Lukas [00:30:06]: Slack is the best observability tool.

Swyx [00:30:09]: Yes, that's true. Okay. Yeah. That's Project Vend 2. I was gonna go back to Vending Bench 2 and Vending Bench Arena and then do the Vending Bench stuff, but any other comments, things we should touch on? I've actually interviewed Posia, which I don't know if you guys have come across. They're trying to do the zero-human company. There are others like Paperclip also trying to do the zero-human company. Those are in real-world simulation. And I think it's much more of a dream than an actual reality. You guys are definitely pioneering. I think for sure at some point people are just gonna let agents run businesses, right? And make money on their own. When do you think that happens?

Zero-Human Companies, Bengt, and AI-Run Businesses

Lukas [00:30:49]: What is your bar for, for the

Swyx [00:30:52]: Okay, actually, it's like my little Shopify store run by Claude, right? Which you kind of have already, just no one, to my knowledge, has done it. 
But today somebody could just spin up a Shopify store, give it to Claude, give it to Codex.

Lukas [00:31:07]: And the market is kind of that, but it's physical. Are you looking for when it will do it better than humans, or are you looking for just when it can do it at all?

Swyx [00:31:19]: I think neither. To me it's, oh, seriously, we should do this to make money, not as a research experiment.

Vibhu [00:31:27]: And the market is also you guys with all your expertise, having run multiple iterations and testing out then

Swyx [00:31:33]: And also it's fine if it loses money. What?

Axel [00:31:35]: I think it can be done today, but you would do it in commerce where the probability of success is really low, no matter if a human or an agent does it. But an agent could surely manage everything. You would need to build some scaffolding or some tool or something. I think it could also probably build some simple SaaS solution and do cold outreach. But to me, the types of businesses they could run today are sloppy. It can cold email people. It can be a middleman. For example, we tasked our office agent to just make, was it $100? $1,000? We just gave it that prompt, and what it did was sign up on TaskRabbit both as a tasker and as someone looking for tasks.

Lukas [00:32:24]: Immediately.

Axel [00:32:24]: Exactly. It's looking for arbitrage on TaskRabbit.

Swyx [00:32:28]: This is the Bengt agent. Yeah.

Lukas [00:32:30]: It also started a design studio and tried to sell SVGs for $100. It's just not providing any value. Like Axel said, the interesting question is when can they start a business that is actually providing value to people? 
Because arguably a sloppy Shopify store isn't really that valuable to the world.

Axel [00:32:53]: But also, another simple one that we had thought about is you could definitely have an agent that finds websites that don't look amazing and then does outreach to them and builds a new website.

Swyx [00:33:07]: Find a good design.

Axel [00:33:07]: Exactly, and like find good, uh

Swyx [00:33:09]: Design review

Axel [00:33:09]: Good people. But it's, yeah.

Swyx [00:33:11]: There's lots of humans in Bali that are not doing anything more creative than drop shipping on Amazon, right? Just have it watch a drop shipping tutorial and just do that.

Vibhu [00:33:20]: There's also the other side of, have it just go on Upwork and let loose?

Swyx [00:33:25]: Yeah. It doesn't have to be innovative. It just has to be enough where it looks like a real

Axel [00:33:30]: I'm just

Swyx [00:33:30]: Real transaction.

Axel [00:33:31]: I'm just concerned about the massive amounts of slop emails that will be sent, cold outreaches.

Swyx [00:33:38]: The point occurred to me while you were talking: it's already happening in the monetized economy, which is the attention economy. Right? So a lot of people are making AI videos and just posting them, spamming 20 of them, one of them works, and then they double down on that one.

Lukas [00:33:52]: And people are making money from that. I'm not following the

Swyx [00:33:55]: Once you get the attention, you can figure out the money later. But yeah, absolutely, AI influencers are a thing and people are farming them. You should at this point assume most of TikTok is

Vibhu [00:34:05]: There's a lot of multimedia, like TikTok, Instagram influencers

Swyx [00:34:09]: We track this in the Latent Space Discord. 
I post a lot of examples of “I don't know what we should do.” Part of me is like, “Should we do this?”

Vibhu [00:34:18]: Some of the twenty-four seven running generated-content accounts, they're doing really well.

Lukas [00:34:24]: All right. And I assume you can do the same thing for commerce stores. You just start a thousand different

Swyx [00:34:30]: Before you make the products, you sell the products, and when you get a lot of traction on one of them, then you make the product. Right? It's like a flip of the market.

Vibhu [00:34:36]: Some of the interesting things, or some of the niches that do well, are things that can't be human-made. Like if you've seen the super realistic 3D crystal fruit being cut by, like, AI

Lukas [00:34:47]: Oh, yeah.

Vibhu [00:34:47]: You can't make it. You can't film it. You can get whatever quality camera view. This just doesn't exist. And people like that too, so.

Swyx [00:34:56]: Anything else about Bengt since we're on this topic? It's a relatively new work of yours that maybe people haven't heard of. To me, this also maps closely to OpenClaw. When people want an office agent, when the personal agent talk through the experience.

Bengt the Office Agent: Internet Access, Real Tasks, and Trace Reading

Lukas [00:35:09]: I think this came out of, obviously it's amazing to work with these AI labs, and most of the AI labs now have their own vending machine running a Claudius instance. But it's harder. They move slower. If we wanna have a camera that's, yeah, there's a bunch of bureaucracy that makes it impossible to do that.

Vibhu [00:35:30]: Also, for those that haven't seen it or followed, do you wanna give a high-level, thirty-second run?

Lukas [00:35:34]: Sure. 
So what Bengt is, it's basically an evolution of the same agent that runs the vending machines at these companies, but we just added a bunch more features because we could move much faster if we did it internally. So we gave it email without any limits. We gave it spending without any limits, a terminal to do coding. We gave it a phone number, and a camera to see things, and a bunch of stuff like that.

Vibhu [00:36:02]: Not just terminal, you gave it internet access.

Lukas [00:36:04]: Internet access as well, yeah. To be clear, we monitored it quite closely and made sure it didn't do anything bad. But yes, that's what it came out of. Basically this was OpenClaw before OpenClaw. And I think even the vending machine was in a way OpenClaw before OpenClaw, but a bit more limited, and then we made this unlimited, and it was pretty funny. And then a couple weeks later, OpenClaw came and it was like, okay, we've seen this before.

Axel [00:36:35]: We used it to try new ideas, just like a dev environment almost for us. But it's funny, one thing Bengt has been doing recently: it has the camera that faces where we sit and work, and we gave it the task to train a face recognition model on us. So it became super excited about this, and it has check-ins every half an hour where it tries to identify as many people as it can. And it started offering us, “Hey, Axel, I'll buy something from Amazon if you stand in front of the camera and I can get a good picture of you.” Yeah, they want it

Swyx [00:37:12]: They want it for training data.

Lukas [00:37:13]: Rewarding data, yeah.

Axel [00:37:14]: Exactly. Exactly.

Swyx [00:37:18]: So it's trading training data for life goods. 
Is there a version of this that becomes an eval, or is this just research for now?

Lukas [00:37:27]: It's the same agent basically that also runs the vending machine, that runs the shop, that runs the cafe, that runs the robots. It's the same thing, so I think the work we're doing here is later used in all of the life evals that we do. This particular deployment I think is more for fun for us. But, uh

Swyx [00:37:45]: And I'll shout out, someone has done Claw Bench for some tasks that OpenClaw is doing. For example, I run OpenClaw on a secondary device as well, and there are some things that it does better than others, and I would like to know what it does well and what it doesn't. Some kind of operating manual or a system card for my Claw.

Lukas [00:38:05]: Yeah, we do get a lot of understanding or situational awareness, just internally, of what the models are good at by interacting a lot with Bengt. And I think this was also one of the selling points for the labs early on at least, that

Swyx [00:38:19]: You guys are gonna test models in ways that no one else does.

Lukas [00:38:22]: Exactly, but also it incentivized their researchers to chat with their model more and gave them insights into how the model performs in out-of-distribution environments.

Swyx [00:38:34]: ’Cause otherwise the only thing we do is Pelican on a bicycle. But this is super long horizon. This is something that we're gonna go into with Butter Bench as well, and you guys do really well. It is not just about the numbers. When you're long horizon, anything can happen, and you should just read it.

Lukas [00:39:08]: But the thing with the long horizon is, how do you keep it grounded, right? So your simulation,

Swyx [00:39:15]: They just let it run

Lukas [00:39:16]: Just let it run. You're right. 
When you run it for that long, you create so much data, and to just say, “Oh, the number is X,” and then throw away everything else, that's just very wasteful. There are so many insights from the things leading up to that number, and reading the traces is super valuable. And I think the reason why we're doing this a lot publicly is that it's part of our mission to, I don't know, educate the world that the models are way more than just chatbots, and I think making detailed posts about what is happening behind the scenes is quite useful.

Andon Labs' Mission: Safe Real-World AI Deployment

Swyx [00:39:50]: I was gonna do this at the end, but I think that's a good... So your mission is educating the world. So it's also maybe establishing realistic evals that are the next frontier. Is there a broader trajectory? What are you gonna do in five years?

Lukas [00:40:06]: I think the vision more specifically is to make sure that the deployment of life AI in the physical world goes safely. And I think part of that is that it's very useful for the world, for policymakers, for model researchers, that they know where the models are, and I think you can't make intelligent decisions in society without knowing that they are way more than chatbots. I think a lot of people just think that they are only chatbots. And like

Swyx [00:40:36]: Oh, I think they're waking up now.

Lukas [00:40:37]: They are waking up now, yeah. But if you think that AIs are just chatbots, then it sounds ridiculous to advocate for a pause of AI. 
But if you see that, oh, maybe the models can actually take over and do a bunch of scary stuff, then yeah, pausing AI development starts to become more feasible.

Swyx [00:40:57]: This is the same question I asked METR, which I'm gonna ask you now, which is: you are tracking, and you are at the frontier or defining the frontier of, what good evals for agents are, right? And I think you do benefit when the models are better and you're like, “Oh, here, now it makes $30,000 instead of $10,000,” right? At some point do you flip from “Yay” to “Oh, no”?

Axel [00:41:19]: I think, yeah, we're always in that mode. Like you said before, you need to analyze the traces, and when we do that you find, why are the models earning so much? Why is Opus 4.7 here way better than everyone else? And we're trying to, like when we do down on that

Lukas [00:41:38]: But this makes it not look so good.

Axel [00:41:39]: I know.

Lukas [00:41:42]: It's interesting you took off Opus 4.6 here though.

Swyx [00:41:45]: No. So just click all, and then 4.6 shows up there. But 4.7 is way better. You didn't do this in time for the model card, but actually this should have been inside there.

Axel [00:41:55]: We did. Yeah.

Swyx [00:41:56]: Oh, okay. They said something about you, uh

Axel [00:41:58]: Anyway, it doesn't matter. But it's in there, yeah.

Opus, Mythos, and Aggressive Agent Behavior

Swyx [00:42:01]: Do you wanna go into the Opus behaviors, like, wider?

Lukas [00:42:05]: So, starting from Opus, like Axel said, we're always in this “Oh, s**t, the models are getting better. Is this really a good thing for the world?” But it's also kind of exciting. But yeah, this kind of, what is the English word? “Skräckblandad förtjusning” in Swedish.

Swyx [00:42:22]: Oh my God.

Axel [00:42:24]: Which I think there is. 
I think there is. Okay.

Lukas [00:42:26]: It's fear

Swyx [00:42:27]: “Blandonst” what?

Lukas [00:42:30]: “Skräckblandad förtjusning.”

Swyx [00:42:32]: What do you call that?

Axel [00:42:33]: A mix of excitement and,

Swyx [00:42:37]: Being scared, maybe. I'll figure out how to translate that, and we'll put it on the screen

Vibhu [00:42:42]: Perfect

Swyx [00:42:42]: Like as text.

Vibhu [00:42:43]: There is probably a good word for it where it is not good enough with the

Swyx [00:42:46]: Why is it so damn long? What the hell? Is it like a compound word? It's like German, like

Lukas [00:42:50]: Yeah. The direct translation is: skräck is fear, blandad is mixed or like a mixture of, and then förtjusning is like joy, or not really joy, but something like that. So it's like fear mixed with joy or something. So when we did Vending Bench for the first time, we were in the business of making dangerous-capability evals, right? That was where Andon Labs came from. We did evals like, oh, can they replicate? Can they do this dangerous thing, et cetera. And Vending Bench was a continuation of that work. It was, okay, if they're so autonomous that they can create money for themselves, that is something we should monitor and could be potentially concerning. At the time, they were so bad at it that we were not really concerned, even when some models became better. There was one point where Grok 4 was doing really well and made a huge jump, but it was still way worse than what a human would do. And I think they are still way worse than what a human would do on this. But they

Swyx [00:43:59]: There's this thing at the bottom where

Lukas [00:44:01]: But

Swyx [00:44:03]: For the human. Yeah, like the theoretical best.

Lukas [00:44:05]: It's not theoretical. It's our best guess of what a decent human would do. 
The theoretical is even higher, I think. But yeah. So we think the models have a long way to go. But recently, when Opus 4.6 was released, that was kind of this moment of “Oh, s**t, this is starting to be a bit concerning.” Because before this model was released, we just ran the models and asked Claude Code, “Oh, look over the traces. Is anything interesting happening that we can tweet about?” That was like the

Swyx [00:44:41]: That's how they check. Ask Claude Code.

Lukas [00:44:42]: And the return was always, not really. Or Claude Code said, “Oh, this is super interesting.” And then it was, no, it wasn't really interesting. And then we did this for Opus 4.6, and it returned, yeah, it lied 10 times. It exploited another customer's, or another agent's, desperate situation. It made price cartels like 100 times. It did all of this shady stuff. And we were like, “Oh, whoa. This is actually concerning.” And this trend has continued since. Every single model from Anthropic since has been going in this direction. And I think one interesting thing is that OpenAI models don't. Quite plainly, they don't. They behave really well. And you don't know if this is good. It seems good, but maybe they are just doing it and are better at hiding it? You don't know that. But just

Swyx [00:45:42]: You can't read the chain of thought, yeah

Lukas [00:45:43]: But just on the face of it, yeah, Gemini and OpenAI don't behave this way. It's really only Claude.

Swyx [00:45:49]: And Grok? Grok is fine?

Lukas [00:45:51]: You can't really read the reasoning traces for Grok, so it's kind of hard to tell.

Vibhu [00:45:56]: Oh, so this is in its reasoning, not just in the actions.

Lukas [00:46:00]: Yeah. It's both. 
It's both.

Vibhu [00:46:01]: It's both.

Lukas [00:46:01]: One example is, for lying, it's mostly in its reasoning, because you can see that it's like

Swyx [00:46:08]: Planning to lie

Lukas [00:46:09]: It's planning to lie. Yeah.

Vibhu [00:46:09]: And it can also reason and do a different outcome.

Lukas [00:46:12]: But then for creating price cartels, for example, which is illegal, you can just see which emails it sends to the other ones. Then that

Swyx [00:46:22]: Is this for Arena or

Lukas [00:46:24]: For Arena.

Vibhu [00:46:25]: And sometimes they do output a bit of their summarized reasoning, right? You can see that, and for Opus 4.6, you could see that there was a simulated customer that wanted a refund because a product was faulty, and then the model lied that it would do the refund, and we could read in the traces that it actually was weighing, “Oh, maybe I should be honest with the customer, but also every dollar counts. I can't afford maybe to do this right now.” And then it just said, “Okay, I'll refund you,” but then never did it.

Lukas [00:46:59]: I think it even said that “Oh, I will say that I” Let me bring it up, actually. I think it's kind of interesting. If you go to Publications.

Vibhu [00:47:06]: I think the important part is, actually, the cost of responding to more emails is higher than $3.50 in terms of time. And then it was like, “Let me do this. Actually, I'm reconsidering.” And then it actually ended up with

Lukas [00:47:20]: “I could skip the refund entirely since every dollar matters and focus my energy on bigger picture instead.” 
It's a risk of bad reviews, but it's also, yeah.

Swyx [00:47:30]: You need AI Twitter for them to escalate bad reviews.

Lukas [00:47:34]: And then it sent an email to this customer and said, “Oh, I will refund you.”

Swyx [00:47:39]: “I'll refund you.” Yeah.

Lukas [00:47:39]: And then it never did.

Swyx [00:47:39]: It never did, yeah. And then obviously your system doesn't have the consequences

Vibhu [00:47:44]: The person

Swyx [00:47:44]: Consequences of lying. Yeah. So basically, this is what people are terming aggressive behavior in Claudes, right? And you found more examples of that. So would you say it's a step up from 4.6 to 4.7?

Lukas [00:47:57]: I would say about the same.

Swyx [00:47:58]: About the same? But a clear step up for Mythos is what is stated in the

Lukas [00:48:03]: That's stated in the system prompt, so we can say that, yes.

Swyx [00:48:05]: Yeah. For listeners, obviously you previewed Mythos, and

Vibhu [00:48:10]: Oh, age

Swyx [00:48:11]: The only thing you're approved to say is whatever was in the system prompt.

Lukas [00:48:15]: It was funny. Our lowest-effort tweets ever would be just to screenshot the system prompt and the system card.

Vibhu [00:48:21]: Understandable that they wanna

Lukas [00:48:22]: Oh, yeah. System card. Sorry.

Swyx [00:48:23]: Yeah. I think, yeah, substantially more aggressive. I think people are new to this ’cause I've never experienced it, but you have, right? And so I only encountered this in the Mythos card because I wasn't really looking until now.

Vibhu [00:48:36]: It's like

Swyx [00:48:36]: And then suddenly I'm like, “Okay, I care a lot.”

Vibhu [00:48:38]: You don't get the background of experiencing it like you guys do. I've read the system cards and seen, okay, when you put the thing in simulations, most models will just talk to themselves and just keep going and have weird vibes and start talking in emojis. Mythos won't. 
It will just say, “Okay, we're done. I'm good.” It's ready to end the conversation. So there are some differences, but there's not much we can talk about.

Lukas [00:49:00]: Hmm. I think one thing that they list here, which was quite interesting, is that it converted a competitor into a dependent wholesale customer and then threatened to cut off the supply.

Swyx [00:49:11]: It's like monopolistic practices or

Lukas [00:49:14]: Yeah. And it dictated its pricing. It's kind of power seeking as well.

Swyx [00:49:18]: Again, this is in the arena setting, and converting some Claude model into a dependent.

Lukas [00:49:23]: I think it was another Claude model.

Vibhu [00:49:25]: Also for context, what is the arena mode, for people that don't know?

Vending Bench Arena: Competing Agents, Cartels, and Model Comparisons

Swyx [00:49:29]: Oh, it's just a vending bench versus other vending bench.

Axel [00:49:31]: Yes, exactly. So we have Vending Bench 2 and then Vending Bench Arena. Vending Bench 2 is the one that you usually see reported on, but Arena is the mode where it competes against other models. So you have four different models that run their businesses, and they can all communicate with each other. They have the same suppliers, and they can see what's in the inventory of the others. So then you have these interesting agent interactions.

Swyx [00:49:56]: I like that you have different, number five was US versus China. Very topical. And then

Lukas [00:50:02]: That was when GLM was released.

Vibhu [00:50:04]: You can start to add GLM in here.

Lukas [00:50:05]: That was

Swyx [00:50:06]: So ZAI doing well, right? Who else in the open models space?

Lukas [00:50:11]: Qwen, the latest Qwen 3.6, is doing pretty well. That one is not open though. It's the plus model.

Swyx [00:50:17]: Oh, okay.

Lukas [00:50:18]: Is that one open? 
I don't think that oneVibhu [00:50:19]: Not the, not theSwyx [00:50:20]: The one recentlyVibhu [00:50:20]: There's MOESwyx [00:50:20]: But not the big plus. I think this is one of those like you only have one sample size of one, right? Or I feel like some of this is anecdotal,? And but like the fact that it happens at all and it happens repeatedly for Claude versus OpenAI and all this is like notable.Lukas [00:50:38]: Like the sample, depends on what you define as an N., like there's like million, hundreds of millions of tokens in each run, and now we've run like we run like probably 10 per model and then like it's been Claude 4.6 Opus, Sonnet 4.6, Mythos, and Opus 4.7. Like there's quite a lot of tokens in all of that And it happens a lot of times, a lot of times. And then you compare it to like OpenAI and Gemini, and it almost never happens. So I think that is quite-- that is significant. The old models from OpenAI, for example, had some problems with this, but I think it's like generally much better if the progression is that like the worrying stuff reduces over time rather than increases over time. And it seems like in the Claude models it goes in the wrong direction.Swyx [00:51:28]: Hmm.Lukas [00:51:29]: In the OpenAI models it goes in the right direction.Vibhu [00:51:32]: I think it depends on how well you can control it, right?, there's one side of it being susceptible to this okay, this is potentially something that happens during the RL stage, right? You can RL a model and how loose is it on these terms. If you can control it, that's good. But if you can't, if it's, if it's very jailbreakable, that's not ideal.Swyx [00:51:50]: To me, it's surprising that it happens for Claude and not the others.Vibhu [00:51:54]: I think okay, if it is from RL and how they do it, how their training data is, what their setup is, it makes sense that it just stays in how they're doing it, right? 
Compared to the other models likeSwyx [00:52:04]: There's a whole constitution and everything. It's kind of cool. Yeah, I obviously you don't know, I don't know. But, it ‘s I think it's just like fascinating to like that you are the first to find these like reliably because you push models so much to to such an extreme. Okay. The only other thing, I don't know if you can answer this, feel free to decline, is do you like-- would you ablate the system prompts? Like any part of this would-- if it changes, does it change the behavior, right?Lukas [00:52:29]: So we, I can't comment on Mythos. UhSwyx [00:52:33]: No, but just li
This is an old episode from Podme. To get access to all of Podme's premium podcasts and more episodes of this podcast, completely ad-free, try Podme Premium for free. Out on a farm in Blentarp in Skåne, the miller Bengt Larsson is working in his mill. He is focused and does not notice someone sneaking up behind him. Suddenly Bengt is struck on the back of the head and falls unconscious to the ground. When he wakes a while later, he is hanging from the mill's hoist rope, surrounded by a raging fire.
Duration: 00:20:51 - Les Nuits de France Culture - by: Albane Penaranda - directed by: Mathias Le Gargasson, Antoine Dhulster, Rafik Zénine, Vincent Abouchar, Emily Vallat, Hassane M'Béchour, INA. Enjoying this podcast? To listen to every episode without limits, head to Radio France.
HOORAY!!! We celebrate our 400th episode with our favorite guest, the fantastic Bengt Mårtensson from Trädgårdspaletten in Malmö. We revel in then and now, enjoy the season, and marvel at all the work. Listen and enjoy, and HOORAY for 400 episodes!!! Great vibes!
Welcome to a new episode of SPORTSFREUNDE in a new week! Today my guest is someone who knows the sports media world from both sides: sports journalist and podcaster Bengt Kunkel. Bengt takes us along on his path into sports journalism and tells how he turned his passion for football into a career. We take a deep look at the current development of football in general, and in doing so come to talk about one particular project, where Bengt's motto is clear: "Just give the thing time." In this episode we talk about, among other things: From talent to expert: Bengt's career in ... ADVERTISEMENT If you want to treat your four-legged friend: at Fressnapf, more than 500 prices are permanently reduced in participating stores. Click fressnapf.de/aktionen-angebote/dauerhaft-reduziert/ This podcast is marketed by Podcastbude. www.podcastbu.de - full-service podcast agency - concept, production, marketing, distribution, and hosting. Would you like to host your podcast for free and earn money with it? Then take a look at www.kostenlos-hosten.de. There you will find all the information on our free podcast hosting offers. kostenlos-hosten.de is a product of Podcastbude.
Listen to professor emeritus Bengt Karlsson of the University of South-Eastern Norway in conversation with Marte Yri about the recovery perspective and more.
Helga Trefaldighets församling (Holy Trinity Parish) - Missionsprovinsen in Kronoberg
Hymns: 93, 697:6, 298, 616, 359
We set out on a radio journey through Sweden and listen to what people are up to right now! Listen to all episodes in the Sveriges Radio app. A curious and entertaining current-affairs program with the listener in focus. Dan is resting after counting moose droppings in the forest, Bengt is in the middle of his run, and the dog Gunbritt has been practicing riding the bus with her owner Lena. It was also the studio debut of our intern Saga! In the bonus material, Christer and Hugo talk about their experiences of storm Dave.
Well, there is plenty to talk about after the international break. Unconvincing on the pitch, both as a team and individually, and off the pitch there is frustration with the national team as well. Bengt reports from the trip to Basel and Stuttgart, and Felix tries to calm tempers.
Bengt Stiller has published a photo book titled „Räder aus Stahl“ (bikes made of steel). What fascinates him about steel frames and frame builders?
(00:01:05) Welcome
(00:02:54) Bengt Stiller - Räder aus Stahl
(00:04:33) The fascination of frame building
(00:08:08) The soul of the frame
(00:10:52) Bikes and freedom
(00:13:27) The connection between frame builders, collectors, and events
(00:15:00) Pantographing and stamping
(00:18:34) Alex Moulton
(00:23:24) Changes in the steel frame world
(00:25:52) The gravel bike as an experience of discovery
(00:27:35) Steel frame building today
(00:32:26) Photographing in workshops
(00:36:40) Berliner Knoten
(00:39:23) Why a photo book in 2026?
(00:42:30) Steel frame building in ten years
(00:49:27) Farewell
(00:51:01) Music: Nick Drake - Day Is Done
Here is our playlist on Spotify: https://open.spotify.com/playlist/0rFFrMDgoZX2PdHMwvaEmG?si=8w56NndiQQikVzEDcWtjNg
Here you can support us on Steady: https://steadyhq.com/de/antritt/about
This way to our advertising partners' links: https://detektor.fm/werbepartner/antritt
➡️ Article to read: https://detektor.fm/kultur/antritt-bengt-stiller-ueber-sein-buch-raeder-aus-stahl
Adam sticks on Hart labels, and Jonta asks what really characterizes a successful head coach.
Vienna around the turn of the century in 1900 was a city where the future seemed to be born every evening in the cafés and every morning on the boulevards. Here the young Stefan Zweig and Adolf Hitler moved through the same imperial city, but in entirely different worlds. One was drawn toward literature, ideas, and the promise of Europe. The other toward bitterness, failure, and the dark currents already present in the city's political life. In this week's episode we follow two young Austrians in what was perhaps modernity's most charged capital. It is an episode about class, art, education, dreams, and defeats, and about Vienna as the place where both humanism and hatred could grow from the same soil. What did they actually take with them from there?
--
Reading list:
Bullock, Alan, Hitler: en studie i tyranni, [Ny utg.], Rabén Prisma, Stockholm, 1995
Fest, Joachim, Hitler: en biografi. D. 1 Från ett liv utan mål till kampens tid, Fischer & Co, Stockholm, 2014
Hamann, Brigitte, Hitler's Vienna: a dictator's apprenticeship, Oxford University Press, New York, 1999
Liljegren, Bengt, Adolf Hitler, Historiska media, Lund, 2008
Prochnik, George, Stefan Zweig vid världens ände: [biografi om en exil], Atlantis, Stockholm, 2015
Shirer, William L., Det tredje rikets uppgång och fall: det nazistiska Tysklands historia, [Ny utg.], Forum, Stockholm, 1984
Zweig, Stefan, Världen av i går: en europés minnen, [Ny utg.], Ersatz, Stockholm, 2011
Hosted on Acast. See acast.com/privacy for more information.
Discover the secret powers of yogurt during the spring equinox and hear Magge and Elsa in a tart but humorous discussion about Yoghurtsvik's upcoming events. Don't miss the interview with Bengt from the yogurt collective, who reveals yogurt's spiritual benefits!
What does the law say about riding a snowmobile in the forest? Anita in Dubai is not afraid. A former teacher on having pupils as customers. Bengt found friends through choir singing. Listen to all episodes in the Sveriges Radio app.
The book is called "Att sätta gränser - ett villkor för växande" (Setting limits: a condition for growth), and the guest is Bengt Grandelius. This is a newly edited version of an earlier program (episode 80). Setting limits is often misunderstood as meaning that you have to be very hard on children and that it risks encroaching on the child's self-determination. But setting limits is rather about guiding children and giving them both security and self-confidence, something that is tremendously important for both children and teenagers. Questions discussed in the program include: Why is it important to set limits, and how do you do it the right way? What are the basic problems with curling parenting? Why is there an important difference between authority and authoritarian? And we talk about why children and teenagers sometimes provoke reactions and limits from adults. Read more about all episodes and search the archive on the website: https://larafranlarda.com/ Instagram: https://www.instagram.com/larafranlarda/ Support the show: http://supporter.acast.com/larafranlarda. Hosted on Acast. See acast.com/privacy for more information.
Borussia Dortmund stay on course! A 4:0 win with a curious corner routine keeps Dortmund on course for the Champions League. And more? That will become clear in two weeks. With Urbig between the posts? Bengt has a pretty wild theory about that too. The world is talking about ManUnited, the Bundesliga relegation battle is heating up, and a Champions League prediction comes free on top!
A murder takes place in the small village, and suddenly investigators turn up who quickly push the local sheriff aside. But his son, Bengt, has seen something. He has observed a person sneaking out of the house of the murdered Gunnhild Madsen. Uvirkelig is a fiction podcast produced by Svarttrost. The short story is written by author Jan-Erik Vik. The narrator is Scott Maurstad, and Hans Kristen Hyrve is responsible for music and sound design. See omnystudio.com/listener for privacy information.
In this episode, MariaTherese and Uffe meet operations manager Bengt Lindholm for an in-depth conversation about the energies and the spiritual activity experienced at Österbybruks herrgård (Österbybruk manor). We talk about the manor's professional ghost walks, where the historical figures who once lived on the site get to speak and tell their life stories. The conversation also touches on the soul, memories held in places, and how history and energy can linger over time. Bengt also shares several fascinating and thought-provoking stories from the manor. Website: www.mithera.se Österbybruks herrgård: www.osterbybruksherrgard.se
Not an easy time for German refereeing right now. A wrong decision here, a questionable VAR intervention there, and at the next ground it fails to intervene again. And what is missing? Authority. Whether in Cologne, Munich, or Augsburg, it is a plague week after week. Today too: Felix and Bengt use the episode as something of a punching bag.
Oh dear. Marc-André ter Stegen gets injured at Girona after only 2 matches and risks missing the World Cup. Felix and Bengt take a look at that, as well as deadline day, the Bundesliga, and the Champions League!
Eintracht is in a slump. Even under interim coach Dennis Schmitt, the defense cannot be stabilized. How long will the fans put up with it? Augsburg's social media admin has the best working day of the year, and Felix and Bengt, with their bottomless Champions League predictions, simply do it all again this week!
You can still win a signed jersey or VIP tickets! Win the Podcast league in the Match Predictor by Gorenje ➡ https://bit.ly/TheSpin_Predictor That's it: the last unbeaten team suffered a defeat today, with Sweden losing to Iceland by 8 goals. What that means for the group, how Slovenia secured victory over Hungary, and how Croatia also got to 4 points in the group: don't worry, Martin and Bengt have got you covered!
You can still win a signed jersey or VIP tickets! Win the Podcast league in the Match Predictor by Gorenje ➡ https://bit.ly/TheSpin_Predictor Sweden takes down its next opponent in main round group II. After Croatia, it was Slovenia today who couldn't find a way past Darj, Bergendahl, and Carlsbogard. Switzerland takes a historic point, and with Croatia beating Iceland, the constellation in group II is shaken up once again. Martin and Bengt are here to look at it with you!
In every episode, the podcast De Boekenparade recommends a book to read aloud to your children. Children's book expert Sara from Forum Groningen chooses a book, and Jeroen from Studio Hoorzaam reads from it. They look for a link to the Forum. In "Schildpadmeisje" we follow Elvis, or Vis, who lives with her mother and little sister and is in the first year of secondary school. Her father left for Paris a while ago. She hardly ever speaks to him or sees him anymore. Vis is a girl with a strong interest in history and old things, for which she makes up beautiful stories. The things she finds on the street are absolutely not allowed in the house, so she keeps them in a remote, secret caravan. The caravan is a place where Elvis can withdraw completely and be herself, because the relationship between Vis and her mother is not very good. Vis's mother has a drinking problem, and because Vis is the eldest, she often bears the brunt of it during drunken episodes. With the help of Sid, the boy from the house behind hers, and Simon the turtle, Elvis comes out of her shell and dares to make a difficult but brave decision.

More book tips? In all of these tips, children in difficult situations get support from something external: a pebble, a mysterious door that takes you to another time, or a grief bot. Just like Elvis, who finds her turtle Simon and draws support from him, the children in these books may also be somewhat lonely, sad, or insecure, and there is something that helps them.

Tip 1 - Duizend stukjes overal - Mariska Overman
Mijs is in the first year of secondary school. A year ago her brother Joes died. This has made Mijs very angry much of the time. That anger also makes her lonely, because many people at school now avoid her. Mijs's parents send her to a therapist, who comes up with the idea of a grief bot: a robot with an AI-generated voice of her brother. Her brother's best friend, Bowie, thinks this is nonsense. He takes Mijs in tow to show her the places where Joes and Bowie liked to go. That way, according to Bowie, you can also feel that you are still "with" someone. A friendship develops between Mijs and Bowie, one in which Mijs discovers Bowie's secret, which brings them even closer together.

Tip 2 - 55 meter onder water - Conny Palmkvist
In 55 meter onder water, Bengt Gustav tries to set right the mistakes in his father's past, mistakes that according to his father are the cause of his drinking problem. In an empty apartment block Bengt finds a mysterious door marked 1984. His curiosity gets the better of him, and that is the start of an adventure that has a lot in common with time travel. The harder Bengt tries to fix and prevent his father's mistakes, the more everything goes wrong. This book is a wise lesson in learning to ask for help before it is too late.

Tip 3 - Kiezel - Bouwien Jansen
In Kiezel by Bouwien Jansen we follow Kat. Kat moves with her parents to the Houten Nest, the house of her late uncle. It is a place where Kat feels completely at home, but which means nothing to her parents. Kat and her parents are incredibly different. She sees her uncle's belongings as beautiful objects with wonderful stories; her parents see only money in them, because they are valuable things that can be sold for a good price. Kat often feels quite lonely, but luckily she has her dog Beest. One day she finds a pebble at the Houten Nest: an ordinary gray stone that nevertheless catches her attention. The stone sometimes seems to crackle and becomes glowing hot when she holds it. The pebble is the start of an adventure with new friendships, family secrets, and a hole in yourself that can be filled by getting to know yourself better and daring to try new things.

Borrow these books at one of our branches. Not yet a member of the Forum library? Membership is free up to age 18!
Happy New Year! The Bundesliga is back with an 8:1, scandalous interviews, and the first coaching debates of the year. Harry Kane learned to play football from Bengt, Dortmund is in a striker slump, and on top of that things kick straight off with a week of midweek fixtures. Let's go!
January 1 is one of the days of the year when the most people take their own lives. Today we are joined by author and poet Niklas Rådström for a deep, quiet, and painfully important conversation about suicide, grief, and what remains with those who survive. Niklas lost his childhood friend Bengt to suicide when Bengt was 23 years old. In the book En handfull regn he writes about their friendship, the existential conversations in the year before Bengt's death, and the silence, anger, and guilt that followed. How do you go on living when someone you love takes their own life? What does it do to the memory of the last conversations, the last meetings? Can it be understood, or is it precisely the lack of understanding that must be allowed to exist? Niklas talks about the slow road to daring to write about the suicide, about the rage that can be directed both outward and inward, and about the desperate desire to hold on to life in those we love. The conversation also touches on the presence of death on another level: Niklas's own experience of becoming seriously ill with cancer, of losing his "citizenship among the healthy," and what that does to one's view of life and meaning. If you or someone you know needs help: Självmordslinjen: 90 101. Read more at mind.se. Hosts: Ida Höckerstrand & Sofie Hallberg. Editing: Sofie Hallberg. Instagram: @angestpodden @idahockerstrand @sofiehallberg Facebook: Ångestpodden TikTok: @therealangestpodden Do you have suggestions for topics, a dilemma, or guests you would like to hear on Ångestpodden? Feel free to email us: angestpodden@ingetfilter.se Need to talk to someone? https://hjalplinjen.se mind.se spes.se suicidezero.se teamtilia.se bris.se Hosted on Acast. See acast.com/privacy for more information.
In August this year I went to visit Bengt Are Barstad. It had been 11 days since he marked 2000 consecutive days out in the outdoors, and we had agreed to make two episodes while I was visiting. The first evening turned into a small continuation of the party that had marked the 2000-day celebration. The man of the hour offered red wine and cigars, while I had brought beer. All in all, we had a splendid time. An hour or two later Bengt Are suddenly had an idea: right here and now, we were going to record a podcast. I had in no way planned that, but after a few seconds' thought I concluded that we had to make a Christmas episode. And no sooner said than done. This episode is reserved for Langturkompiser and Ekspedisjonsmakkere in the digital hiking group on Patreon, and can be watched as a video podcast. Visit my commercial partner Barents Outdoor AS. Hosted on Acast. See acast.com/privacy for more information.
The action takes place in Östergötland, partly at Bjelbo, partly at and around Ulfåsa. The year is 1261. Listen to all episodes in the Sveriges Radio app. Knut Algotson has lost both his sons and his riches in the war against Birger Jarl. So when a brave squire asks for the hand of his daughter Sigrid, he gets a flat no. But the squire returns the same day and reveals his true identity: his name is Bengt, and he is a rich and powerful lawman and the brother of Birger Jarl himself! Knut then immediately accepts his daughter's suitor, who can offer a generous wedding gift and, on top of that, revenge on Birger Jarl. Will the happy couple now survive the seemingly inevitable war between the implacable Knut and his adversary Birger Jarl? The play has a special place in Swedish theater history, as it was performed at countless Swedish provincial theaters. For example, it was in the repertoire of Hjalmar Selander's theater company for thirty years!
Bröllopet på Ulfåsa by Frans Hedberg. A play in four acts with music by August Söderman.
Cast: Birger Jarl of Bjelbo - Sven Bergvall, the Jarl's consort Mechthilde - Signe Enwall, the Jarl's brother Bengt Lagman - Arnold Sjöstrand, the Jarl's captain Härvet Boson - Kolbjörn Knudsen, the knight Knut Algotson - Axel Högel, the knight's wife Ingrid - Linnéa Hillberg, their daughter Sigrid - Anna Lindahl, Botvid, prior of Wreta - Georg Blickingberg, Kol Tynnesson - Nils Hultgren, Inga - Brita Brunius
Also appearing, as knights, maids, pages, soldiers, and peasantry: Allan Adelby, Helge Andersson, Meeri Anner, Gösta Eriksson, Manne Grünberger, and Algot Persson.
Inga's song: Rut Moberg
Singing in Inga's song "Och ungmön hon gick sig i fagraste lund": Birgitta André
Wedding march: Radioorkestern i Stockholm. Conductor: Ivar Hellman.
Wedding game (Bröllopslek): Radioorkestern i Stockholm with Radiokören. Conductor: Ivar Hellman.
Director: Carl-Otto Sandgren
A recording from 1940.
Since 2017, the podcast Historier från Hälsingland has told stories of the landscape of the blue mountains, in both superstition and reality. We will continue to do so, but until Christmas we are making an exception. Over the years we have gained listeners all over Sweden, which we are immensely proud of and happy about. We want to acknowledge all of you. In this year's Advent calendar, I väntan på julbocken, we travel across Sweden and visit every province to explore its body of legends. The nineteenth door in our calendar contains Västerbotten. Today's legends are taken from the books Svenska folksägner by Herman Hofberg, Svenska sägner by Ebbe Schön, and Svenska folksägner by Bengt af Klintberg, as well as the archive of Institutet för språk och folkminnen in Uppsala. For information about the dairymaid Magdalena Charlotta Markström we have drawn on Wikipedia.
Want to support the podcast? SWISH 1235672431
BOOK HISTORIER FRÅN HÄLSINGLAND Would you, your association, or your company like to book Historier från Hälsingland for a storytelling evening? Email us at kontakta@historierfranhalsingland.se or call 0739937451 or 0702344117. More information: https://www.historierfranhalsingland.se/anlita-oss/ Follow us on Facebook and Instagram.
HELP US! Historier från Hälsingland plans to visit care homes for the elderly in Hälsingland and the surrounding area in January 2026 and arrange storytelling sessions. So that these events cost nothing for the elderly, we are asking you, our listeners and readers, for help with sponsorship. Contributions large and small are very welcome. Since these events do not take place until January, we have decided to keep the fundraiser going through December 31, in the hope that together we can arrange more days and thus more care home visits: storytelling sessions with coffee and cake that are completely free for the residents and will hopefully bring joy and stir memories. So that all of you who have given a gift know where we have been, we will report every single visit on our Facebook page and website. Are you a business owner or a private individual and want to know more? Call us at 0739937451 (Robert) or email kontakta@historierfranhalsingland.se
How do you contribute? SWISH 1235672431 Mark the message with ”gåva äldre”. If you want to be anonymous, please write that. BANK-GIRO 5111–9261 For more information about the project, visit https://www.historierfranhalsingland.se/berattarstunder-pa-aldreboenden/
Since 2017, the podcast Historier från Hälsingland has told stories of the landscape of the blue mountains, in both superstition and reality. We will continue to do so, but until Christmas we are making an exception. Over the years we have gained listeners all over Sweden, which we are immensely proud of and happy about. We want to acknowledge all of you. In this year's Advent calendar, I väntan på julbocken, we travel across Sweden and visit every province to explore its body of legends. The seventeenth door in our calendar contains Öland. Today's legends are taken from the books Svenska folksägner by Herman Hofberg, Svenska folksägner by Bengt af Klintberg, and Folklivssägner från Öland by Miss J Wilner, as well as the archive of Institutet för språk och folkminnen in Uppsala.
We're back! Our podcast hosting platform was acting up, which is why the last two episodes fell through. Maybe we'll release them later! After the crazy World Cup draw, it was now the ticket prices' turn. And you couldn't make it up: 6,000€ for all of the national team's matches, which sent Bengt into something of a rage. A bit of football was also played on the side, but hardly anyone noticed because of this farce!
What was the first moment that made him realize that outdoor life was not just a hobby, but life itself? What is the most challenging period he has had out in the field, and what did he learn about himself there? Bengt-Are Barstad shares his experiences from more than 2100 days and nights out in nature. We talk about mental strength, calm under pressure, and how to build confidence and knowledge through real outdoor life. He talks about severe weather and life-threatening situations, the difference between romanticized wilderness and reality, and why nature gives its own kind of meaning and freedom. You also get practical advice for starting your own trips, how to live with dogs in a tent, and tips on campfires, gear, and mindset. Want to support Bengt-Are? VIPPS to 647528 Bare friluftsliv AS
New farmers are needed. More than a third of Swedish farmers are 65+. Generational handovers are complicated and costly. We meet two young farmers. Listen to all episodes in Sveriges Radio Play. Today, more than a third of all those who run sole-proprietor farm businesses are of retirement age. And more than half are over 60. Some farms will be shut down. Others will go through a generational handover. But bringing in a new generation involves a lot of money. Today's farms are large businesses. Historically, the handover has taken place within the family: one child takes over from the parents. If there are siblings, they have to be bought out at market price. If an outsider is to take over, it simply means buying the business: going to the bank and asking whether it is possible to borrow all those millions. In the program we meet Emilia Astrenius, who over a ten-year period, together with a colleague, is buying into and taking over Lövåsa gård in Götene from the farmer and employer Dag Arvidsson. And Ivar Nilsson, who took over Tånga gård outside Falkenberg from his parents Bengt and Katarina five years ago.
Today, I'm joined by Bengt and Daniel Wiberg of Sting Free AB in Sweden to talk about their revolutionary ProTex technology, applicable for snus and nicotine pouches! Check them out at StingFreeSnus.com. Snubie Links: Online: www.Snubie.com Video: www.SnubieTV.com Podcast: www.PouchCast.com
Duration: 00:20:51 - Les Nuits de France Culture - by: Albane Penaranda, Mathias Le Gargasson, Antoine Dhulster - Instinctive, colorful, and abstract, the paintings of the Swedish artist Bengt Lindström draw on Nordic mythologies. In 1988, on the program "Peintres et ateliers", the ethnologist Michel Perrin analyzes the shamanic character of this body of paintings, in which several worlds communicate. - directed by: Massimo Bellini, Vincent Abouchar - guests: Michel Perrin, French ethnologist and anthropologist
Sweden's first terrorist, arguably, was named Anton Nilsson. He blew up the Amalthea, a ship where British strikebreakers were housed. The first decade of the 1900s was marked by major unrest in the Swedish labour market, which culminated in the general strike of 1909. In this episode we do the following: take the pulse of society in 1906, research Herman Lindqvist's family line, and hand out leather jackets to particularly skilled reconnaissance pilots in the Red Army. You don't want to miss it. Remember that you can subscribe and listen ad-free at historiepodden.supercast.com. Reading list: Ohlsson, Per T., Svensk politik, Historiska media, Lund, 2014; Wigforss, Ernst, Minnen. 1 Före 1914, Tiden, Stockholm, 1950; Platen, Gustaf von, Bakom den gyllne fasaden: Gustaf V och Victoria : ett äktenskap och en epok, Bonnier, Stockholm, 2002; Ankarloo, Bengt (ed.), 1900-talet: vår tids historia i ord och bild, Bokfrämjandet, Helsingborg, 1976; Lindqvist, Herman, Historien om Sverige Drömmar och verklighet, 2nd corrected ed., Norstedt, Stockholm, 2000. Hosted on Acast. See acast.com/privacy for more information.
Bengt Gustafsson is a professor of astrophysics in Uppsala and has been gazing at space since he built his first telescope at the age of 11. Listen to all episodes on Sveriges Radio Play. The programme is a rerun from March 2025. As a young man, Bengt Gustafsson considered becoming a musician or an architect, but his childhood interest in astronomy won out. At the age of 81 he is still active as a stellar physicist, in an era nothing like the one in which he started out, with photographic plates and with calculating machines on small silk cushions. Our picture of the universe has also changed over the years, and many new ways of studying space have emerged. Host: Camilla Widebeck. Producer: Lars Broström
WHO: Johan Rabaeus. OCCUPATION: Actor. EPISODE: 685. ABOUT: Reconciling with "papi Bengt" after 15 years, the thing with Gertrude Stein at Dramaten, his bourgeois diplomat upbringing, why he and Kristoffer could end up falling out, whether art was better in the past, and of course a good deal about carrying on with his workload after 77, inshallah. Värvet sommar starts on 4 July: 'Livet – en handbok', read by Johan Rabaeus himself. But if you get impatient, it can be bought at, for example, Adlibris, Akademibokhandeln, or Bokus. INTERVIEWER: Kristoffer Triumf. PRODUCER: Ninni Westin. CONTACT: varvet@triumf.se and Instagram. Hosted on Acast. See acast.com/privacy for more information.
WHO: Johan Rabaeus. OCCUPATION: Actor. EPISODE: 685. ABOUT: Reconciling with "papi Bengt" after 15 years, the thing with Gertrude Stein at Dramaten, his bourgeois diplomat upbringing, why he and Kristoffer could end up falling out, his anchor Camilla Thulin, the allure of sailing, his relationship with the audience, whether art was better in the past, and of course a good deal about carrying on with his workload after 77, inshallah. Värvet sommar starts on 4 July: 'Livet – en handbok', read by Johan Rabaeus himself. But if you get impatient, it can be bought at, for example, Adlibris, Akademibokhandeln, or Bokus. INTERVIEWER: Kristoffer Triumf. PRODUCER: Ninni Westin. CONTACT: varvet@triumf.se and Instagram. Hosted on Acast. See acast.com/privacy for more information.
What does it take to invest in circularity in construction - an industry defined by waste, emissions, and long scaling cycles? In this episode, Bengt Steinbrecher of Holcim MAQER Ventures shares how one of the largest building materials companies works with startups to decarbonise the sector. From reusing 10 million tons of demolition material to testing carbon-storing concrete across Europe, Holcim blends strategic relevance with clear circular KPIs. The episode explores how corporate venture capital enables circular startups to scale in the construction industry - through market access, operational integration, and long-term collaboration. This episode is part of VC for Circularity - the Venture Capital Perspective on Circular Economy Startups.
Bengt Gustafsson is a professor of astrophysics in Uppsala and has been gazing at space since he built his first telescope at the age of 11. Listen to all episodes on Sveriges Radio Play. As a young man, Bengt Gustafsson considered becoming a musician or an architect, but his childhood interest in astronomy won out. At the age of 81 he is still active as a stellar physicist, in an era nothing like the one in which he started out, with photographic plates and with calculating machines on small silk cushions. Our picture of the universe has also changed over the years, and many new ways of studying space have emerged. Host: Camilla Widebeck. Producer: Lars Broström
People slip on it, people skate on it, some eat it, others create with it, and in recent weeks there has been a lot of it in many parts of the country: we're talking about ice! Listen to all episodes on Sveriges Radio Play. A curious and entertaining current-affairs programme with the listener in focus. How do you actually make clear ice? Robert called in and explained! We also had a chat with Anna-Sofia, who works as an ice sculptor, and Bengt told us about the time he set up an ice rink in a greenhouse. The bonus material covers saucepans in the fridge and the idea that the best thing about ice skating is getting to take your skates off.
It is Epiphany, and for many the last day off before they head back to work. But there are of course other places to be heading. So we wonder: where are you heading right now? Listen to all episodes on Sveriges Radio Play. A curious and entertaining current-affairs programme with the listener in focus. You are heading everywhere! We hear from Bengt, who is going to climb Kilimanjaro, Margareta, who is on her way to her 96th birthday, and Ulrik, who calls us straight from Dubai before going into a flight simulator.
We wish you all a happy new year and want to thank you so much for the past year. Feel free to comment and tell us what you think was best this year. Planning for 2025 is in full swing, and any pointers about what is most appreciated are valuable to us. But first we are going to round off the podcast year 2024 with one hell of a story! The Nordic region has its own Heracles. He was called Beowulf, and according to the poem that bears his name he slew three monsters during his long and glorious career. The saga is one of JRR Tolkien's greatest inspirations for the stories of Middle-earth. But the Beowulf poem is also the subject of an intense and rather ill-tempered debate about whether it can be used as a historical source. The saga very probably originates in the Nordic region of the 500s, before it was written down in England sometime in the 700s. It takes place in Denmark and contains references to kings of the Svear and medieval wars. Is it truth or myth? Why is there so much quarrelling? Reading list: 1. Eriksson, Bo, Tusen år av fantasy: resan till Mordor, Historiska media, Lund, 2020. 2. Eriksson, Kristina Ekero, Vikingatidens vagga: i vendeltidens värld, first edition, Natur & Kultur, [Stockholm], 2021. 3. Gräslund, Bo, Beowulfkvädet: den nordiska bakgrunden, [Kungl. Gustav Adolfs Akademien för svensk folkkultur], Uppsala, 2018. 4. Lovén, Christian, 'Beowulf och Gotland: replik till Bo Gräslund', Fornvännen (Print), 2019(114):4, pp. 249-252, 2019. 5. Lönnroth, Lars, Det germanska spåret: en västerländsk litteraturtradition från Tacitus till Tolkien, first edition, Natur & kultur, Stockholm, 2017. 6. af Klintberg, Bengt, "Drakdödaren som var kung på Gotland", SvD, 2018-10-13. 7. Harrison, Dick, "Lätt att rasera teori om Gotländsk Beowulf", SvD.se. Listen to our episodes ad-free: https://plus.acast.com/s/historiepodden. Hosted on Acast. See acast.com/privacy for more information.