Podcasts about docker


  • 1,429 podcasts
  • 7,585 episodes
  • 50m avg duration
  • 1 daily new episode
  • Latest episode: Mar 20, 2026
Latest podcast episodes about docker

DevOps and Docker Talk
Backup S3, Google Drive, iCloud, Notion with Plakar

Mar 20, 2026 · 68:31


Bret is joined by the founders of Plakar, Julien Mangeard and Gilles Chehade, to nerd out over backup engineering: the kind where you're building your own file formats and cryptographic layers, not just wiring up cron jobs. We get into how Plakar deduplicates and encrypts at the source, so your cloud provider never sees your keys. Also, their snapshot model has no chain dependencies, which means you can delete any backup without breaking the others. We had a fun hour of backup horror stories and ransomware pragmatism, and I'm lobbying hard for a Docker volume integration.

Check out the video podcast version here: https://youtu.be/OPRK5osKQHI
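The design the episode describes (deduplicate and encrypt at the source, content-addressed storage with no chain dependencies between snapshots) can be sketched generically. The following is a toy illustration of the technique only, not Plakar's actual CLI, chunking, or on-disk format; the function name and passphrase handling are invented for the example:

```shell
# Toy content-addressed backup: dedup by SHA-256, encrypt locally.
# Illustrative only; Plakar's real snapshot format and crypto differ.
STORE=./store
PASS=example-passphrase   # a real tool would use proper key management
mkdir -p "$STORE"

backup_file() {
  # The plaintext hash is the object's address.
  hash=$(sha256sum "$1" | cut -d' ' -f1)
  if [ ! -f "$STORE/$hash" ]; then
    # Duplicate content is stored once; encryption happens at the
    # source, so the storage target only ever sees ciphertext.
    openssl enc -aes-256-cbc -pbkdf2 -pass "pass:$PASS" \
      -in "$1" -out "$STORE/$hash"
  fi
  echo "$hash"
}
```

Because objects are keyed by content hash rather than chained to a previous backup, identical data is written once, and deleting one snapshot's index never invalidates the blobs other snapshots reference.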

AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning
NanoClaw Creator Lands Docker Deal After Six Weeks

Mar 13, 2026 · 10:36


In this episode, we explore the incredible rise of NanoClaw, an open-source AI agent tool created by Gavriel Cohen in just 48 hours. We cover how it went viral, attracted major attention from AI researchers, and eventually led to a significant partnership with Docker.

Chapters
00:00 NanoClaw's Meteoric Rise
01:44 From Side Project to Viral Sensation
03:39 The Problem with OpenClaw and NanoClaw's Solution
08:27 NanoCo's Future Business Model

Links
Get the top 40+ AI Models for $8.99 at AI Box: https://aibox.ai
AI Chat YouTube Channel: https://www.youtube.com/@JaedenSchafer
Join my AI Hustle Community: https://www.skool.com/aihustle

Atareao con Linux
ATA 778: Goodbye, Docker! How to Configure Traefik with Podman (Rootless and Secure)

Mar 12, 2026 · 21:00


Welcome to a key episode in the Podman series. I'm Lorenzo, and today we configure our reference reverse proxy using Podman and Quadlets. If you've ever wondered whether you can leave Docker behind without losing the power of Traefik, this podcast will give you all the answers.

What you'll learn in this episode:
  • Rootless security: how to run Traefik without being root, and why it's the best decision for your server.
  • Port management: the trick for using ports 80 and 443 as an ordinary user, persistently.
  • Persistence with systemd: we configure the system so your services stay alive even after you close your session.
  • Quadlets and IaC: organizing volumes, networks, and containers through clean configuration files.
  • Advanced performance: implementing HTTP/3, optimizing ciphers (such as ChaCha20), and compressing traffic.
  • Plugin ecosystem: integrating OIDC with Pocket ID for centralized, secure authentication.

We explore how using variables like %H and %T simplifies deployment across different environments, and how Traefik's dynamic configuration lets us add services on the fly, without interruptions.
I also go deep into extreme security measures, such as dropping all kernel capabilities except those needed to bind the ports, and forcing the container's filesystem to be read-only.

Timestamps:
00:00:00 - Introduction and goals
00:02:18 - The challenge of ports 80 and 443
00:04:14 - Persistence of user processes
00:05:13 - Podman socket vs Docker
00:06:43 - Quadlets: the magic of infrastructure
00:09:34 - Security and minimal privileges
00:12:12 - Static and dynamic configuration
00:14:39 - Advanced authentication with OIDC
00:18:43 - Podman as the future of self-hosting

Don't miss the technical details available in the episode notes, and join our Telegram community to discuss the fascinating world of open source.

More information and links in the episode notes
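For readers who want the flavor of what the episode covers: the rootless 80/443 trick is commonly done with the `net.ipv4.ip_unprivileged_port_start` sysctl, persistence with `loginctl enable-linger`, and the container itself declared as a Quadlet unit. This is an illustrative sketch under those assumptions, not Lorenzo's actual configuration; the image tag, volume name, and paths are placeholders:

```ini
# As root, once: allow ordinary users to bind ports 80/443
#   echo 'net.ipv4.ip_unprivileged_port_start=80' > /etc/sysctl.d/90-unpriv-ports.conf
#   sysctl --system
# Keep user services running after logout:
#   loginctl enable-linger <user>

# ~/.config/containers/systemd/traefik.container (illustrative)
[Container]
Image=docker.io/library/traefik:v3.0
PublishPort=80:80
PublishPort=443:443
Volume=traefik-config:/etc/traefik:Z
ReadOnly=true
DropCapability=ALL
AddCapability=NET_BIND_SERVICE

[Install]
WantedBy=default.target
```

After `systemctl --user daemon-reload`, Quadlet generates a `traefik.service` that starts and restarts like any other systemd user unit.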

Ardan Labs Podcast
Creativity, AI, and Supreme Robot with Victor Varnado

Mar 11, 2026 · 80:12


In this episode of the Ardan Labs Podcast, Ale Kennedy talks with Victor Varnado, entrepreneur, comedian, and founder of Supreme Robot, about the intersection of creativity and technology. From launching the Worldwide Tic Tac Toe Championship to building AI-powered tools like Magic Bookifier, Victor shares his journey through improv comedy, television, podcasting, and app development.

00:00 Introduction and Background
02:58 Worldwide Tic Tac Toe Championship
09:02 From Improv to Entertainment
15:02 Founding Supreme Robot
21:00 Entrepreneurship and Creative Risk
31:59 Using AI for Writing
41:52 The Creation of Magic Bookifier
49:01 Acting, Filmmaking, and Reality TV
01:06:52 Pandemic Challenges and Reinvention
01:15:22 Podcasting and New Ventures
01:17:23 AI, Apps, and Future Projects

Connect with Victor:
Supreme Robot: https://supremerobot.com
LinkedIn: https://www.linkedin.com/in/victorvarnado/

Mentioned in this Episode:
Magic Bookifier: https://magicbookifier.ai

Want more from Ardan Labs? You can learn Go, Kubernetes, Docker & more through our video training, live events, or through our blog!
Online Courses: https://ardanlabs.com/education/
Live Events: https://www.ardanlabs.com/live-training-events/
Blog: https://www.ardanlabs.com/blog
Github: https://github.com/ardanlabs

Admin Admin Podcast
Admin Admin Podcast #105 – The one without the musician

Mar 9, 2026 · 28:54


In this episode, Al, Jon, and Jerry catch up on some major life changes, dive deep into the world of Model Context Protocols (MCP), and discuss the practicalities of moving into IT consultancy. Plus, we explore why Docker might be your best friend for CLI tools and how to keep your AWS environment secure with […]
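The "Docker as your best friend for CLI tools" idea is the familiar pattern of running a tool from a throwaway container with your working directory mounted, instead of installing it on the host. A generic sketch of that pattern (the wrapper name and example image are ours, not from the episode):

```shell
# Run a containerized CLI tool against the current directory,
# removing the container when it exits.
run_in_docker() {
  image=$1
  shift
  docker run --rm -i \
    -v "$PWD":/work -w /work \
    "$image" "$@"
}

# Hypothetical usage: run yq without installing it locally
# run_in_docker mikefarah/yq '.name' config.yaml
```

The host stays clean, and pinning the image tag gives every teammate the same tool version.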

The DevOps Kitchen Talks's Podcast
DKT91: DevOps Mock Interview - AWS Architecture, Terraform, and Live K8s Debugging

Mar 7, 2026 · 108:02


We test a candidate's knowledge for a Senior DevOps engineer position, live. In this episode: architecture patterns in AWS, the eternal Terraform vs CloudFormation debate, a deep dive into Kubernetes (Karpenter, scaling), and live troubleshooting of a broken Helm chart.

What's in this episode:
  • Architecture and clouds: how to choose between EKS and ECS/Fargate, and how to set up secure backup storage in S3.
  • IaC wars: an honest comparison of Terraform and CloudFormation, where convenience ends and pain begins.
  • Kubernetes under the hood: we break down the Control Plane, how controllers work, and the nuances of upgrading on-prem clusters.
  • Live debug: a real task of fixing a crashed pod (CrashLoopBackOff), working with probes, ports, and Helm.
  • CI/CD strategies: building the ideal pipeline with GitHub Actions and ArgoCD.

Guest: Maksim, a DevOps engineer (5 years of DevOps experience, 10 years as a sysadmin). Stack: AWS, Terraform, Kubernetes, Ansible, monitoring.

Links

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

All speakers are announced at AIE EU, schedule coming soon. Join us there or in Miami with the renowned organizers of React Miami! Singapore CFP is also open!

We've called this out a few times over in AINews, but the overwhelming consensus in the Valley is that "the IDE is dead." In November it was just a gut feeling, but now we actually have data: even at the canonical "VSCode fork" company, people are officially using more agents than tab autocomplete (the first wave of AI coding).

Cursor has had cloud agents for a few months now, and this specific launch is around computer use, which has come a long way since we first talked with Anthropic about it in 2024, and which Jonas productized as Autotab.

We also take the opportunity to do a live demo, talk about slash commands and subagents, and the future of continual learning and personalized coding models, something that Sam previously worked on at New Computer. (The fact that both of these folks are top-tier CEOs of their own startups who have now joined the insane talent density gathering at Cursor should also not be overlooked.)

Full episode on YouTube! Please like and subscribe!

Timestamps

00:00 Agentic Code Experiments
00:53 Why Cloud Agents Matter
02:08 Testing First Pillar
03:36 Video Reviews Second Pillar
04:29 Remote Control Third Pillar
06:17 Meta Demos and Bug Repro
13:36 Slash Commands and MCPs
18:19 From Tab to Team Workflow
31:41 Minimal Web UI Philosophy
32:40 Why No File Editor
34:38 Full Stack Cursor Debate
36:34 Model Choice and Auto Routing
38:34 Parallel Agents and Best Of N
41:41 Subagents and Context Management
44:48 Grind Mode and Throughput Future
01:00:24 Cloud Agent Onboarding and Memory

Transcript

EP 77 - CURSOR - Audio version

[00:00:00] Agentic Code Experiments

Samantha: This is another experiment that we ran last year and didn't decide to ship at that time, but may come back to: an LM judge, but one that was also agentic and could write code.
So it wasn't just picking, but also taking the learnings from the two or more models it was looking at and writing a new diff. And what we found was that there were strengths to using models from different model providers as the base level of this process. Basically you could get an almost synergistic output that was better than having a very unified bottom model tier.

Jonas: We think that over the coming months, the big unlock is not going to be one person with a model getting more done. It's like the water flowing faster: we'll be making the pipe much wider and so parallelizing more, whether that's swarms of agents or parallel agents. Both of those are things that contribute to getting much more done in the same amount of time.

Why Cloud Agents Matter

swyx: This week, one of the biggest launches that Cursor's ever done is cloud agents. I think you had cloud agents before, but this was like, you give Cursor a computer, right? So basically they bought Autotab and then they repackaged it. Is that what's going on?

Jonas: That's a big part of it. Yeah. Cloud agents already ran in their own computers, but they were sort of sight-reading code. And those computers were typically blank VMs that were not set up with the dev experience for whatever repo the agent is working on. One of the things that we talk about is: put yourself in the model's shoes. If you were seeing tokens stream by, and all you could do was sight-read code and spit out tokens and hope that you had done the right thing...

swyx: No chance.

Jonas: I'd be so bad. Obviously you need to run the code. And so that, I think, is probably not that contrarian of a take, but no one has done that yet.
And so giving the model the tools to onboard itself, and then use full computer use, pixels in and coordinates out, with a cloud computer that has different apps in it, is the big unlock that we've seen internally in terms of usage of this: going from "oh, we use it for little copy changes" to "no, we're really driving new features with this new type of agentic workflow."

swyx: Alright, let's see it. Cool.

Live Demo Tour

Jonas: So this is what it looks like in cursor.com/agents. This is one I kicked off a while ago. On the left hand side is the chat, a very classic sort of agentic thing. The big new thing here is that the agent will test its changes. So you can see here it worked for half an hour. That is because it not only took time to write the tokens of code, it also took time to test them end to end. So it started dev servers and iterated when needed. And so that's one part of it: the model works for longer, and doesn't come back with an "I tried some things" PR, but an "I tested it" PR that's ready for your review. One of the other intuition pumps we use there is: if a human gave you a PR and asked you to review it, and they hadn't tested it, you'd also be annoyed, because you'd be like, only ask me for a review once it's actually ready. So that's what we've done with...

Testing Defaults and Controls

swyx: A simple question I wanted to get out front: some PRs are way smaller, like just a copy change. Does it always do the video, or is it sometimes?

Jonas: Sometimes.

swyx: Okay. So what's the judgment?

Jonas: The model does it. We do some default prompting on what types of changes to test. There's a slash command that people can do called "no test," where if you do that, the model will not test.

swyx: But the default is test.

Jonas: The default is to be calibrated. So we tell it: don't test very simple copy changes, but test more complex things.
And then users can also write their agents.md and specify, like: if you're editing this subpart of my monorepo, never test, ‘cause that won't work, or whatever.

Videos and Remote Control

Jonas: So pillar one is the model actually testing. Pillar two is the model coming back with a video of what it did. We have found that in this new world, where agents can end-to-end write much more code, reviewing the code is one of these new bottlenecks that crop up. And so reviewing a video is not a substitute for reviewing code, but it is an entry point that is much, much easier to start with than glancing at some giant diff. And so typically you kick one off, it's done, you come back, and the first thing that you would do is watch this video. So this is a video of it. In this case I wanted a tooltip over this button. And so it went and showed me what that looks like in this video. I think here it actually used a gallery. So sometimes it will build Storybook-type galleries where you can see that component in action. And so that's pillar two: these demo videos of what it built. And then pillar number three is that I have full remote-control access to this VM. So I can go in here, I can hover things, I can type, I have full control. And same thing for the terminal: I have full access. And so that is also really useful, because sometimes the video is all you need to see. And oftentimes, by the way, the video's not perfect. The video will show you: is this worth merging immediately, or, oftentimes, is this worth iterating with to get it to that final stage where I am ready to merge it in. So I can go through some other examples where the first video wasn't perfect, but it gave me confidence that we were on the right track, and two or three follow-ups later it was good to go. And then I also have full access here, where some things you just wanna play around with.
You wanna get a feel for what this is, and there's no substitute for a live preview. And the VNC kind of VM remote access gives you that.

swyx: Amazing. Sorry, what is VNC?

Jonas: Just remote desktop.

swyx: Remote desktop. Yeah. Sam, any other details that you always wanna call out?

Samantha: Yeah, for me the videos have been super helpful, especially in cases where a common problem for me with agents and cloud agents beforehand was almost like under-specification in my requests. Our plan mode, and going really back and forth to get a detailed implementation spec, is a way to reduce the risk of under-specification. But then, similar to how human communication breaks down over time, you have this risk where it's like: okay, when I go to the trouble of pulling down and running this branch locally, I'm gonna see that, like, I said this should be a toggle and you have a checkbox. Why didn't you get that detail? And having the video up front makes that alignment, like you're talking about a shared artifact with the agent, very clear, which has been just super helpful for me.

Jonas: I can quickly run through some other examples.

Meta Agents and More Demos

Jonas: So this is a very frontend-heavy one. So one question I was...

swyx: I was gonna say, is this only for frontend?

Jonas: Exactly. One question you might have is: is this only for frontend? So this is another example, where the thing I wanted it to implement was a better error message for saving secrets. The cloud agents support adding secrets; that's part of what they need to access certain systems. Part of onboarding that is giving access.

swyx: This is a cloud agent working on cloud agents.

Jonas: Yes. So this is a fun thing.

Samantha: It can get super meta.

Jonas: It can get super meta. It can start its own cloud agents, it can talk to its own cloud agents. Sometimes it's hard to wrap your mind around that.
We have disabled its cloud agents starting more cloud agents. So we currently disallow that.

swyx: Someday you might.

Jonas: Someday we might. Someday we might. So this actually was mostly a backend change in terms of the error handling here, where if the secret is far too large... oh, this is actually really cool. Wow.

swyx: That's the dev tools.

Jonas: That's the dev tools. So if the secret is far too large... we have a size limit on secrets, and the error message there was really bad. It was just some generic "failed to save" message. So I was like, hey, we want a better error message. So the first cool thing it did here, with zero prompting on how to test this: instead of typing out the letter "a" 5,000 times to hit the limit, it opens dev tools, writes JS to paste 5,000 characters of the letter "a" into the input, closes the dev tools, hits save, and gets the new error message. It looks like the video actually cut off, but here you can see the screenshot of the error message. So that is a frontend-plus-backend, end-to-end feature.

swyx: Yeah. And you just need a full VM, a full computer, to run everything. Okay. Yeah.

Jonas: Yeah. So we've had versions of this. This is one of the Autotab lessons, where we started that in 2022... no, in 2023. And at the time it was like browser use, the DOM, all these different things. And I think we ended up very sort of AGI-pilled, in the sense that: just give the model pixels, give it a box. A brain in a box is what you want, and you want to remove limitations around context and capabilities such that the bottleneck should be the intelligence. And given how smart models are today, that's a very far-out bottleneck.
And so giving it its full VM, and having it be onboarded with the dev experience set up like a human would, has just been, for us internally, a really big step change in capability.

swyx: Yeah, I would say, let's call it a year ago, the models weren't even good enough to do any of this stuff.

Samantha: Even six months ago. Yeah.

swyx: So yeah, what people have told me is that round about Sonnet 4.5 is when this started being good enough to just automate fully by pixel.

Jonas: Yeah, I think it's always a question of when is good enough. I think we found in particular with Opus 4.5, 4.6, and Codex 5.3 that those were additional step changes in the autonomy-grade capabilities of the model, to just go off and figure out the details and come back when it's done.

swyx: I wanna appreciate a couple details. One: TanStack Router. I see it. Yeah. I'm a big fan. Do you know... I have to name the TanStack.

Jonas: No.

swyx: This is just random lore. Some buddy, Tanner. And then the other thing, if you switch back to the video...

Jonas: Yeah.

swyx: I wanna shout out this thing. Probably Sam did it, I don't know.

Jonas: The chapters.

swyx: What is this called? Yeah, this is called chapters. It's like a Vimeo thing, I don't know. But it's so nice, the design details. And obviously a company called Cursor has to have a beautiful cursor.

Samantha: And it is the cursor.

swyx: You see it branded? It's the Cursor cursor, yeah. Okay, cool. And then I complained to Evan. I was like, okay, but you guys branded everything but the wallpaper. And he was like, no, that's a Cursor wallpaper. I was like, what?

Samantha: Yeah, Rio picked the wallpaper, I think. Yeah. The video, that's probably Alexi and a few others on the team, with the chapters on the video. Matthew Frederico. There's been a lot of teamwork on this.
It's a huge effort.

swyx: I just like design details.

Samantha: Yeah.

swyx: And then when you download it, it adds like a little Cursor, kind of TikTok clip. [00:10:00]

Samantha: Yes. Yes.

swyx: So it's to make it really obvious it's from Cursor.

Jonas: We did the TikTok branding at the end. This was actually in our launch video. Alexi demoed the cloud agent that built that feature. Which was funny, because that was an instance where... one of the things that's been a consequence of having these videos is we use best-of-N, where you run different models head to head on the same prompt. We use that a lot more, because one of the complications with doing that before was you'd run four models and they would come back with some giant diff, like 700 lines of code times four. What are you gonna do? You're gonna review all that? That's horrible. But if you come back with four 20-second videos? Yeah, I'll watch four 20-second videos. And then, even if none of them is perfect, you can figure out which one of those you want to iterate with to get it over the line. And so that's been really fun.

Bug Repro Workflow

Jonas: Here's another example that we found really cool, which we've since turned into a slash command as well, slash repro, where for bugs in particular, the model having full access to its own VM means it can first reproduce the bug, make a video of the bug reproducing, fix the bug, then make a video of the bug being fixed, doing the same workflow pattern with, obviously, the bug not reproducing. And that has been the single category that has gone from "these types of bugs are really hard to reproduce and take tons of time locally, even if you try a cloud agent on it; are you confident it actually fixed it?" to, when this happens, you'll merge it in 90 seconds or something like that. So this is an example where... let me see if this is the broken one or the... okay, this is the fixed one. Okay.
So we had a bug on cursor.com/agents where if you would attach images, then remove them, then still submit your prompt, they would actually still get attached to the prompt. Okay. And so here you can see Cursor is using its full desktop, by the way. This is one of the cases where, if you just do browser-use-type stuff, you'll have a bad time, ‘cause now it needs to upload files. It just uses its native file viewer to do that. And so you can see here it's uploading files. It's going to submit a prompt, and then it will go and open up... so this is the meta thing: this is Cursor agent prompting Cursor agent inside its own environment. And so you can see the bug here: there are five images attached, whereas when it submitted, it only had one image.

swyx: I see. Yeah. But you gotta enable that if you're gonna use Cursor agent inside Cursor agent.

Jonas: Exactly. And so here, this is then the after video, where it does the same thing. It attaches images, removes some of them, hits send. And you can see here, once this agent is up, only one of the images is left in the attachments.

swyx: Beautiful.

Jonas: Okay. So, easy merge.

swyx: So yeah, when does it choose to do this? Because this is an extra step.

Jonas: Yes. I think I've not done a great job yet of calibrating the model on when to reproduce these things. Sometimes it will do it of its own accord. We've been conservative, where we try to have it only do it when it's quite sure, because it does add some amount of time to how long it takes to work on something. But we also have added things like the slash repro command, where you can just do "fix this bug, slash repro," and then it will know that it should first make you a video of it actually finding the bug and making sure it can reproduce it.

swyx: Yeah. One sort of ML topic this ties into is reward hacking, where you write a test and then just update it until it passes.
So: first write the test, it shows me it fails, then make the test pass, which is a classic red-green...

Jonas: Yep.

swyx: Like a...

Jonas: A TDD, TDD...

swyx: ...thing. No, very cool. Was that the last demo? Is there...

Jonas: Yeah. Anything I missed on the demos, or points that you think...?

Samantha: I think that covers it well. Yeah.

swyx: Cool. Before we stop the screen share, can you gimme just a tour of the slash commands? What are the good ones?

Samantha: Yeah, we wanna increase discoverability around this too. I think that'll be a future thing we work on. But there's definitely a lot of good stuff now.

Jonas: We have a lot of internal ones that I think will not be that interesting. Here's an internal one that I've made. I don't know if anyone else at Cursor uses this one: fix bb.

Samantha: I've never heard of it.

Jonas: Yeah. [00:14:00]
And I [00:15:00] think Nick and David on our team wrote, and basically if there is a problem with a cloud agent we'll spin up a bunch of subs.Like a singleswyx: instance.Samantha: Yeah. We'll take the ideas and argument and spin up a bunch of subagents using the Datadog MCP to explore the logs and find like all of the problems that could have happened with that. It takes the debugging time, like from potentially you can do quick stuff quickly with the Datadog ui, but it takes it down to, again, like a single agent call as opposed to trolling through logs yourself.Jonas: You should also talk about the stuff we've done with transcripts.Samantha: Yes. Also so basically we've also done some things internally. There'll be some versions of this as we ship publicly soon, where you can spit up an agent and give it access to another agent's transcript to either basically debug something that happened.So act as an external debugger. I see. Or continue the conversation. Almost like forking it.swyx: A transcript includes all the chain of thought for the 11 minutes here. 45 minutes there.Samantha: Yeah. That way. Exactly. So basically acting as a like secondary agent that debugs the first, so we've started to push more andswyx: they're all the same [00:16:00] code.It is just the different prompts, but the sa the same.Samantha: Yeah. So basically same cloud agent infrastructure and then same harness. And then like when we do things like include, there's some extra infrastructure that goes into piping in like an external transcript if we include it as an attachment.But for things like the cloud agent diagnosis, that's mostly just using the Datadog MCP. 
‘Cause we also launched MCPs along with this cloud agent launch: support for cloud agent MCPs.

swyx: Oh, that was drawn out.

Jonas: We'll be doing a bigger marketing moment for it next week, but you can now use MCPs and...

swyx: People will listen to it as well. Yeah.

Jonas: They'll...

Samantha: They'll be ahead of the third. They'll be ahead. And I actually don't know if the Datadog MCP is publicly available yet; I'm not sure, we're beta testing it. But it's been one of my favorites to use.

swyx: I think that one's interesting for Datadog, ‘cause Datadog wants to own that side. Interesting, with Bits. I don't know if you've tried Bits.

Samantha: I haven't tried Bits.

swyx: Yeah.

Jonas: That's their cloud agent product.

swyx: Yeah. Yeah. They want to be like: we own your logs, so give us some part of the self-healing software that everyone wants. But obviously Cursor has a strong opinion on coding agents, and you're taking away from that, which obviously you're going to do. And not every company's like Cursor, but it's interesting: if you're Datadog, what do you do here? Do you expose your logs to MCP and let other people do it, or do you try to own it, because it's extra business for you? It's an interesting one.

Samantha: It's a good question. All I know is that I love the Datadog MCP.

Jonas: And yeah, it is gonna be no surprise that people will demand it, right?

Samantha: Yeah.

swyx: It's like any system-of-record company like this: how much do you give away? Cool. I think that's that for the cloud agents tour. Cool. Can we just talk about cloud agents... when did Cursor launch cloud agents? Do you know? In June...

Jonas: Last year.

swyx: June last year. So it's been slowly developing. The thing you did... like, Michael did a post where he showed this chart of agents overtaking tab.
And I'm like, wow, this is the biggest transition in code.

Jonas: Yeah.

swyx: Like, in the last... [00:18:00]

Jonas: Yeah. I think that kind of got drowned out. I think it's a very interesting...

swyx: Not at all. I think it's been highlighted by our friend Andrej Karpathy today.

Jonas: Okay.

swyx: Talk more about it. What does it mean? I just got given, like, the Cursor Tab key.

Jonas: Yes. Yes.

swyx: That's...

Samantha: That's cool.

swyx: I know, but it's gonna be, like, put in a museum.

Jonas: It is.

Samantha: I have to say, I haven't used Tab in a little bit myself.

Jonas: Yeah. I think that what it looks like to code with AI, or to generally create software, even if you want to go higher level, is changing very rapidly. Not a hot take, but from our vantage point at Cursor, I think one of the things that is probably underappreciated from the outside is that we are extremely self-aware about that fact. Cursor got its start in phase one, era one, of tab and autocomplete. And that was really useful in its time. But a lot of people stop looking at text files and editing code. We call it hand-coding now, when you type out the actual letters.

swyx: Oh, that's cute.

Jonas: Yeah.

swyx: Oh, that's cute.

Jonas: You're so boomer. So boomer. [00:19:00] And so that, I think, has been a slowly accelerating, and now in the last few months rapidly accelerating, shift. And we think that's going to happen again with the next thing, where... I think some of the pains around tab are: it's great, but I actually just want to give more to the agent, and I don't want to do one tab at a time.
I want to just give it a task, and it goes off and does a larger unit of work, and I can lean back a little bit more and operate at that higher level of abstraction. That's going to happen again, where it goes from agents handing you back diffs, with you in the weeds giving it 30-second to three-minute tasks, to you giving it three-minute to 30-minute to three-hour tasks and getting back videos and trying out previews, rather than immediately looking at diffs every single time.

swyx: Yeah. Anything to add?

Samantha: One other shift I've noticed, as our cloud agents have really taken off internally, has been a shift from primarily individually driven development to almost this collaborative nature of development. For us, Slack is actually almost like a development IDE, basically.

swyx: So, like, maybe don't even build a custom UI. Maybe that's like a debugging thing, but actually it's that.

Samantha: I feel like, yeah, there's still so much left to explore there. But basically, for us, Slack is where a lot of development happens. We will have these issue channels, or just product discussion channels, where people are always @-ing Cursor, and that kicks off a cloud agent. And for us, at least, we have team follow-ups enabled. So if Jonas kicks off a Cursor agent in a thread, I can follow up with it and add more context. And so it turns into almost like a discussion surface where people can collaborate on UI. Oftentimes I will kick off an investigation, and then sometimes I even ask it to git blame and then tag people who should be brought in, ‘cause it can tag people in Slack, and then other people will come in.

swyx: It can tag other people who are not involved in the conversation?

Samantha: Yes.

swyx: It can just do @Jonas if, say, I was talking to it.

Samantha: Yeah.

swyx: That's cool. You guys should make a big deal outta that.

Samantha: I know.
It's a lot. I feel like there's a lot more to do with our Slack surface area to show people externally. But yeah, basically it [00:21:00] can bring other people in, and then other people can also contribute to that thread, and you can end up with a PR, again with the artifacts visible, and then people can be like, okay, cool, we can merge this. So for us, the IDE is almost moving into Slack in some ways as well.

swyx: I have the same experience, but it's not developers, it's me, designers, salespeople.

Samantha: Yeah.

swyx: So me on technical marketing and vision, the designer on design, and then salespeople on "here's the legal source of what we agreed on." And then they all just collaborate and correct the agents.

Jonas: I think what we found in these threads is that the work that is left, that the humans are discussing in these threads, is the nugget of what is actually interesting and relevant. It's not the boring details of where does this if statement go. It's: do we wanna ship this? Is this the right UX? Is this the right form factor? How do we make this more obvious to the user? It's those really interesting, higher-order questions that are so easy to collaborate on, and you leave the implementation to the cloud agent.

Samantha: Totally. And no more discussion of, am I gonna do this? Are you [00:22:00] gonna do this? Cursor's doing it. You just have to decide if you like it.

swyx: Sometimes... you guys probably figured this out already, but you need like a mute button. So like, cursor, we're going to take this offline, but still online. We need to talk among the humans first, before it could stop responding to everything.

Jonas: Yeah. This is a design decision where currently Cursor won't chime in unless you explicitly @-mention it.
Yeah.

Samantha: So it's not always listening. Yeah.

Jonas: It can see all the intermediate messages.

swyx: Have you done the recursive thing? Can Cursor add another Cursor, or spawn another Cursor?

Samantha: Oh...

Jonas: We've done some versions of this.

swyx: Because it can add humans.

Jonas: Yes. One of the other things we've been working on, that's an implication of generating the code being so easy, is that getting it to production is still harder than it should be. And broadly, you solve one bottleneck and three new ones pop up. And so one of the new bottlenecks is getting into production, and we have a joke internally where you'll be talking about some feature and someone says, "I have a PR for that." It's so easy [00:23:00] to get to "I have a PR for that," but it's still relatively hard to get from "I have a PR for that" to "I'm confident and ready to merge this." And so over the coming weeks and months, that's a thing we think a lot about: how do we scale up compute in that pipeline of getting things from a first draft an agent did?

swyx: Isn't that what merge queues are... isn't that what Graphite's for?

Jonas: Graphite is a big part of that. The cloud agent testing...

swyx: Is it fully integrated, or still different companies working on it?

Jonas: I think we'll have more to share there in the future, but the goal is to have a great end-to-end experience where Cursor doesn't just help you generate code tokens, it helps you create software end-to-end. And so review is a big part of that. Especially as models have gotten much better at writing code, generating code, we've felt that crop up relatively more.

swyx: Sorry, this is completely unplanned, but I have people arguing, one, that you need AI to review AI, and then there is another school of thought where it's, no, [00:24:00] reviews are dead. Just show me the video.

Samantha: Yeah.
I feel, again, for me, the video is often like alignment, and then I often still wanna go through a code review process.

swyx: Like still look at the files and...

Samantha: Everything, yeah. There's a spectrum, of course. The video, if it's really well done and it fully tests everything, you can feel pretty confident, but it's still helpful to look at the code. I personally pay a lot of attention to Bug Bot. Bug Bot has been really highly adopted internally. We tell people, don't leave Bug Bot comments unaddressed, 'cause we have such high confidence in it. So people always address their Bug Bot comments.

Jonas: Once you've had two cases where you merged something and then went back later and there was a bug in it, and you were like, ah, Bug Bot had found that, I should have listened to Bug Bot... once that happens two or three times, you learn to wait for Bug Bot.

Samantha: Yeah. So I think for us there's that code-level review, where it's looking at the actual code, and then there's the feature-level review, where you're looking at the features. There are a whole number of different areas. There'll probably eventually be things like performance-level review, security [00:25:00] review, things like that: more different aspects of how a feature might affect your code base that you want to potentially leverage an agent to help with.

Jonas: And some of those, like Bug Bot, will be synchronous, and you'll typically want to wait on them before you merge. But another thing that we're starting to see is, with cloud agents, you scale up this parallelism and how much code you generate. Ten-person startups start to need the DevEx and pipelines that a 10,000-person company used to need.
And that looks like a lot of the things that 10,000-person companies invented in order to get that volume of software to production safely. So that's things like: release frequently, or release slowly, have different stages where you release, have checkpoints, automated ways of detecting regressions. And so I think we're gonna need stacked diffs, merge queues. Exactly. A lot of those things are going to be important going forward.

swyx: I think the majority of people still don't know what stacked diffs are. I have many friends at Facebook and I'm pretty friendly with Graphite. I've just [00:26:00] never needed it 'cause I don't work on that large a team. And it's just the democratization of: here's what we've already worked out at very large scale, and here's how it benefits you too. To me, one of the beautiful things about GitHub is that it's actually useful to me as an individual solo developer, even though it's actually collaboration software.

Jonas: Yep.

swyx: And I don't think a lot of dev tools have figured that out yet, that transition from large down to small.

Jonas: Yeah. Cursor is probably an inverse story.

swyx: This is small up to...

Jonas: Yeah. Historically, part of why Cursor grew so quickly was anyone on the team could pick it up. And in fact, people would pick it up on the weekend for their side project and then bring it into work, 'cause they loved using it so much.

swyx: Yeah.

Jonas: And a thing that we've started working on a lot more, not us specifically, but as a company and other folks at Cursor, is making it really great for teams. Making it so the tenth person that starts using Cursor on a team is immediately set up with things. We launched Marketplace recently, so other people can [00:27:00] configure MCPs and skills, like plugins. So skills and MCPs, other people can configure those, so that my Cursor is ready to go and set up.
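The merge-queue idea Jonas brings up can be sketched minimally: serialize landings, and re-validate each change against the latest state of main rather than the state it branched from. This is a toy illustration of that core idea, not Cursor's or Graphite's implementation; all names here are invented.

```python
from collections import deque

def run_merge_queue(main, queue, checks):
    """Toy merge queue: pop each pending change in FIFO order,
    re-run checks against the *current* state of main (not the
    state when the PR was opened), and merge only if they pass."""
    merged, rejected = [], []
    pending = deque(queue)
    while pending:
        change = pending.popleft()
        candidate = main + [change]          # hypothetical merge result
        if all(check(candidate) for check in checks):
            main = candidate                 # land the change
            merged.append(change)
        else:
            rejected.append(change)          # kick back to the author
    return main, merged, rejected

# Example "check": forbid two landed changes touching the same file.
def no_duplicate_files(candidate):
    files = [c["file"] for c in candidate]
    return len(files) == len(set(files))

main, merged, rejected = run_merge_queue(
    main=[{"file": "app.py"}],
    queue=[{"file": "api.py"}, {"file": "app.py"}],  # second one conflicts
    checks=[no_duplicate_files],
)
```

The point of the serial re-check is the one Jonas makes: with many agents pushing code, a change that passed CI in isolation may not pass on top of everything that landed since.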
Sam loves the Datadog MCP, and the Slack MCP you've also been using a lot.

Samantha: It's also pre-launch, but I feel like it's so good.

Jonas: Yeah, my Cursor should be configured that way if Sam feels strongly that it's just amazing and required.

swyx: Is it automatically shared, or do you have to go and...

Jonas: It depends on the MCP. Some are obviously authed per user, and so Sam can't auth my Cursor with my Slack MCP, but some are team-authed, and those can be set up by admins.

swyx: Yeah. That's cool. We had Aman on the pod when Cursor was five people, and everyone was like, okay, what's the thing? And it's usually something teams and org and enterprise, but it's actually working. Usually at that stage, when you're five, when you're just a VS Code fork, it's like, how do you get there? Will people pay for this? People do pay for it.

Jonas: Yeah. And for cloud agents, we expect [00:28:00] to have similar kinds of PLG dynamics, where off the bat we've seen a lot of adoption with smaller teams, where the code bases are not quite as complex to set up. If you need some insane Docker layer caching thing for builds not to take two hours, that's going to take a little bit longer for us to be able to support that kind of infrastructure. Whereas if you have a frontend and a backend, in one click agents can install everything that they need themselves.

swyx: This is a good chance for me to just ask some technical, check-the-box questions. Can I choose the size of the VM?

Jonas: Not yet. We are planning on adding that.

swyx: Obviously you want like L, XL, XXL, whatever, right? Like the Amazon sort of menu.

Jonas: Yes, exactly. We'll add that.

swyx: Yeah. In some ways you have to basically become like EC2, almost like you rent a box.

Jonas: You rent a box, yes. We talk a lot about "brain in a box." So Cursor, we want to be a brain in a box.

swyx: But is the mental model different?
Is it more serverless? Is it more persistent? Is it something else?

Samantha: We want it to be a bit persistent. The desktop should be [00:29:00] something you can return to even after some days. Maybe you go back and it's still thinking about a feature for some period of time.

swyx: So a full suspend of the memory, and bring it back, and then keep going.

Samantha: Exactly.

swyx: That's an interesting one, because what I actually do want, from a Manus and OpenClaw or whatever, is to be able to log in with my credentials to the thing, but not actually store them in any secret store or whatever, 'cause this is my most sensitive stuff. This is my email, whatever. And just have it persist to the image, I don't know how, under the hood, to rehydrate and then just keep going from there. But I don't think a lot of infra works that way. A lot of it's stateless, where you save it to a Docker image, and then it's only whatever you can describe in a Dockerfile, and that's it. That's the only thing you can clone multiple times in parallel.

Jonas: Yeah. We have a bunch of different ways of setting them up. There's a Dockerfile-based approach. The main default way is actually snapshotting.

swyx: Like a Linux VM.

Jonas: Like a VM, right. You run a bunch of install commands and then you snapshot, more or less, the file system. And so that gets you set up for everything [00:30:00] you would want, to bring a new VM up from that template, basically.

swyx: Yeah.

Jonas: And that's a bit distinct from what Sam was talking about with the hibernating and rehydrating, where that is a full memory snapshot as well.
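The disk-snapshot flow Jonas describes (run the install commands once, freeze the result as a template, clone cheap copies from it) can be imitated with plain directories. This is a toy file-system analogy only, assuming nothing about Cursor's actual VM infrastructure; `setup_template` and `clone_vm` are invented names.

```python
import pathlib
import shutil
import tempfile

def setup_template(install_steps):
    """Run 'install' steps once, then keep the result as a template,
    the analog of snapshotting a VM's file system after setup."""
    template = pathlib.Path(tempfile.mkdtemp(prefix="template-"))
    for name, content in install_steps:
        (template / name).write_text(content)
    return template

def clone_vm(template):
    """Bring up a 'new VM' from the template: a copy that starts
    with everything already installed, independent of its siblings."""
    vm = pathlib.Path(tempfile.mkdtemp(prefix="vm-"))
    shutil.copytree(template, vm, dirs_exist_ok=True)
    return vm

template = setup_template([("node_modules.txt", "react\n"), (".env", "PORT=4000\n")])
vm_a, vm_b = clone_vm(template), clone_vm(template)
# Each clone is independent: writing in one doesn't affect the other.
(vm_a / "scratch.log").write_text("agent A was here\n")
```

This captures why the template approach is the default: the expensive setup happens once, and every parallel agent starts from the same warm state. The memory snapshot Jonas contrasts it with would additionally preserve running processes, which a file copy cannot.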
So there, if I had the browser open to a specific page and we bring that back, that page will still be there.

swyx: Was there any discussion internally, in building this stuff, about how every time you shoot a video, you actually show a little bit of the desktop and the browser? It's not necessary. If you know you're just demoing a front-end application, why not just show the browser?

Samantha: Yeah, we do have some panning and zooming. It can decide, when it's actually recording and cutting the video, to highlight different things. I think we've played around with different ways of segmenting it, and there have been some different revs on it for sure.

Jonas: Yeah. One of the interesting things is that the version you see now on cursor.com is actually like half of what we had at peak, where we decided to unship quite a few things. Two of the interesting things to talk about: one is directly an answer to your [00:31:00] question, where we had a native browser that you would have locally. It was basically an iframe that, via port forwarding, could load the URL, could talk to localhost in the VM.

swyx: So in your machine's browser?

Jonas: In your local browser, yeah. You would go to localhost 4000 and that would get forwarded to localhost 4000 in the VM via port forward.

swyx: Like ngrok.

Jonas: Like ngrok, exactly. We unshipped that because we felt that the remote desktop was sufficiently low latency and more general purpose. So we built Cursor Web, but we also built Cursor Desktop. And so it's really useful to be able to have the full spectrum of things.
And even for Cursor Web, as you saw in one of the examples, the agent was uploading files, and I couldn't upload files and open the file viewer if I only had access to the browser. And we've thought a lot about this. It might seem funny coming from Cursor, where we started as this VS Code fork and inherited a lot of amazing things, but also a lot [00:32:00] of legacy UI from VS Code.

Minimal Web UI Surfaces

Jonas: And so with the web UI, we wanted to be very intentional about keeping it very minimal and exposing the right set of primitives, app surfaces we call them, that are shared features of that cloud environment that you and the agent both use. The agent uses the desktop and controls it; I can use the desktop and control it. The agent runs terminal commands; I can run terminal commands. So that's our philosophy around it. The other thing that is maybe interesting to talk about, that we unshipped... and both of these things we may reship, deciding at some point in the future that we've changed our minds on the trade-offs or gotten it to a point where...

swyx: Put it out there. Let users tell you they want it.

Jonas: Exactly.

swyx: Alright, fine.

Why No File Editor

Jonas: So one of the other things is actually a files app. We used to have the ability, at one point during internal testing, to see... I had Git, Desktop, and Terminal on the right-hand side of the tab there earlier... to also have a Files app where you could see and edit files. And we actually felt that in some [00:33:00] ways, by restricting and limiting what you could do there, people would naturally leave more to the agent and fall into this new pattern of delegating, which we thought was really valuable. And there's currently no way in Cursor Web to edit these files.

swyx: Yeah. Except you open up the PR and go into GitHub and do the thing. Which is annoying.

Jonas: Just tell the agent.

swyx: I have criticized OpenAI for this.
Because OpenAI's Codex app doesn't have a file editor. It has a file viewer, but it isn't a file editor.

Jonas: Do you use the file viewer a lot?

swyx: No. I understand, but sometimes I want it. The one way to do it is like, freaking... no, they have an "open in Cursor" button, or "open in Antigravity," or open in whatever, and people pointed at that. I was part of the early testers group, and people pointed at that and were like, this is a design smell. You actually want a VS Code fork that has all these things, but also a file editor. And they were like, no, just trust us.

Jonas: Yeah. I think we as Cursor will want to, as a product, offer the [00:34:00] whole spectrum. So you want to be able to work at really high levels of abstraction, and double-click down to see the lowest level. That's important. But I also think you won't be doing that in Slack. And so there are surfaces and ways of interacting where, in some cases, limiting the UX capabilities makes for a cleaner experience that's more simple and drives people into these new patterns. Even locally, as we were joking about earlier, people don't really edit files, hand code, anymore. And so we want to build for where that's going and not where it's been.

swyx: A lot of cool stuff. Okay, I have a couple more.

Full Stack Hosting Debate

swyx: So, observations about the design elements of these things. One of the things I'm always thinking about is: Cursor and other peers of Cursor start from the dev tools and work their way towards cloud agents. Other people, like the Lovables and Bolts of the world, start with "here's the vibe code, full cloud thing." They were already cloud agents before anyone else was cloud agents, and "we will give you the full deploy platform, so we own the whole loop. We own all the infrastructure, we have the logs, we have the live site," [00:35:00] whatever. And you can do that cycle. Cursor doesn't own that cycle, even today.
You don't have the Vercel, you don't have whatever deploy infrastructure you're gonna have, which gives you powers, because anyone can use it, and any enterprise, whatever your infra, I don't care. But it also gives you limitations as to how much you can actually fully debug end to end. I guess I'm just putting it out there: is there a future where there's full-stack Cursor, like cursorapps.com, where I host my Cursor site? Which is basically a Vercel clone, right? I don't know.

Jonas: I think that's an interesting question to be asking, and the logic that you laid out for how you would get there is logic that I largely agree with.

swyx: Yeah.

Jonas: I think right now we're really focused on what we see as the next big bottleneck, and because things like the Datadog MCP exist, I don't think that the best way we can help our customers ship more software is by building a hosting solution right now.

swyx: By the way, these are things I've actually discussed with some of the companies I just named.

Jonas: Yeah, for sure. Right now, the big bottleneck is getting the code out there. And also, [00:36:00] unlike a Lovable or a Bolt, we focus much more on existing software, and the zero-to-one greenfield is just a very different problem. Imagine going to a Shopify and convincing them to deploy on your deployment solution. That's very different, and I think it will take much longer to see how that works. It may never happen, relative to, oh, it's a zero-to-one app.

swyx: I'll say it's tempting, because look, like 50% of your apps are Vercel, Supabase, Tailwind, React. It's the stack. It's what everyone does. So it's kinda interesting.

Jonas: Yeah.

Model Choice and Auto Routing

swyx: The other thing is the model selector. Right now in cloud agents, it's stuck down bottom left. Sure, it's Codex High today, but do I care if it suddenly switched to Opus?
Probably not.

Samantha: We definitely wanna give people a choice across models, because the meta changes very frequently. I was a big Opus 4.5 maximalist, and when Codex 5.3 came out, I hard switched. So that's all I use now.

swyx: Yeah, agreed. But basically, when I use it in Slack, [00:37:00] Cursor does a very good job of exposing, if people go use it, here's the model we're using, here's how you switch if you want. But otherwise it's abstracted away, which is beautiful, because then you actually... you should decide.

Jonas: Yeah, I think we want to be doing more with defaults, where we can suggest things to people. A thing that we have in the editor, the desktop app, is Auto, which will route your request and do things there. So I think we will want to do something like that for cloud agents as well. We haven't done it yet. We have both people like Sam, who are very savvy and know exactly what model they want, and we also have people that want us to pick the best model for them, because we have amazing people like Sam and we are the experts. We have both the traffic and the internal taste and experience to know what we think is best.

swyx: Yeah. I have this ongoing piece on agent labs versus model labs. And to me, Cursor and other companies are an example of an agent lab that is building a new playbook that is different from a model lab, which is very GPU heavy. Cursor obviously has a research [00:38:00] team. And my thesis is, every agent lab is going to have a router, because you're going to be asked: what's what? I don't keep up every day. I'm not a Sam. So using you, Sam, as the arbiter of taste: put me on Cursor Auto. Is it free? It's not free.

Jonas: Auto's not free, but there are different pricing tiers.
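The "Auto" behavior Jonas describes, picking a model on the user's behalf based on the request, can be sketched as a simple rules-plus-default dispatcher. The model names and heuristics below are placeholders, not Cursor's actual routing logic:

```python
def route(task: str, attachments: int = 0) -> str:
    """Toy auto-router: cheap heuristics first, a strong default last.
    A real router might instead use a learned classifier over the
    prompt, plus live price and latency data."""
    task_lower = task.lower()
    if attachments > 0 or "screenshot" in task_lower:
        return "vision-capable-model"      # placeholder name
    if any(w in task_lower for w in ("explore", "find", "grep", "search")):
        return "fast-cheap-model"          # high-throughput reads
    if len(task) > 2000 or "refactor" in task_lower:
        return "frontier-coding-model"     # long, hard edits
    return "balanced-default-model"
```

The value of owning the router, as the conversation notes, is that the product can keep chasing the best model of the month without the user having to track the meta themselves.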
You decide for me, based on all the other people. You know better than me. And I think every agent lab should basically end up doing this, because that actually gives you extra power: people stop caring about, or having loyalty to, one lab.

Jonas: Yeah.

Best Of N and Model Councils

Jonas: Two other maybe interesting things that I don't know how much are on your radar. One is the best-of-N thing we mentioned, where running different models head to head is actually quite interesting, because...

swyx: Which exists in Cursor.

Jonas: That exists in the Cursor IDE and web. So the problem is, where do you run them?

swyx: Okay.

Jonas: And so I can share my screen if that's interesting.

swyx: Yeah, interesting. Obviously parallel agents, very popular.

Jonas: Yes, exactly. Parallel agents.

swyx: In your mind, are they the same thing, best-of-N and parallel agents? I don't want to [00:39:00] put words in your mouth.

Jonas: Best-of-N is a subset of parallel agents, where they're running on the same prompt. That would be my answer. So this is what that looks like. Here in this dropdown picker, I can just select multiple models.

swyx: Yeah.

Jonas: And now if I do a prompt... I'm going to do something silly... I am running these five models.

swyx: Okay. This is this fake clone, of Cursor 2.0, yeah.

Jonas: Yes, exactly. But they're running... so in Cursor 2.0, you can do desktop or cloud. This is cloud specifically, where the benefit over worktrees is that they have their own VMs and can run commands and won't try to kill ports that another one is using, which are some of the pains.

swyx: These are all called worktrees?

Jonas: No, these are all cloud agents with their own VMs.

swyx: Okay.

Jonas: When you do it locally, sometimes people use worktrees, and that's been the main way people have set up parallelism so far.

swyx: I've gotta say, that's so confusing for folks.

Jonas: Yeah.

swyx: No one knows what worktrees are.

Jonas: Exactly.
I think we're phasing out worktrees.

swyx: Really.

Jonas: Yeah.

swyx: Okay.

Samantha: One other thing I would say on the multi-model choice: [00:40:00] this is another experiment that we ran last year and decided not to ship at the time, but may come back to, and there was an interesting learning that's relevant for these different model providers. It was something that would run a bunch of best-of-Ns, but then synthesize: basically run a synthesizer layer of models. And that was other agents acting as an LLM judge, but one that was also agentic and could write code. So it wasn't just picking; it was also taking the learnings from the models it was looking at and writing a new diff. And what we found was that, at the time at least, there were strengths to using models from different model providers as the base level of this process. Basically you could get an almost synergistic output that was better than having a very unified bottom model tier. So it was really interesting, 'cause even in the future, when you have maybe one model ahead of the others for a little bit, there could be some benefit from having multiple top-tier models involved in a [00:41:00] model swarm, or whatever agent swarm you're doing, because they each have strengths and weaknesses.

Jonas: Andrej called this "the council," right?

Samantha: Yeah, exactly. Oh, that's another internal command we have, that Ian wrote: slash council.

swyx: Yes. This idea is in various forms everywhere. And I think for me, the productization of it... you guys have done... yeah, this is very flexible, but if I were to add another, yeah, whatever your thing is, on here, it would be too much.
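The best-of-N-plus-synthesizer setup Samantha describes has a simple skeleton: fan one prompt out to several models in parallel, then run a judge layer over the candidates. The callables below stand in for real model calls, and the scoring judge is a deliberate oversimplification of the agentic synthesizer she mentions (which would write a new diff, not just pick a winner):

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(prompt, models, judge):
    """Fan one prompt out to N 'models' in parallel, then let a judge
    score every candidate and return the winner. A real council would
    also let the judge synthesize a *new* answer from the candidates."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        candidates = list(pool.map(lambda m: m(prompt), models))
    return max(candidates, key=judge)

# Stand-ins for model calls: each returns a different draft answer.
models = [
    lambda p: p.upper(),   # "model" A
    lambda p: p + "!",     # "model" B
    lambda p: p * 2,       # "model" C
]
judge = len  # toy judge: prefers the longest draft

winner = best_of_n("fix the bug", models, judge)
```

Her point about provider diversity maps directly onto the `models` list: if the base candidates come from different providers, the judge layer has genuinely different failure modes to choose between.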
What, let's say...

Samantha: Ideally it's something the user can just choose, and it all happens under the hood in a way where you just get the benefit of that process at the end, and better output basically, but don't have to get too lost in the complexity of judging along the way.

Jonas: Okay.

Subagents for Context

Jonas: Another thing on the many agents, on different parallel agents, that's interesting is an idea that's been around for a while as well, and that has started working recently: subagents. This is one other way to get agents with different prompts and different goals and different models, [00:42:00] different vintages, to work together, collaborate, and delegate.

swyx: Yeah. I'm always looking for "this is the year of the blah," right? I think one of the blahs is subagents. But I haven't used them in Cursor. Are they fully formed? Honestly, I'd like an intro, because: do I form them from new every time? Do I have fixed subagents? How are they different from slash commands? There are all these really basic questions that no one stops to answer for people, because everyone's just too busy launching.

Samantha: Honestly, you can see them in Cursor now if you just say, spin up like 50 subagents.

swyx: So Cursor defines what subagents are.

Samantha: Yeah. Basically, I shouldn't speak for the whole subagents team, this is a different team that's been working on this, but our thesis, or the thing that we saw internally, is that they're great for context management for long-running threads, or if you're trying to just throw more compute at something. We have largely used almost a generic task interface, where the main agent can define [00:43:00] what goes into the subagent.
So if I say "explore my code base," it might decide to spin up an explore subagent, or it might decide to spin up five explore subagents.

swyx: But I don't get to set what those subagents are, right? It's all defined by the model.

Samantha: I actually would have to refresh myself on the subagent interface.

Jonas: There are some built-in ones. The explore subagent is pre-built. But you can also instruct the model to use other subagents, and then it will. One example of a built-in subagent... I actually just kicked one off in Cursor, and I can show you what that looks like.

swyx: Yes. Because I tried to do this in pure prompt space.

Jonas: So this is the desktop app.

swyx: Yeah. And that's all you need to do, right?

Jonas: That's all you need to do. I said "use a subagent to explore," and I can even click in and see what the subagent is working on here. It ran some find command, and this is a Composer under the hood. Even though my main model is Opus, it does smart routing, because in this instance the explore task requires reading a ton of things, and a faster model is really useful to get an [00:44:00] answer quickly. But this is what subagents look like, and I think we want to do a lot more to expose hooks and ways for people to configure these. Another example of a built-in subagent is the computer-use subagent in the cloud agents, where we found that those trajectories can be long and involve a lot of images, obviously, and execution of some testing or verification task, and we wanted to use models that are particularly good at that. So that's one reason to use subagents. And then the other reason to use subagents is that we want context to be summarized, reduced down, at a subagent level.
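The context-management argument, where a subagent burns through a long exploration and hands only a short final message back to the parent, can be sketched as follows. `explore_subagent` is an invented name; the compression boundary is the part that matters:

```python
def explore_subagent(codebase: dict, query: str) -> str:
    """Toy explore subagent: reads *every* file (a long rollout that
    would flood the parent's context) but returns only a compact
    summary message, the boundary Jonas describes."""
    hits = [path for path, text in codebase.items() if query in text]
    # Everything read above stays inside the subagent's context.
    return f"'{query}' appears in {len(hits)} file(s): {sorted(hits)}"

def parent_agent(codebase: dict, query: str) -> str:
    # The parent's context only ever grows by one short message,
    # no matter how many files the subagent had to read.
    summary = explore_subagent(codebase, query)
    return f"Plan based on: {summary}"

codebase = {
    "a.py": "def pipe(): ...",
    "b.py": "print('hi')",
    "c.py": "pipe()",
}
report = parent_agent(codebase, "pipe")
```

This is why the subagent boundary is a "really neat" place to compress, as the next remark puts it: the summary is written with full knowledge of the rollout, instead of a blind global compaction of the parent's history.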
That's a really neat boundary at which to compress that rollout and testing into a final message that the subagent writes, which then gets passed into the parent, rather than having to do some global compaction or something like that.

swyx: Awesome. Cool. While we're in the subagents conversation, I can't do a Cursor conversation and not talk about Wilson's stuff. What is that? He built a browser. He built an OS. And he [00:45:00] experimented with a lot of different architectures and basically ended up reinventing the software engineering org chart. This is all cool, but what's your take? Are there any behind-the-scenes stories about that whole adventure?

Samantha: Some of those experiments have found their way into a feature that's available in cloud agents now, the long-running agent mode. Internally, we call it grind mode. And I think there's some hint of grind mode accessible in the picker today, 'cause you can choose "grind until done." And so that was really the result of experiments that Wilson started in this vein, where... I think the Ralph Wiggum loop was floating around at the time, but it was something he also independently found and was experimenting with. And that was what led to this product surface.

swyx: And it's just the simple idea of: have criteria for completion, and do not stop until you complete.

Samantha: There's a bit more complexity in our implementation. There's a specific... you have to start out by aligning, and there's a planning stage where it will work with you, and it will not start [00:46:00] grind execution mode until it's decided that the plan is amenable to both of you, basically.

swyx: I refuse to work until you make me happy.

Jonas: We found that's really important, where people would give a very underspecified prompt and then expect it to come back with magic. And if it's gonna go off and work for three minutes, that's one thing.
When it's gonna go off and work for three days, you probably should spend a few hours upfront making sure that you have communicated what you actually want.

swyx: Yeah. And just to really drive home the point: we really mean three days. No human...

Jonas: Oh yeah. We've had three-day runs, no human intervention whatsoever.

Samantha: I don't know what the record is, but there have been some long-running grinds.

Jonas: And so the thing that is available in Cursor, the long-running agent, if you want to think about it very abstractly, is like one worker node. Whereas what built the browser is a society of workers and planners and different agents collaborating. We started building the browser with one worker node; at the time, that was just the agent. And it became more than one worker node when we realized that the throughput of the system was not where it needed to be [00:47:00] to get something as large in scale as the browser done.

swyx: Yeah.

Jonas: And so this has also become a really big mental model for us with cloud agents: there are the classic engineering latency-throughput trade-offs. The code is water flowing through a pipe. We think that over the coming months, the big unlock is not going to be one person with a model getting more done, like the water flowing faster; it's making the pipe much wider, and so pushing more through, whether that's swarms of agents or parallel agents. Both of those contribute to getting much more done in the same amount of time, but any one of those tasks doesn't necessarily need to get done that quickly. And throughput is this really big thing, where once you see a system of a hundred concurrent agents outputting thousands of tokens a second, you can't go back. You see a glimpse of the future. Obviously there are many caveats. No one is using this browser IRL.
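Grind mode as described above, align on a plan first, then loop until explicit completion criteria pass, reduces to a small control loop. This is a hedged sketch: every function here is a placeholder for a model call or a human interaction, and the real feature surely adds budgets, checkpoints, and human interrupts.

```python
def grind(task, propose_plan, user_approves, attempt, criteria, max_iters=100):
    """Toy long-running-agent loop. Stage 1 (planning): don't start
    executing until the human signs off on a plan. Stage 2 (grind):
    keep doing units of work until every completion criterion passes,
    or a budget cap is hit."""
    plan = propose_plan(task)
    while not user_approves(plan):
        plan = propose_plan(task)            # revise until aligned

    state = {"task": task, "plan": plan, "iters": 0}
    while not all(check(state) for check in criteria):
        if state["iters"] >= max_iters:
            raise RuntimeError("grind budget exhausted")
        state = attempt(state)               # one unit of agent work
        state["iters"] += 1
    return state

# Toy run: the 'work' is done once a progress counter reaches 3.
result = grind(
    task="ship feature",
    propose_plan=lambda t: f"plan for {t}",
    user_approves=lambda plan: True,
    attempt=lambda s: {**s, "progress": s.get("progress", 0) + 1},
    criteria=[lambda s: s.get("progress", 0) >= 3],
)
```

The upfront-alignment stage is the part Jonas stresses: for a three-minute task an underspecified prompt is cheap to redo, but for a three-day run the planning gate is what keeps the loop from grinding confidently in the wrong direction.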
There are a bunch of things not quite right yet, but we are going to get to systems that produce real production [00:48:00] code at that scale much sooner than people think. And it forces you to think about what even happens to production systems. We've broken our GitHub Actions recently because we have so many agents producing and pushing code that CI/CD is just overloaded. Suddenly it's like we effectively grew; Cursor's growing very quickly anyway, but you grow headcount 10x when people run 10x as many agents. And so a lot of these systems will need to adapt.

swyx: It also reminds me: the three of us live in the app layer, but if you talk to the researchers who are doing RL infrastructure, it's the same thing. All these parallel rollouts, scheduling them, and making sure as much throughput as possible goes through them. Yeah, it's the same thing.

Jonas: We were talking briefly before we started recording; you were mentioning memory chips and some of the shortages there. The other thing that is just hard to wrap your head around is the scale of the system that was building the browser, the concurrency there. If Sam and I both have a system like that running for us, [00:49:00] shipping our software, the amount of inference that we're going to need per developer is just really mind-boggling. Sometimes when I think about that, I think that even the most optimistic projections for what we're going to need in terms of buildout are underestimating the extent to which these swarm systems can churn at scale to produce code that is valuable to the economy.

swyx: Yeah. You can cut this if it's sensitive, but do you have estimates of how much your token consumption is?

Jonas: Like per developer?

swyx: Yeah. Or yourself. I don't need a company average, I'm just curious.
Samantha: I feel like, for a while I wasn't an admin on the usage dashboard, so I wasn't able to actually see, but...

swyx: Mine has gone up.

Samantha: Oh yeah.

swyx: But I think...

Samantha: In terms of how much work I'm doing, it's more that I have no worries about developers losing their jobs, at least in the near term, 'cause I feel like that's a broader discussion.

swyx: Yeah. You went there; I wasn't going there. I was just asking how much more you are using.

Samantha: There's so much stuff to be built. And so I feel like I'm basically just [00:50:00] constantly trying; I have more ambitions than I did before. Personally, yes. So I can't speak to the broader thing, but for me, I'm busier than ever before. I'm using more tokens and I am also doing more things.

Jonas: Yeah. I don't have the stats for myself, but I think broadly, a thing that we've seen, and that we expect to continue, is Jevons paradox. Where...

swyx: You can't do our podcast without saying it.

Jonas: Exactly. We've done it. Now we can wrap; we said the words. Phase one, tab autocomplete: people paid like 20 bucks a month, and that was great. Phase two, where you were iterating with these local models: today people pay hundreds of dollars a month. As we think about these highly parallel agents running off for a long time in their own VM systems, we are already at the point where people will be spending thousands of dollars a month per human, and I think potentially tens of thousands and beyond. It's not that we are greedy for capturing more money; what happens is just that individuals get that much more leverage. And if one person can do as much as 10 people, yeah.
The tool that allows them to do that is going to be tremendously valuable [00:51:00] and worth investing in, and worth taking the best thing that exists.

swyx: One more question on Cursor in general, and then it's open-ended for you guys to plug whatever you want. How is Cursor hiring these days?

Samantha: What do you mean by how?

swyx: So obviously LeetCode is dead.

Samantha: Oh, okay.

swyx: Everyone says work trials. Different people have different levels of adoption of agents. Some people who really adopt can be much more productive, but other people, you just need to give them a little bit of time. And sometimes they've never lived in a token-rich place like Cursor. Once you live in a token-rich place, you just work differently, but you need to have done that. Anyway, it was just open-ended: how has agentic engineering, agentic coding, changed your opinions on hiring? Are there any broad insights? Basically I'm asking this for other people.

Jonas: Yeah, totally. And to hear Sam's opinion; we haven't talked about this, the two of us. I think that we don't necessarily see being great at the latest thing in AI coding as a prerequisite. I do think it's a sign that people are keeping up and [00:52:00] curious and willing to upskill themselves in what's happening, because, as we were talking about, in the last three months the game has completely changed. What I do all day is very different.

swyx: It's my job and I can't keep up.

Jonas: Yeah, totally. I do think that still, as Sam was saying, the fundamentals remain important in the current age, and so is being able to go and double-click down.
And models today do still have weaknesses: if you let them run for too long without cleaning up and refactoring, the code will get sloppy and there'll be bad abstractions. So you still do need humans that have built systems before, know good patterns when they see them, and know where to steer things.

Samantha: I would agree with that. I would say, again, Cursor also operates very quickly, and leveraging agentic engineering is probably one reason why that's possible in this current moment. In the past it was just people coding quickly, and now there are people who use agents to move faster as well. So part of our process will always look for, we'll select for, that ability to make good decisions quickly and move well in this environment. And I think being able to [00:53:00] figure out how to use agents to help you do that is an important part of it too.

swyx: Yeah. Okay. The fork in the road: either predictions for the end of the year, if you have any, or plugs.

Jonas: Predictions are not going to go well.

Samantha: I know, it's hard.

swyx: They're so hard. Get it wrong, it's okay. Just, yeah.

Jonas: One other plug that may be interesting, that I feel like we touched on but haven't talked a ton about: a thing that these new interfaces and this parallelism enable is the ability to hop back and forth between threads really quickly. And so a thing that we have...

swyx: You wanna show something?

Jonas: Yeah, I can show something. A thing that we have felt with local agents is this pain around context switching: you have one agent that went off and did some work and another agent that did something else. Here, I just have three tabs open, let's say, but I can very quickly hop in here. This is an example I showed earlier, but the actual workflow here I think is really different in a way that may not be obvious, where I start t

Atareao con Linux
ATA 776 The Definitive Guide to Logs in Podman: Automation, Quadlets, and Vector


Play Episode Listen Later Mar 5, 2026 19:56


Is your Linux server behaving strangely and you don't know why? In this episode of Atareao con Linux, Lorenzo plunges us into the fascinating (and sometimes feared) world of logs, applied specifically to the migration from Docker to Podman.

Discover why delegating log management to journald is the smartest decision you can make for the stability of your infrastructure. Lorenzo shares how to configure storage limits so that a runaway container can't fill your disk, a lesson learned after years of wrestling with Docker's text log files.

Key points of the episode:
00:00:00 - Introduction to the home stretch of the Podman series.
00:03:41 - The role of AI and Gemini in interpreting errors.
00:05:12 - Configuring journald and persistence limits.
00:10:46 - Custom scripts: checkQuadlets and JCU for better visibility.
00:14:06 - Advanced observability with Vector and OpenObserve.
00:16:10 - How to react and block attacks using access logs.

Beyond the technical configuration, Lorenzo presents practical tools from his own workflow, such as using bat for syntax highlighting in logs, and shows how to automate health reports for your services. If you want to turn your dull logs into an active security tool capable of sending notifications to Telegram or blocking malicious IPs right at the proxy, don't miss this series finale.

Remember that you can find all the scripts and detailed notes at atareao.es. Join us on this journey toward container mastery!

More information and links in the episode notes
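A minimal sketch of the kind of journald storage caps discussed in this episode, so container logs cannot fill the disk. The values below are illustrative assumptions, not Lorenzo's exact settings:

```ini
# /etc/systemd/journald.conf - illustrative limits for container logging
[Journal]
Storage=persistent      # keep logs across reboots under /var/log/journal
SystemMaxUse=500M       # hard cap on total journal disk usage
SystemMaxFileSize=50M   # rotate individual journal files at this size
MaxRetentionSec=1month  # drop entries older than this
```

After editing, `systemctl restart systemd-journald` applies the limits, and `journalctl --disk-usage` shows what the journal currently occupies.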

DevOps and Docker Talk
Your Images are Out of Date (probably) - The Silent Rebuilds problem


Play Episode Listen Later Mar 4, 2026 38:19


Container base images (like Docker Official Images on Docker Hub) are often updated without new version tags. I call this "silent rebuilds." There's no way to know it has happened without image digest-checking automation such as Dependabot or Renovate with specific settings. Failing to keep images up to date is a prime source of vulnerabilities that can lead to serious security breaches. Automate the updates! Check out the video podcast version here: https://youtu.be/z_ahbsSc4Fo
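One way to wire up the digest-checking automation described above is a minimal Renovate configuration that pins images to digests and then opens PRs whenever the upstream tag is silently rebuilt. This is one plausible setup, not the episode's exact settings:

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "packageRules": [
    {
      "matchDatasources": ["docker"],
      "pinDigests": true
    }
  ]
}
```

With `pinDigests` enabled, Renovate rewrites `FROM nginx:1.27` into `FROM nginx:1.27@sha256:...` and bumps the digest in a PR each time the tag is re-pushed upstream, making silent rebuilds visible in your git history.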

Of Je Stopt De Stekker Er In
#096 | How MavenBlue Uses GPU Scalability on IBM Cloud


Play Episode Listen Later Mar 3, 2026 27:36


Today at the table:
Barend Baarssen
Karel van der Woude
Ruben van der Zwaan
Paul Hagg

Timeline:
0:00 Intro
0:29 Introductions + who is MavenBlue
2:26 What has MavenBlue built on IBM Cloud?
7:00 Why IBM Cloud?
7:54 Pizza boxes vs Serverless Fleets
11:27 Advantages of Serverless Fleets
17:32 What else does MavenBlue do
20:45 Cloud storage
24:10 Cloud security

Shownotes:
We dive into the world of extremely compute-intensive software with Ruben van der Zwaan (Application Owner of the Economic Scenario Generator) and Paul Hagg (responsible for all software and cloud hosting) at MavenBlue.

Barend met Paul last year at IBM TechXchange in Orlando, where MavenBlue demoed how they built their Economic Scenario Generator on IBM Cloud. What did Barend see there? A solution that handles CPUs and GPUs in a smart, dynamic way, so that thousands to millions of scenarios can be computed at the same time. No more waiting, just pure, scalable compute power.

MavenBlue started in 2017 and in 2020 moved their "pizza boxes" to the cloud, where they still play a crucial role, but now fully virtualized. Modern challenges, such as peak load at the start of every month, are now solved with Serverless Fleets, so that hardware is only switched on when it is really needed. GPUs are scaled up automatically, run at full throttle, and disappear again as soon as the job is done. Efficient, fast, and cost-conscious.

With their Docker-based approach, IBM keeps all the infrastructure "boring", so MavenBlue can focus entirely on what they do best: building high-quality, parallel computations for the insurance world. Meanwhile, they keep working on their cloud storage architecture to take the next step there as well.

The theme of our podcast, "Of Je Stopt De Stekker Er In" ("or you plug it in"), fits this episode perhaps better than ever. At MavenBlue the virtual plug only goes in when the compute capacity is needed. And as soon as the work is done? It comes out just as fast.

Links:
MavenBlue: https://mavenblue.com
Economic Scenario Generation: https://mavenblue.com/solutions-esg/
LinkedIn Paul Hagg: https://www.linkedin.com/in/paul-hagg-7b8bbb1/
LinkedIn Ruben van der Zwaan: https://www.linkedin.com/in/ruben-van-der-zwaan-19017295/

Comments and suggestions can be sent to: ofjestoptdestekkererin@nl.ibm.com

ShopTalk » Podcast Feed
704: Sanitizer API with Frederik Braun


Play Episode Listen Later Mar 2, 2026 62:25


Show Description: We talk with Frederik Braun from Mozilla about the Sanitizer API, how it works with HTML tags and web components, what it does with malformed HTML, and where CSP fits in alongside the Sanitizer API.

Guest: Frederik Braun, security engineer and manager working on the Mozilla Firefox web browser.

Links:
- Frederik Braun: Why the Sanitizer API is just setHTML()
- Frederik Braun
- freddyb (Frederik B)

Sponsor: Bluehost. Do you ever feel like pre-configured hosting is slowing you down? That is where VPS hosting starts to make a lot more sense. With Bluehost VPS, you are not stuck inside someone else's environment. You get full control of the server. You can spin up Docker, deploy containerized apps, run workflows, and connect your CRM, databases, and APIs without weird restrictions. No shared bottlenecks. No artificial limits. If you want to actually own your stack, your data, your performance, your roadmap, VPS is the move.

Atareao con Linux
ATA 774 Automatic Rollbacks in Podman: Like Watchtower, but Better


Play Episode Listen Later Feb 26, 2026 16:52


Welcome to a new episode of Atareao con Linux. I'm Lorenzo, and today we dig into an aspect that makes the migration from Docker to Podman not just advisable but necessary for those of us seeking stability: native update management and the safety of rollbacks.

In the container ecosystem, updating is vital for security and performance, but it always carries the risk of breaking the service. Many of us have relied on Watchtower, but today we'll see why Podman plays in another league. Because it integrates directly with systemd, Podman lets us automate the whole process without external dependencies.

What you'll learn in this episode:
Podman auto-update: how to configure your containers to stay up to date using registry and local labels.
Quadlets and systemd: the professional way to manage infrastructure as code on your own Linux machine.
Smart timers: how to schedule updates so they don't all run at once, and how to verify those schedules with native tools.
Reactive rollbacks: Podman's ability to detect a failure in a new version and automatically return to the previous image, guaranteeing service availability.
Status notifications: how to integrate alerts in Telegram or Matrix so you always know what's happening on your server.

We'll talk about the importance of health checks and how they act as the perfect trigger for the system to decide whether an update has succeeded or should be rolled back.
It's the definitive solution to those compatibility problems that sometimes appear when a new image changes its requirements without warning.

If you're passionate about self-hosting, systems administration, or simply want to optimize your container workflow, this episode gives you the technical keys.

Chapters:
00:00:00 Introduction to episode 774: migrating to Podman
00:00:32 The Watchtower problem in Docker
00:01:42 Podman: automatic updates out of the box
00:02:10 The magic of the systemd integration
00:03:13 The podman auto-update command and labels
00:03:57 Registry vs local: update options
00:04:22 How to label containers and Quadlets
00:05:30 Configuring timers with systemd
00:06:33 Advanced scheduling options and randomization
00:07:40 Verifying timers with systemd-analyze
00:08:29 Advantages over external services
00:09:04 Reactive rollbacks: Podman's big advantage
00:09:44 Technical requirements for automatic rollback
00:10:33 Health checks: guaranteeing service health
00:11:44 Fixing compatibility problems in images
00:12:51 Automatic notifications in Telegram and Matrix
00:14:11 Conclusions and superiority over Watchtower
00:14:40 Resources in the show notes and detailed labels
00:15:34 Next steps in the migration: Traefik and logs
00:16:32 Farewell and podcast network

Remember, all the technical details and the labels mentioned are in the episode notes at atareao.es. Enjoy the episode!

More information and links in the episode notes
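The registry-label auto-update flow described in this episode can be sketched as a Quadlet unit. The unit name, image, and health command below are illustrative assumptions, not files from the show:

```ini
# ~/.config/containers/systemd/myapp.container - illustrative Quadlet unit
[Container]
Image=docker.io/library/nginx:stable
AutoUpdate=registry      # "podman auto-update" pulls newer images for this tag
HealthCmd=curl -fsS http://localhost:8080/ || exit 1
HealthInterval=30s

[Service]
Restart=always

[Install]
WantedBy=default.target
```

Running `podman auto-update` (via the bundled podman-auto-update.timer) then pulls newer images for units labeled `registry`, restarts them, and by default rolls back to the previous image if the updated unit fails to come up.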

Ardan Labs Podcast
APIs, Wundergraph, and Resilience with Jens Neuse


Play Episode Listen Later Feb 25, 2026 76:33


In this episode of the Ardan Labs Podcast, Ale Kennedy talks with Jens Neuse, CEO and co-founder of WunderGraph, about his unconventional path into technology and entrepreneurship. After a life-altering accident ended his carpentry career, Jens taught himself to code during recovery and eventually built WunderGraph to solve modern API challenges.

Jens shares the evolution of WunderGraph from an early-stage startup to a successful open-source platform, including pivotal moments like securing eBay as a customer. The conversation highlights the importance of resilience, community-driven development, and balancing startup life with family, offering insight into what it takes to build meaningful technology through adversity and persistence.

00:00 Introduction and Current Life
07:19 Dropping Out and Carpentry Career
10:52 Life-Altering Accident and Recovery
18:01 Learning to Walk and Finding Direction
27:46 Discovering Coding and Technology
31:17 Starting the Startup Journey
33:07 Discovering the Power of APIs
40:50 Building a Team and Leadership Growth
48:17 Founding WunderGraph
59:07 Pivoting to Open Source
01:05:32 eBay Breakthrough and Validation
01:10:08 Balancing Family and Startup Life

Connect with Jens:
LinkedIn: https://www.linkedin.com/in/jens-neuse

Mentioned in this Episode:
WunderGraph: https://wundergraph.com

Want more from Ardan Labs? You can learn Go, Kubernetes, Docker & more through our video training, live events, or through our blog!
Online Courses: https://ardanlabs.com/education/
Live Events: https://www.ardanlabs.com/live-training-events/
Blog: https://www.ardanlabs.com/blog
Github: https://github.com/ardanlabs

Crazy Wisdom
Episode #534: From COVID's Trust Bonfire to Decentralized Everything


Play Episode Listen Later Feb 23, 2026 54:53


In this episode of the Crazy Wisdom Podcast, host Stewart Alsop sits down with Jake Hamilton, founder of Groundwire and Nockbox, to explore zero-knowledge proofs, Bitcoin identity systems, and the intersection of privacy-preserving cryptography with AI and blockchain technology. They discuss how ZK proofs could offer an alternative to invasive identity verification systems being rolled out by governments worldwide, the potential for continual learning AI models to shift the balance between centralized and open-source development, and why building secure, auditable computing infrastructure on platforms like Urbit matters more than ever as we face an explosion of AI agents and automated systems. Jake also explains Nockchain's approach to creating a global repository of cryptographically verified facts that can power trustless programmable systems, and how these technologies might converge to solve problems around supply chain security, personal data sovereignty, and resistance to censorship.

Timestamps
00:00 Introduction to Groundwire and Nockbox
02:48 Understanding Zero-Knowledge Proofs
06:04 Government Adoption of ZK Proofs
08:55 The Future of Identity Verification
11:52 AI and ZK Proofs: A New Era
14:54 The Role of Urbit in Technology
18:03 The Impact of COVID on Trust
20:51 The Evolution of AI and Data Privacy
23:47 The Future of AI Models
26:54 The Need for Local AI Solutions
29:51 Interoperability of Nockchain and Bitcoin

Key Insights

1. Zero-Knowledge Proofs Enable Privacy-Preserving Verification: Jake explains that ZK proofs allow you to prove computational outcomes without revealing the underlying data. For example, you could prove you're over 18 without exposing your full identity or driver's license information. The proof demonstrates that a specific program ran through certain steps and reached a particular conclusion, and validating this proof is fast and compact.
This technology has profound implications for age verification, identity systems, and protecting privacy while maintaining necessary compliance, potentially offering a middle path between surveillance states and complete anonymity.2. Government Adoption of Privacy Technology Remains Uncertain: There are three competing motivations driving government identity verification systems: genuine surveillance desires, bureaucratic efficiency seeking, and legitimate child protection concerns. Jake believes these groups can be separated, with some officials potentially supporting ZK-based solutions if positioned correctly. He notes the EU is exploring ZK identity verification, and UK officials have shown interest. The key is framing privacy-preserving technology as protection against "the swamp" rather than just abstract privacy benefits, which could resonate with certain political constituencies.3. The COVID Era Destroyed Institutional Trust at Unprecedented Scale: The conversation identifies COVID as potentially the largest institutional trust-burning event in human history, with numerous institutions simultaneously losing credibility with large portions of the population. This represents a dramatic shift from the boomer generation's default trust in authority figures and mainstream media. This collapse is compounded by the incoming AI revolution, creating a perfect storm where established bureaucracies cannot adapt quickly enough to manage rapidly evolving technology, leaving society in fundamentally unmanageable territory.4. Centralized AI Models Create Dangerous Dependencies: Both speakers acknowledge growing dependence on centralized AI services like Claude, with some users spending thousands monthly on tokens. This dependency creates vulnerability to price increases and service disruptions. Jake advocates for local AI deployment using models like DeepSeek R1, running on personal hardware to maintain control and privacy. 
The shift toward continuous learning models will fundamentally change the AI landscape, making personal data harvesting even more valuable and raising urgent questions about compensation and consent for training data contribution.5. High-Quality Training Data Is Becoming the Primary AI Bottleneck: Stewart argues that AI development is now limited more by high-quality training data than by compute power. The industry has exhausted easily accessible internet data and body-shop-style data labeling. Companies are now using specialized boutique services with techniques like head-mounted cameras for live-streaming world model training. This scarcity is subtly driving price increases across AI services and will fundamentally reshape the economics of AI development, with implications for who controls these increasingly powerful systems.6. Urbit Offers a Foundation for Trustworthy Computing: Jake positions Urbit as essential infrastructure for the AI age because its 30,000-line codebase (versus Unix's three million lines) can be understood by individual humans. Its deterministic, purely functional, and strictly typed design aims for eventual ossification—software that doesn't require constant security patches. This "tiny and diamond perfect" approach addresses the fundamental insecurity of systems requiring monthly vulnerability patches. In an era of AI agents and potential prompt injection attacks, having verifiable, comprehensible computing infrastructure becomes existentially important rather than merely desirable.7. Nockchain Creates a Global Repository of Provable Truth: Jake's vision for Nockchain combines ZK proofs with blockchain technology to create a globally available "truth repository" where verified facts can be programmatically accessed together. This enables smart contracts or programs gated on combinations of proven facts—such as temperature readings from secure devices, supply chain events, and payment confirmations. 
By using Nock's abstract, simple design optimized for ZK proof generation, the system can validate complex real-world conditions without exposing underlying data, creating infrastructure for coordinating action based on verifiable private information at global scale.

Infinitum
Svi ćemo biti graditelji (We will all be builders)


Play Episode Listen Later Feb 21, 2026 83:25


Ep 278

- Pages, Keynote, and Numbers 15 Go Freemium
- Kuzu database company joins Apple's list of recent acquisitions
- iOS 26.3 Features: Everything New in iOS 26.3
- Tauri 2.0 — The cross-platform app building toolkit
- Rork — Create mobile app in minutes, using AI
- OpenClaw, OpenAI and the future | Peter Steinberger
- jordy: I wasted 80 hours and $800 setting up OpenClaw - so you don't have to.
- I used Xcode 26.3 to build an iOS app with my voice in just two days - and it was exhilarating
- Steve Troughton-Smith: In case you missed it, I've been testing the limits of Xcode 26.3's agentic programming support this week, using Codex. This entire app used 7% of my weekly Codex usage limit. Compare that to a single (awful) slideshow in Keynote using 47% of my monthly Apple Creator Studio usage limit.
- Aditya: Cons of being a software engineer no one really talks about…
- HackerTyper: Use This Site To Prank Your Friends With Your Coding Skills :)
- Virtualization Explained: We Install 1TB of RAM for HyperVisors, Virtual Machines, and Docker!
- Mr. Macintosh: The very first email from space was sent on a Macintosh Portable by James Adamson & Shannon Lucid aboard the Shuttle Atlantis STS-43
- Public Domain Remastered — Looney Tunes MEGA Compilation, 118 FULL Episodes in 4K 60FPS

Acknowledgements: Recorded 20 Feb 2026. Intro music by Vladimir Tošić, the old site is here. Logo by Aleksandra Ilić. Episode artwork by Saša Montiljo, his corner on DeviantArt.

Foundations of Amateur Radio
How to go about documenting your setup?


Play Episode Listen Later Feb 21, 2026 5:22


Possibly the single most important thing that separates science from "fiddling around" is documentation. Figuring out how to document things is often non-trivial, and me telling you that "unless you wrote it down, it didn't happen" only goes so far. If documentation isn't your thing, what about "I broke something and I don't know how it was before I fiddled" as an incentive instead? Recently I had cause to explore how to document how my station is configured. To give you a sense, the microphone is connected to a remote-rig, which is connected to a Wi-Fi base station, over Wi-Fi to a Wi-Fi slave, to another remote-rig, to the radio body, to the VHF port, through two coax switches, a run of RG213, to an antenna. When receiving, it goes from the antenna, to a run of RG213, through two coax switches, to the VHF port, to the radio body, to a remote-rig, to a Wi-Fi slave, to a Wi-Fi base station, to a remote-rig, to the remote head, to a set of headphones. Of course, at this point I've written it down, so, job done .. right? Well, what about the data connection, the external speaker, the remote head display and other goodies, to say nothing of the duplicate devices with similar names. All in all, the FT-857d has something like eleven ports and each remote-rig has ten, so just wording it is a start, but hardly qualifies as documented. What if we drew a picture instead? At this point you could pull out your crayons and start scribbling on a sheet of butcher's paper, and that would be a fine start, but it would be difficult to share with me or anyone else, and updating it would be a challenge, let alone versioning it. As it happens, we're not the first people to have this issue.
In the 1980s and 1990s, researchers at Bell Labs were trying to figure out how to draw graphs, and from that work a language, 'DOT', since everyone is a fan of the "DAG Of Tomorrow", and a series of tools, which today are known as 'Graphviz', made the visualisation of relationships possible without the application of coloured wax on dried cellulose fibre. In my other, computing, job I had cause to visualise the relationship between a million or so nodes, allowing me to discover a specific node that was directing all traffic, where I could insert my debugging code, but it was only possible thanks to these free and open source tools. While the DOT language isn't particularly complex, it occurred to me that for someone not conversant with the syntax, we can start even simpler with a CSV text file that shows the relationships between each device, then convert the CSV to DOT and in turn to a picture. For example, I documented the relationship between the radio and the antenna by adding five lines to a CSV file, essentially, FT857d to VHF port to VHF coax switch to VHF grounding switch to RG213 to antenna. In all, to document everything except power, since I haven't decided how I want to describe it, I used a CSV with 47 lines. On the face of it, that might sound ridiculous, but I can tell you, it shows all the sockets on the FT857d, all the sockets on both remote-rig devices and the relationships between them. With it anyone can duplicate my set-up. Having previously spent some quality time learning various aspects of the DOT language, I figured I could write a little script to convert CSV files to DOT, but being of the generation of software developers with the attitude, "Why write something if someone else already did?", I discovered that Reinier Post at the Eindhoven University of Technology has a delightful collection of scripts, including one appropriately named 'csv2dot'.
Written in Perl, the only language that according to some looks just as impenetrable before and after encryption, the tool works as advertised and makes a DOT file that you can then visualise using Graphviz. Of course there are Python scripts lying around that claim to do the same, but I wasn't keen to install the kitchen sink just to try them. Instead I made a quick little Dockerfile that you can find on my vk6flab GitHub repository that will walk you through this, complete with my example, so you have a starting point. Now, I used this here to describe my station, well, one part of it, but it can easily extend to document your entire station, and because we're talking about text files that contain the information, anyone with a copy of a text editor can update the file when things change, since that's where the real magic happens. So, what are you waiting for, documentation? I'm Onno VK6FLAB
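The CSV-to-DOT conversion described above can be sketched in a few lines of Python. This is a hypothetical stand-in for the csv2dot Perl script, not that script itself: each row of connected devices becomes a chain of edges.

```python
import csv
import io

def csv_to_dot(csv_text: str, name: str = "station") -> str:
    """Turn rows of connected devices ("a,b,c") into a Graphviz DOT digraph.

    Each adjacent pair on a row becomes one directed edge, so a row
    describing a signal chain produces the chain as a graph.
    """
    lines = [f"digraph {name} {{"]
    for row in csv.reader(io.StringIO(csv_text)):
        row = [cell.strip() for cell in row if cell.strip()]
        for a, b in zip(row, row[1:]):
            lines.append(f'  "{a}" -> "{b}";')
    lines.append("}")
    return "\n".join(lines)

# One line of the station description from the episode:
dot = csv_to_dot("FT857d,VHF port,VHF coax switch,VHF grounding switch,RG213,antenna")
print(dot)
```

Piping the result through Graphviz (`dot -Tpng station.dot -o station.png`) renders the picture.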

Atareao con Linux
ATA 772 Avoid ZOMBIE Containers: A Master Guide to Health Checks in Podman


Play Episode Listen Later Feb 19, 2026 20:08


Is your container really running, or is it just a zombie process eating memory? In episode 772 of Atareao con Linux, I reveal the secrets of managing the health of your containers like an expert.

I'm Lorenzo, and in this installment we focus on Podman and health checks. If in episode 688 we talked about Docker, today we take the definitive leap to professional automation on Linux using Quadlets and systemd.

What you'll discover in this audio:
Zombie detection: learn to identify processes that look active but don't respond.
Real dependencies: how to configure your WordPress, MariaDB, and Redis stack so services start in the right order, and only once their predecessors are healthy.
Self-healing: configure restart policies that act automatically on health failures.
Smart notifications: get alerts in Telegram or on your desktop when your services change state.

This episode is a practical guide for anyone who wants to harden their container infrastructure, avoiding the unexpected shutdowns and broken dependencies that often occur with traditional tools like Docker Compose.

Chapters:
00:00:00 Is your container alive or a ZOMBIE?
00:01:44 What is a health check, really?
00:02:22 4 advantages of using health checks
00:03:20 Implementation in Podman and Docker
00:05:20 The power of Quadlets
00:08:58 Smart dependencies: WordPress + MariaDB + Redis
00:11:00 On-success notifications
00:13:55 On-failure error handling
00:18:21 Next steps and Traefik

If you enjoy the podcast, I'd be enormously grateful for a rating on Spotify or Apple Podcasts. Help me spread the word about open source!

More information and links in the episode notes
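The health-gated startup ordering discussed in this episode can be sketched as a Quadlet unit that waits for its database to report healthy before starting. The unit and image names are illustrative assumptions, not Lorenzo's actual files:

```ini
# ~/.config/containers/systemd/wordpress.container - illustrative Quadlet unit
[Unit]
# Start only after the database unit is up; with Notify=healthy on that
# unit, systemd considers it "started" only once its health check passes.
Requires=mariadb.service
After=mariadb.service

[Container]
Image=docker.io/library/wordpress:latest
HealthCmd=curl -fsS http://localhost:80/ || exit 1
HealthInterval=30s
HealthRetries=3
Notify=healthy   # report readiness to systemd only when the check passes

[Service]
Restart=on-failure
```

Because readiness is tied to the health check rather than mere process startup, a zombie container that is running but unresponsive never satisfies the dependency chain.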

DevOps and Docker Talk
AI Wins and Misses for 2025

DevOps and Docker Talk

Play Episode Listen Later Feb 17, 2026 76:34


I'm joined by Nirmal Mehta of AWS and Viktor Farcic from Upbound, to go through our 2025 year in review. We look into the AI tools that consumed us this year, from CLI agents to terminal emulators, IDEs, AI browsers - what worked, what flopped, what's worth your time and money, and what we think isn't!Check out the video podcast version here: https://youtu.be/mnagfUsh5bc

Side Project Spotlight
#105: How Alex Hillman Built an AI Assistant with Claude Code

Side Project Spotlight

Play Episode Listen Later Feb 16, 2026 82:04


In this episode, Alex Hillman, co-founder of Philadelphia's legendary coworking space Indy Hall, takes us through his journey building a sophisticated AI executive assistant using Claude Code. What started as a simple terminal experiment in October 2025 has evolved into a full production system that autonomously manages network diagnostics, email workflows, relationship tracking, and newsletter automation. Alex shares the technical architecture, real-world stories of AI-powered problem solving, cost insights, and his thoughtful approach to building trust with AI while maintaining strong ethical guardrails.

## Chapters
- 00:00 Coming Up...
- 02:01 Introductions
- 03:57 The Origins of PhillyCocoa and Indy Hall
- 06:12 The Evolution of AI and Personal Assistants
- 07:35 Building a Personal Assistant with Claude Code
- 10:26 The Architecture of the Personal Assistant
- 14:04 Creating a Web App Interface for the Assistant
- 16:10 Using Tailscale for Secure Access
- 19:01 Mitigating Risks with AI Autonomy
- 29:24 Backup Protocols and Data Management
- 31:23 Emergent Behavior in AI Systems
- 34:10 Flow State and Productivity in Programming
- 37:56 Understanding AI Behavior and User Education
- 39:45 Cost Management in AI Development
- 45:37 Building Trust with AI Systems
- 53:53 Navigating Trust in Skill Utilization
- 55:23 Technical Applications for Non-Developers
- 01:00:17 Innovative Personal and Business Management
- 01:09:03 Transforming Workflows with AI
- 01:12:56 Ethics and Responsibility in AI Usage
- 01:18:25 Community Building Through Meetups
- 01:21:55 Tag

## Highlights
**Architecture:** Claude Code headless via CLI with WebSocket communication, Docker on Hetzner VPS, Tailscale networking, hourly snapshots, git hooks for destructive commands, multi-layered security.

**Real Use Cases:**
- Network monitoring that diagnosed an overheating router fan from a screenshot
- Email sorted by "easiest to hardest" instead of chronological
- Date night tracking with restaurant and wine pairing suggestions
- Organized 51 wine bottles via photos into ASCII grid layout
- Newsletter reduced from 4 hours to 30 minutes while preserving human writing

**Costs:** $20/month plan lasted 20 minutes. Now at $200/month. One Thanksgiving week hit $1,500 in overages during heavy development.

**Philosophy:** "Modest YOLO" approach: autonomous but controlled. AI enhances human work, doesn't replace it. The system can modify itself: type "add a button," refresh, it works.

**Open Source:**
- **Kuato**: Session search for Claude Code
- **Smaug**: Twitter bookmark archiver with AI analysis
- **Andy Timeline**: Auto-generated weekly narrative of the AI's evolution

## Event
**Big Philly Meetup Mashup** - March 15, 2026
Hackathon for Philadelphia's tech and creative communities. Theme: "Good Neighbors." Sponsored by Supabase.
https://indyhall.org/goodneighbors/

## Links
**Alex Hillman**
YouTube: https://www.youtube.com/@AlexHillman | Website: https://dangerouslyawesome.com | GitHub: https://github.com/alexknowshtml

**Open Source Projects**
Kuato: https://github.com/alexknowshtml/kuato | Smaug: https://github.com/alexknowshtml/smaug | Andy Timeline: https://github.com/alexknowshtml/andy-timeline

**Tools & Resources**
Indy Hall: https://indyhall.org | Claude Code: https://claude.com/product/claude-code | OpenClaw: https://openclaw.ai | Brian Casel: https://www.youtube.com/@briancasel | Termius: https://termius.com | Point-Free: https://www.pointfree.co/the-way

**PhillyCocoa:** http://phillycocoa.org

Intro music: "When I Hit the Floor", © 2021 Lorne Behrman. Used with permission of the artist.

Atareao con Linux
ATA 771 Adiós a las excusas. Cómo monté mi VS Code en un servidor

Atareao con Linux

Play Episode Listen Later Feb 16, 2026 20:51


Have you ever given up trying to program on the go? I'll confess that programming on an Android tablet wasn't working for me, and the reason was simple: laziness and the lack of a coherent environment. In today's episode, I explain how I solved this problem at the root by installing Code Server on a remote server. Throughout this audio, we explore the challenges of maintaining multiple development environments and why fragmentation kills your creativity. I walk through my technical setup step by step: from building a custom Docker image to integrating modern tools written in Rust (like Bat and LSD) that improve the terminal experience. What you'll learn in this episode: Why a development server beats local installs on tablets. How to configure Docker Compose to deploy Code Server with real persistence. Advanced security: using Traefik, Pocket ID, and geoblocking to protect your code. Configuration tricks for VS Code in the browser: key mapping, avoiding conflicts with the Escape key, and using the JetBrains Mono font. Maximum productivity with Vim modes integrated into the web workflow. How to turn Code Server into a PWA to remove browser distractions on Android. It's not just about technology, but about removing the friction that keeps us from making progress on our projects. If you want to know how to turn any device with a browser into your main workstation, don't miss this episode. Episode timeline: 00:00:00 The failure of programming on a tablet (and why) 00:01:43 The definitive solution: Code Server 00:02:12 The problem of fragmented environments 00:03:53 My custom Docker image for Code Server 00:05:04 Essential tools in Rust (Bat, LSD, SD) 00:06:23 Configuring Rust and development tools 00:07:05 Persistence and Docker Compose 00:08:06 Security: Traefik, Pocket ID, and geoblocking 00:10:03 Optimizing VS Code for the browser 00:11:13 Syncing and persisting extensions 00:12:43 Aesthetics and typography (Ayu Dark and JetBrains Mono) 00:13:59 The power of Vim inside Code Server 00:15:51 How to use Code Server as a PWA on Android 00:17:04 A physical keyboard: the must-have accessory 00:18:50 Conclusions and the future of remote development. Remember that you can find all the notes, the repository, and the links mentioned at atareao.es. If you like the content, a rating on Spotify or Apple Podcasts helps enormously to keep spreading the word about Linux and open source. More information and links in the episode notes
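A minimal Docker Compose sketch of such a setup is below. The image tag, port mapping, and volume paths are assumptions for illustration; the episode's actual deployment uses a custom image behind Traefik and Pocket ID rather than a plain password:

```yaml
# docker-compose.yml (illustrative sketch, not the episode's exact file)
services:
  code-server:
    image: codercom/code-server:latest   # the episode builds a customized image
    ports:
      - "8443:8080"                      # code-server listens on 8080 by default
    environment:
      - PASSWORD=change-me               # in production, front with Traefik + SSO instead
    volumes:
      - ./config:/home/coder/.config     # persist settings and extensions
      - ./projects:/home/coder/project   # persist your code
    restart: unless-stopped
```

The two bind mounts are what give the setup "real persistence": extensions, settings, and projects survive container rebuilds.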

Les Cast Codeurs Podcast
LCC 337 - Datacenters Carrier Class dans l'espace

Les Cast Codeurs Podcast

Play Episode Listen Later Feb 16, 2026 94:19


Emmanuel and Guillaume discuss various programming topics, including file systems in Java, data-oriented programming, the challenges of JPA with Kotlin, and new Quarkus features. They also explore somewhat crazy topics like building datacenters in space. Quite a bit of architecture too. Recorded February 13, 2026. Download the episode LesCastCodeurs-Episode-337.mp3 or watch the video on YouTube. News Languages How to implement a file system in Java https://foojay.io/today/bootstrapping-a-java-file-system/ Building a custom Java file system with NIO.2 for varied uses (VCS, archives, remote systems). Java's evolution: java.io.File (1.0) -> NIO (1.4) -> NIO.2 (1.7) for customization via FileSystem. Up-front design is recommended; the Java API is POSIX-oriented. Key components to consider: URI design (unique scheme, path). Tree management (DB, metadata, efficiency). Binary storage (location, encryption, versions). Minimum to get started (4 components): implement Path (represents a file/directory). Extend FileSystem (the system instance). Extend FileSystemProvider (the engine, registered by scheme). Register the FileSystemProvider via META-INF/services. Next steps: DB layer (tree), basic directory/file operations, storage, tests. A long and demanding process, but a rewarding one.
An article by Brian Goetz on the future of data-oriented programming in Java https://openjdk.org/projects/amber/design-notes/beyond-records Java's Project Amber introduces "carrier classes," an evolution of records that allows more flexibility while keeping the benefits of pattern matching and reconstruction. Records impose strict constraints (immutability, exact representation of state) that limit their use for classes with mutable or derived state. Carrier classes make it possible to declare a complete, canonical state description without requiring the internal representation to match the public API exactly. The "component" modifier on fields lets the compiler automatically derive accessors for components aligned with the state description. Compact constructors are generalized to carrier classes, automatically generating the initialization of component fields. Carrier classes support deconstruction via pattern matching like records, making them usable in instanceof and switch. Carrier interfaces allow a state description to be defined on an interface, requiring implementations to provide the corresponding accessors. Extension between carrier classes is possible, with automatic derivation of super() calls when the parent's components are subsumed by the child's. Records become a special case of carrier classes with additional constraints (final, extends Record, mandatory private final component fields). Compatible evolution of records is improved by allowing components to be appended at the end of the list and partial deconstruction by prefix. How to avoid common pitfalls with JPA and Kotlin - https://blog.jetbrains.com/idea/2026/01/how-to-avoid-common-pitfalls-with-jpa-and-kotlin/ JPA is a Java specification for object-relational persistence, but using it with Kotlin exposes incompatibilities due to design differences between the two languages. Kotlin classes are final by default, which prevents JPA from creating proxies for lazy loading and transactional operations. The kotlin-jpa plugin automatically generates no-argument constructors and makes classes open, resolving the compatibility issues. Kotlin data classes are not suited to JPA entities because they generate equals/hashCode from all fields, causing problems with lazy relations. Using lateinit var for relations can throw exceptions if properties are accessed before JPA initializes them. Kotlin's non-nullable types can conflict with JPA's habit of initializing entities with temporary null values. Accessing the backing field directly in custom getters/setters can bypass JPA's logic and break lazy loading. IntelliJ IDEA 2024.3 introduces inspections that detect these problems automatically and offers quick-fixes. The IDE detects final entities, inappropriate data classes, constructor problems, and incorrect use of lateinit. These new features help developers avoid the subtle bugs of using JPA with Kotlin. Libraries A guide to MapStruct @IterableMapping - https://www.baeldung.com/java-mapstruct-iterablemapping MapStruct is a Java library for automatically generating mappers between beans; the @IterableMapping annotation allows fine-grained configuration of collection mapping. The dateFormat attribute automatically formats dates when mapping lists, with no manual loop. The qualifiedByName attribute specifies which custom method to apply to each element of the collection being mapped. Example use: filtering sensitive data such as passwords by mapping only certain fields via a dedicated method. The nullValueMappingStrategy attribute controls the behavior when the source collection is null (return null or an empty collection). The annotation works for all Java collection types (List, Set, etc.) and generates the necessary loop code. Numeric formats can be applied with numberFormat to convert numbers to strings in a specific format. MapStruct generates the complete mapper implementation at compile time, eliminating boilerplate. The annotation can be combined with @Named to create reusable, named mapping methods. Collection mapping supports complex type conversions beyond simple primitive conversions. Accessing Samba files from Java with JCIFS - https://www.baeldung.com/java-samba-jcifs JCIFS is a Java library for accessing Samba/SMB shares without mounting a network drive, supporting the SMB3 protocol. Spare a thought for those stuck connecting to so-called legacy systems. Configuration requires a CIFS context (CIFSContext) and SmbFile objects to represent remote resources. Authentication is done via NtlmPasswordAuthenticator with domain, username, and password. The library can list files and folders with listFiles() and check their properties (size, modification date). Files are created with createNewFile() and folders with mkdir(), or mkdirs() to create a whole tree. Deletion via delete(), which can traverse and delete entire trees recursively. Files can be copied between Samba shares with copyTo(), but copying from the local file system that way is not possible. To copy from the local system, use the SmbFileInputStream and SmbFileOutputStream streams. Operations can target different Samba servers and different shares (anonymous or password-protected). The library fits into try-with-resources blocks for automatic resource management. Quarkus 3.31 - Full Java 25 support, new Maven packaging, and Panache Next - https://quarkus.io/blog/quarkus-3-31-released/ Full Java 25 support with runtime and native images. New Maven packaging of type quarkus with an optimized lifecycle for faster builds; here is a full article with more detail https://quarkus.io/blog/building-large-applications/ Introduction of Panache Next, a new generation with a better developer experience and a unified ORM/Reactive API. Updates to Hibernate ORM 7.2, Reactive 3.2, Search 8.2. Hibernate Spatial support for geospatial data. Move to Testcontainers 2 and JUnit 6. Security annotations supported on Jakarta Data repositories. OIDC token encryption for custom TokenStateManager implementations. OAuth 2.0 Pushed Authorization Requests support in the OIDC extension. Maven 3.9 is now the minimum required for Quarkus projects. A2A Java SDK 1.0.0.Alpha1 - Alignment with version 1.0 of the Agent2Agent protocol - https://quarkus.io/blog/a2a-java-sdk-1-0-0-alpha1/ The A2A Java SDK implements the Agent2Agent protocol, which enables standardized communication between AI agents to discover capabilities, delegate tasks, and collaborate. Moving to version 1.0 of the specification marks the transition from experimental to production-ready, with deliberate breaking changes. Complete modernization of the spec module, with Java records everywhere replacing the previous mix of classes and records for more consistency. Adoption of Protocol Buffers as the source of truth, with MapStruct mappers for conversion and Gson for JSON-RPC. Builders now use static factory methods instead of public constructors, following modern Java best practices. Introduction of three Maven BOMs to simplify dependency management for the core SDK, the extensions, and the reference implementations. The Quarkus AgentCard evolves with a supportedInterfaces list replacing url and preferredTransport, for more flexibility in declaring protocols. Pagination support added for ListTasks and the push-notification configuration endpoints, with appropriate Result wrappers. A pluggable A2AHttpClient interface allows custom HTTP implementations, with a Vert.x implementation provided. Ongoing work toward full compliance with the 1.0 TCK, in parallel with the finalization of the specification. Why Quarkus eventually "clicks": the 10 questions Java developers ask - https://www.the-main-thread.com/p/quarkus-java-developers-top-questions-2025 An article that surfaces and answers the questions of people who have used Quarkus for 4-6 months; the non-noob questions. Quarkus is a modern, cloud-optimized Java framework offering ultra-fast startup times and a reduced memory footprint. Why does Quarkus start so fast? The framework does the heavy lifting at build time (scanning, indexing, bytecode generation) rather than at runtime. When to use reactive rather than imperative mode? Reactive makes sense for workloads with high concurrency dominated by I/O; imperative remains simpler elsewhere. What is the difference between Dev Services and Testcontainers? Dev Services uses Testcontainers while automatically managing lifecycle, ports, and configuration, with no ceremony. How does Quarkus DI differ from Spring? CDI is a standard based on type safety and build-time discovery, different from Spring's framework approach. How to manage configuration across environments? Quarkus scales from local development up to Kubernetes with profiles, multiple files, and external configuration. How to test Quarkus applications properly?
@QuarkusTest starts the application once for the whole test suite, changing the mental model compared to Spring Boot. What does Panache really do behind the scenes? Panache is JPA with strong opinions and sane defaults, wrapping Hibernate in an Active Record style. Should you use native images, and when? Native images shine for serverless and edge thanks to fast startup and a low memory footprint, but not every app benefits. How does Quarkus integrate with Kubernetes? The framework automatically generates Kubernetes resources and handles health checks and metrics as if natively designed for that ecosystem. How to integrate AI into a Quarkus application? LangChain4j lets you add embeddings, retrieval, guardrails, and observability directly in Java, without going through Python. Infrastructure Alternatives to MinIO https://rmoff.net/2026/01/14/alternatives-to-minio-for-single-node-local-s3/ MinIO dropped single-node support at the end of 2025 for commercial reasons, breaking many demos and CI/CD pipelines that used it to emulate S3 locally. The author is looking for a simple replacement with a Docker image, S3 compatibility, an open-source license, easy single-node deployment, and an active community. S3Proxy is very lightweight and easy to configure; it seems to be the simplest option but relies on a single contributor. RustFS is easy to use and includes a GUI, but it is a very young project in alpha with a recent major security flaw. SeaweedFS has existed since 2012 with S3 support since 2018, is relatively easy to configure, and has a basic web interface. Zenko CloudServer easily replaces MinIO, but the documentation and branding (cloudserver/zenko/scality) can be confusing. Garage requires complex configuration with a TOML file and a separate init container; not a simple drop-in replacement. Apache Ozone requires at least four nodes to run, far too heavy for simple local use. The author recommends SeaweedFS and S3Proxy as viable replacements, rates RustFS a maybe, and rules out Garage and Ozone for their complexity. Garage has a very community-minded history; it comes from the https://deuxfleurs.fr/ collective, which offers a distributed cloud without datacenters. It is definitely not a good idea: datacenters in space https://taranis.ie/datacenters-in-space-are-a-terrible-horrible-no-good-idea/ Expert opinion (ex-NASA/Google, PhD in space electronics): space datacenters are a "terrible" idea. Fundamental incompatibility: electronics (especially AI/GPUs) are unsuited to the space environment. Energy: limited access. Solar (ISS-style) is insufficient at AI scale. Nuclear (RTG) is too weak. Cooling: space is not "cold"; there is no convection. Gigantic radiators are required (e.g., 531 m² for 200 kW). Radiation: causes errors (SEU, SEL) and damage. GPUs are very vulnerable. Shielding is heavy and inefficient. "Hardened" chips are very slow. Communications: very limited bandwidth (1 Gbps radio vs 100 Gbps terrestrial). Lasers depend on atmospheric conditions. Conclusion: an extremely difficult, costly project with mediocre performance. Data and Artificial Intelligence Guillaume built an MCP server for arXiv (the research-paper publication site) in Java with the Quarkus framework https://glaforge.dev/posts/2026/01/18/implementing-an-arxiv-mcp-server-with-quarkus-in-java/ Implementation of an arXiv MCP (Model Context Protocol) server in Java with Quarkus. Goal: access arXiv publications and illustrate the lesser-known features of the MCP protocol. Implementation: uses the Quarkus framework (Java) and its extensive MCP support, with help from Antigravity (an agentic IDE) for development and arXiv API integration.
Interaction with the arXiv API: HTTP requests, Atom XML format for results, Jackson XML parser. MCP features exposed: Tools (@Tool): publication search (search_papers). Resources (@Resource, @ResourceTemplate): the arXiv category taxonomy, article metadata (via a URI template). Prompts (@Prompt): examples for summarizing articles or building search queries. Configuration: the server can run over STDIO (local) or streamable HTTP (local or remote), with simple configuration in clients like Gemini CLI. Conclusion: Quarkus simplifies the creation of feature-rich MCP servers, making data and services "AI-ready" with the help of AI tools like Antigravity. Anthropic will not put ads in Claude https://www.anthropic.com/news/claude-is-a-space-to-think This comes in reaction to OpenAI's non-public plan to use ads to push people toward the paid tier. OpenAI needs cash and is probably the most-used free offering in the world. Anthropic announces that Claude will remain ad-free to preserve its role as a conversational assistant dedicated to work and deep thinking. Conversations with Claude are often sensitive, personal, or involve complex software-engineering tasks where ads would be inappropriate. Conversation analysis shows that a significant share touches on delicate topics similar to those raised with a trusted advisor. An advertising model would create incentives that contradict the core principle of being "genuinely helpful" written into Claude's Constitution. Ads would introduce a potential conflict of interest where recommendations could be driven by commercial motives rather than the user's interest.
Anthropic's business model rests on enterprise contracts and paid subscriptions, allowing reinvestment in improving Claude. Anthropic keeps free access to frontier models and offers reduced pricing for NGOs and education in more than 60 countries. "Agentic" commerce will be supported, but only at the user's initiative, never the advertisers', to preserve trust. Third-party integrations like Figma, Asana, and Canva will continue to be developed while keeping the user in control. Anthropic compares Claude to a notebook or a whiteboard: pure thinking spaces, without advertising. Infinispan 16.1 is out https://infinispan.org/blog/2026/02/04/infinispan-16-1 The release name alone deserves a mention. Memory bounding per cache, and per set of caches, is not easy to do in Java. A new OpenAPI API. An AOT cache shipped in the container images. A local MCP server with just a single Java file? It's possible with LangChain4j and JBang https://glaforge.dev/posts/2026/02/11/zero-boilerplate-java-stdio-mcp-servers-with-langchain4j-and-jbang/ Rapidly building Java MCP servers with no boilerplate. MCP (Model Context Protocol): a standard for connecting LLMs to tools and data. The tutorial addresses the lack of simple options for Java developers, given the dominance of Python/TypeScript in the MCP ecosystem. The solution uses: LangChain4j, which includes a new MCP server module for the STDIO protocol; and JBang, which runs Java files like scripts, eliminating build files (pom.xml, Gradle). Implementation: a single .java file. JBang manages dependencies automatically (//DEPS). LangChain4j's @Tool annotation exposes Java methods to LLMs. StdioMcpServerTransport handles JSON-RPC communication over standard input/output (STDIO).
Crucial point: logs must imperatively be redirected to System.err to avoid corrupting System.out, which is reserved for MCP communication (JSON-RPC messages). This makes local integration easy with tools like Gemini CLI, Claude Code, etc. Reciprocal Rank Fusion: a useful and widely used algorithm for hybrid search, blending RAG with keyword searches https://glaforge.dev/posts/2026/02/10/advanced-rag-understanding-reciprocal-rank-fusion-in-hybrid-search/ RAG: LLM quality depends on retrieval. Hybrid search: combining vector and keyword (BM25) search is optimal. Challenge: merging scores on different scales. Solution: Reciprocal Rank Fusion (RRF). RRF: a robust algorithm that merges result lists based solely on document rank, ignoring scores. RRF advantages: no score normalization, scalable, an excellent first reordering stage. A common RAG architecture: RRF (broad selection) + cross-encoder / reranking model (fine precision). RAG-Fusion: uses an LLM to generate several query variants, then RRF aggregates all the results to reinforce consensus and reduce hallucinations. Implementation: LangChain4j uses RRF by default to aggregate results from multiple retrievers. The latest Gemini and Nano Banana features supported in LangChain4j https://glaforge.dev/posts/2026/02/06/latest-gemini-and-nano-banana-enhancements-in-langchain4j/ New Nano Banana image models (Gemini 2.5/3.0) for generation and editing (up to 4K). "Grounding" via Google Search (for images and text) and Google Maps (location, Gemini 2.5). URL context tool (Gemini 3.0) for reading web pages directly. Multimodal agents (AiServices) capable of generating images. Configurable thinking (chain-of-thought depth) for Gemini 3.0.
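The rank-only fusion that RRF performs is short enough to sketch in Python. This is a generic illustration using the commonly cited k=60 smoothing constant, not code taken from the article:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists using only document ranks.

    rankings: list of ranked lists of document ids, best first.
    k: smoothing constant; 60 is the value from the original RRF paper.
    Returns document ids sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank matters: raw vector/BM25 scores are ignored,
            # so no cross-retriever score normalization is needed.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a vector-search ranking with a BM25 keyword ranking.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# doc_b wins: it ranks high in both lists, reinforcing the consensus.
```

Documents that appear near the top of several lists accumulate the most score, which is exactly the consensus effect RAG-Fusion relies on.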
Enriched metadata: token usage and details of "grounding" sources. How to configure Gemini CLI as a coding agent in IntelliJ thanks to the ACP protocol https://glaforge.dev/posts/2026/02/01/how-to-integrate-gemini-cli-with-intellij-idea-using-acp/ Goal: integrate Gemini CLI with IntelliJ IDEA via the Agent Client Protocol (ACP). Prerequisites: IntelliJ IDEA 2025.3+, Node.js (v20+), Gemini CLI. Steps: install Gemini CLI (npm install -g @google/gemini-cli). Locate the gemini executable. Configure ~/.jetbrains/acp.json (executable path, --experimental-acp, use_idea_mcp: true). Restart IDEA and select "Gemini CLI" in the AI Assistant. Usage: Gemini interacts with the code and runs commands (project context). Important: make sure the --experimental-acp flag is in the configuration. Tooling PipeNet, an alternative (also open source) to LocalTunnel, but a bit more advanced https://pipenet.dev/ pipenet: a modern open-source alternative to localtunnel (client + server). Uses: local development (sharing, webhooks), SDK integration, secure self-hosting. Features: client (exposes local ports, subdomains), server (deployment, custom domains, optimized for single-port cloud). Advantages vs localtunnel: cloud deployment on a single port, multi-domain support, TypeScript/ESM, active maintenance. Protocols: HTTP/S, WebSocket, SSE, HTTP streaming. Integration: CLI or JavaScript SDK. JSON-IO - a library like Jackson or GSON, supporting JSON5 and TOON, which could be useful for LLM "structured output" when models don't produce perfect JSON https://github.com/jdereg/json-io json-io: a Java library for JSON/TOON serialization and deserialization. Handles complex object graphs, cyclic references, and polymorphic types. Full JSON5 support (read and write), including features not supported by Jackson/Gson.
TOON format: a token-oriented notation optimized for LLMs, reducing token usage by 40-50% compared to JSON. Lightweight: no external dependencies (except java-util), small JAR size (~330K). Compatible with JDK 1.8 through 24, as well as JPMS and OSGi environments. Two conversion modes: to typed Java objects (toJava()) or to Maps (toMaps()). Extensive configuration options via ReadOptionsBuilder and WriteOptionsBuilder. Optimized for cloud-native deployments and microservice architectures. Use mailpit and testcontainers to test your email sending https://foojay.io/today/testing-emails-with-testcontainers-and-mailpit/ The article demonstrates it with Spring Boot and without. And here is the Quarkus extension https://quarkus.io/extensions/io.quarkiverse.mailpit/quarkus-mailpit/?tab=docs Testing email sending in development is tricky because you can't use real SMTP servers. Mailpit is a test SMTP server that captures emails and offers a web UI to inspect them. Testcontainers can start Mailpit in a Docker container for integration tests. The article shows how to configure a Spring Boot application to send emails via JavaMail. A dedicated Testcontainers module for Mailpit makes it easy to integrate into tests. The Mailpit container exposes an SMTP port (1025) and an HTTP API (8025) to check received emails. Tests can query Mailpit's HTTP API to validate the content of sent emails. This approach avoids mocks and actually tests email delivery. Mailpit can also be used in local development to view emails without really sending them. The solution works with any Java framework that supports JavaMail. Architecture How to scale a system from 0 to 10 million users https://blog.algomaster.io/p/scaling-a-system-from-0-to-10-million-users Philosophy: incremental scalability; solve bottlenecks without over-engineering. 0-100 users: a single server (app, DB, jobs). 100-1K: separate app and DB (managed services, pooling). 1K-10K: load balancer, multiple app servers (stateless via shared sessions). 10K-100K: caching, DB read replicas, CDN (reduce DB load). 100K-500K: auto-scaling, stateless applications (JWT authentication). 500K-10M: DB sharding, microservices, message queues (asynchronous processing). 10M+: multi-region deployment, CQRS, polyglot persistence, custom infrastructure. Key principles: simplicity, measurement, statelessness is essential, cache/async, careful sharding, trade-offs (CAP), the cost of complexity. Architecture Patterns 2026 - From Hype to Field Reality (Part 1/2) - https://blog.ippon.fr/2026/01/30/patterns-darchitecture-2026-part-1/ The article presents four software architecture patterns for addressing scalability, resilience, and business-agility needs in modern systems. It presents their rationales and their pitfalls. A good refresher. Event-Driven Architecture enables asynchronous communication between systems via published and consumed events, avoiding direct coupling. EDA's benefits include independent scalability of components, resilience to failures, and easy addition of new use cases. The API-First pattern combined with an API gateway centralizes API security, routing, and observability with a unified catalog. Backend for Frontend creates channel-specific APIs (mobile, web, partners) to optimize the user experience. CQRS separates read and write models with distinct optimized databases, while Event Sourcing stores all events rather than the current state. The Saga pattern manages distributed transactions via centralized orchestration or event-driven choreography to coordinate multiple microservices. Common pitfalls include an explosion of fine-grained events, the complexity of distributed debugging, and mishandled eventual consistency. The flagship technologies are Kafka for event streaming, Kong for the API gateway, EventStoreDB for event sourcing, and Temporal for sagas. These patterns require technical maturity and are not suited to simple CRUD applications or junior teams. Architecture patterns 2026: from hype to field reality, part 2 - https://blog.ippon.fr/2026/02/04/patterns-darchitecture-2026-part-2/ The second part of a practical guide to proven software and system architecture patterns for modernizing and structuring applications in 2026. Strangler Fig lets you migrate a legacy system progressively by wrapping it bit by bit rather than rewriting everything at once (70% failure rate for big bangs). The Anti-Corruption Layer protects your new business domain from external and legacy models by creating a translation layer between the systems. A service mesh automatically manages inter-service communication in microservice architectures (mTLS security, observability, resilience). Hexagonal architecture separates the business core from technical details via ports and adapters, improving testability and evolvability. Each pattern is illustrated with a concrete client case, measurable results, and a list of pitfalls to avoid during implementation. The 2026 technologies mentioned include Istio and Linkerd for service mesh, LaunchDarkly for feature flags, and NGINX and Kong for API gateways. A final comparison table helps pick the right pattern according to the complexity, scope, and specific use case of the project. The article insists on a pragmatic approach: don't use a pattern just because it's modern, but because it solves a real problem. For simple CRUD-style systems or those with few services, these patterns can introduce needless complexity that you should know how to avoid. Methodologies The recurring dream of replacing or even eliminating developers
https://www.caimito.net/en/blog/2025/12/07/the-recurring-dream-of-replacing-developers.html Depuis 1969, chaque décennie voit une tentative de réduire le besoin de développeurs (de COBOL, UML, visual builders… à IA). Motivation : frustration des dirigeants face aux délais et coûts de développement. La complexité logicielle est intrinsèque et intellectuelle, non pas une question d'outils. Chaque vague technologique apporte de la valeur mais ne supprime pas l'expertise humaine. L'IA assiste les développeurs, améliore l'efficacité, mais ne remplace ni le jugement ni la gestion de la complexité. La demande de logiciels excède l'offre car la contrainte majeure est la réflexion nécessaire pour gérer cette complexité. Pour les dirigeants : les outils rendent-ils nos développeurs plus efficaces sur les problèmes complexes et réduisent-ils les tâches répétitives ? Le "rêve" de remplacer les développeurs, irréalisable, est un moteur d'innovation créant des outils précieux. Comment creuser des sujets à l'ère de l'IA générative. Quid du partage et la curation de ces recherches ? https://glaforge.dev/posts/2026/02/04/researching-topics-in-the-age-of-ai-rock-solid-webhooks-case-study/ Recherche initiale de l'auteur sur les webhooks en 2019, processus long et manuel. L'IA (Deep Research, Gemini, NotebookLM) facilite désormais la recherche approfondie, l'exploration de sujets et le partage des résultats. L'IA a identifié et validé des pratiques clés pour des déploiements de webhooks résilients, en grande partie les mêmes que celles trouvées précédemment par l'auteur. Génération d'artefacts par l'IA : rapport détaillé, résumé concis, illustration sketchnote, et même une présentation (slide deck). Guillaume s'interroge sur le partage public de ces rapports de recherche générés par l'IA, tout en souhaitant éviter le "AI Slop". 
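As a rough illustration of the Mailpit workflow covered earlier in these notes (the linked article uses Java/Spring Boot; this sketch uses Python instead, and assumes a Mailpit instance reachable on localhost with the default ports from the notes, SMTP 1025 and HTTP API 8025):

```python
import json
import smtplib
import urllib.request
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    # Plain construction of the message a test would send through Mailpit.
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_via_mailpit(msg, host="localhost", port=1025):
    # Mailpit accepts any SMTP traffic and captures it instead of delivering it.
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)

def list_captured(host="localhost", port=8025):
    # Mailpit's HTTP API lists captured messages; an integration test can
    # assert on the returned JSON instead of mocking the mail layer.
    with urllib.request.urlopen(f"http://{host}:{port}/api/v1/messages") as resp:
        return json.load(resp)
```

In an integration test, Testcontainers would start the Mailpit container and supply the mapped host/port values to these helpers.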
Law, society, and organization Software threatened by vibe coding https://www.techbuzz.ai/articles/we-built-a-monday-com-clone-in-under-an-hour-with-ai Two CNBC journalists with no coding experience built a working Monday.com clone in under 60 minutes for 5 to 15 dollars. The experiment validates investor fears that triggered a 30% drop in SaaS company stocks. The AI not only reproduced the basic features but also researched Monday.com on its own to identify and recreate its key features. This technique, called "vibe coding", lets non-developers build applications through plain-English instructions. The most vulnerable companies are those offering tools "that sit on top of work", such as Atlassian, Adobe, HubSpot, Zendesk, and Smartsheet. Cybersecurity companies like CrowdStrike and Palo Alto are considered better protected thanks to network effects and regulatory barriers. Systems of record like Salesforce remain harder to replicate because of their integration depth and enterprise data. At 5 to 15 dollars per build, companies can prototype several custom solutions for less than the cost of a single Monday.com license. The experiment raises questions about the viability of the 5-billion-dollar project-management tooling market in the face of generative AI. Conferences In addition to Aurélie Vache's conference agenda, there is also the site https://javaconferences.org/ (by Brian Vermeer) with all upcoming Java conferences!
The list of conferences comes from the Developers Conferences Agenda/List by Aurélie Vache and contributors: 12-13 February 2026: Touraine Tech #26 - Tours (France) 12-13 February 2026: World Artificial Intelligence Cannes Festival - Cannes (France) 19 February 2026: ObservabilityCON on the Road - Paris (France) 6 March 2026: WordCamp Nice 2026 - Nice (France) 18 March 2026: Jupyter Workshops: AI in Jupyter: Building Extensible AI Capabilities for Interactive Computing - Saint-Maur-des-Fossés (France) 18-19 March 2026: Agile Niort 2026 - Niort (France) 20 March 2026: Atlantique Day 2026 - Nantes (France) 26 March 2026: Data Days Lille - Lille (France) 26-27 March 2026: SymfonyLive Paris 2026 - Paris (France) 26-27 March 2026: REACT PARIS - Paris (France) 27-29 March 2026: Shift - Nantes (France) 31 March 2026: ParisTestConf - Paris (France) 31 March-1 April 2026: FlowCon France 2026 - Paris (France) 1 April 2026: AWS Summit Paris - Paris (France) 2 April 2026: Pragma Cannes 2026 - Cannes (France) 2-3 April 2026: Xen Spring Meetup 2026 - Grenoble (France) 7 April 2026: PyTorch Conference Europe - Paris (France) 9-10 April 2026: Android Makers by droidcon 2026 - Paris (France) 9-11 April 2026: Drupalcamp Grenoble 2026 - Grenoble (France) 16-17 April 2026: MiXiT 2026 - Lyon (France) 17-18 April 2026: Faiseuses du Web 5 - Dinan (France) 22-24 April 2026: Devoxx France 2026 - Paris (France) 23-25 April 2026: Devoxx Greece - Athens (Greece) 6-7 May 2026: Devoxx UK 2026 - London (UK) 12 May 2026: Lead Innovation Day - Leadership Edition - Paris (France) 19 May 2026: La Product Conf Paris 2026 - Paris (France) 21-22 May 2026: Flupa UX Days 2026 - Paris (France) 22 May 2026: AFUP Day 2026 Lille - Lille (France) 22 May 2026: AFUP Day 2026 Paris - Paris (France) 22 May 2026: AFUP Day 2026 Bordeaux - Bordeaux (France) 22 May 2026: AFUP Day 2026 Lyon - Lyon (France) 28 May 2026: DevCon 27 : I.A. & Vibe Coding - Paris (France) 28 May 2026: Cloud Toulouse 2026 - Toulouse (France) 29 May 2026: NG Baguette Conf 2026 - Paris (France) 29 May 2026: Agile Tour Strasbourg 2026 - Strasbourg (France) 2-3 June 2026: Agile Tour Rennes 2026 - Rennes (France) 2-3 June 2026: OW2Con - Paris-Châtillon (France) 3 June 2026: IA–NA - La Rochelle (France) 5 June 2026: TechReady - Nantes (France) 5 June 2026: Fork it! - Rouen - Rouen (France) 6 June 2026: Polycloud - Montpellier (France) 9 June 2026: JFTL - Montrouge (France) 9 June 2026: C: - Caen (France) 11-12 June 2026: DevQuest Niort - Niort (France) 11-12 June 2026: DevLille 2026 - Lille (France) 12 June 2026: Tech F'Est 2026 - Nancy (France) 16 June 2026: Mobilis In Mobile 2026 - Nantes (France) 17-19 June 2026: Devoxx Poland - Krakow (Poland) 17-20 June 2026: VivaTech - Paris (France) 18 June 2026: Tech'Work - Lyon (France) 22-26 June 2026: Galaxy Community Conference - Clermont-Ferrand (France) 24-25 June 2026: Agi'Lille 2026 - Lille (France) 24-26 June 2026: BreizhCamp 2026 - Rennes (France) 2 July 2026: Azur Tech Summer 2026 - Valbonne (France) 2-3 July 2026: Sunny Tech - Montpellier (France) 3 July 2026: Agile Lyon 2026 - Lyon (France) 6-8 July 2026: Riviera Dev - Sophia Antipolis (France) 2 August 2026: 4th Tech Summit on Artificial Intelligence & Robotics - Paris (France) 20-22 August 2026: 4th Tech Summit on AI & Robotics - Paris (France) & Online 4 September 2026: JUG Summer Camp 2026 - La Rochelle (France) 17-18 September 2026: API Platform Conference 2026 - Lille (France) 24 September 2026: PlatformCon Live Day Paris 2026 - Paris (France) 1 October 2026: WAX 2026 - Marseille (France) 1-2 October 2026: Volcamp - Clermont-Ferrand (France) 5-9 October 2026: Devoxx Belgium - Antwerp (Belgium) Contact us To react to this episode, come discuss in the Google group https://groups.google.com/group/lescastcodeurs Reach us on X/Twitter https://twitter.com/lescastcodeurs or Bluesky https://bsky.app/profile/lescastcodeurs.com Record a crowdcast or send a crowdquestion Support Les Cast Codeurs on Patreon https://www.patreon.com/LesCastCodeurs All episodes and all the info at https://lescastcodeurs.com/
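To make the Event-Driven Architecture pattern from the episode notes above concrete, here is a deliberately minimal in-memory publish/subscribe sketch (a real system would use a broker such as the Kafka mentioned in the article; all event and handler names here are illustrative):

```python
from collections import defaultdict

class EventBus:
    """Minimal in-memory publish/subscribe bus (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> list of handlers

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Producers never reference consumers directly: that decoupling is
        # what lets new use cases be added without touching the publisher.
        for handler in self._subscribers[event_type]:
            handler(payload)

# Two independent consumers react to the same "order.created" event.
bus = EventBus()
emails, invoices = [], []
bus.subscribe("order.created", lambda event: emails.append(event["order_id"]))
bus.subscribe("order.created", lambda event: invoices.append(event["order_id"]))
bus.publish("order.created", {"order_id": "o-42"})
```

Adding a third consumer (say, analytics) would be a new subscribe call, with no change to the code that publishes the event.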

Atareao con Linux
ATA 770 Stop using passwords in your Docker Compose! Discover Podman Secrets

Atareao con Linux

Play Episode Listen Later Feb 12, 2026 22:12


Worried about keeping your keys and passwords in plain text? In this episode 770 of Atareao con Linux, I explain why you should stop using traditional environment variables and how Podman Secrets can save the day. I myself spent years ignoring this problem in Docker out of laziness about setting up Swarm, but with Podman, security comes built in. We talk in depth about the secret lifecycle: how to create, list, inspect, and delete them. I show how Podman keeps this sensitive data out of your images and away from prying eyes in the Bash history. It's a paradigm shift for any sysadmin or self-hosting enthusiast. But we don't stop there. I present Crypta, my new tool written in Rust that integrates SOPS, Age, and Git so you can manage your secrets professionally, even syncing them with remote repositories. We also look at how to configure custom drivers and how to use secrets in your deployments with MariaDB and Quadlets. Chapter highlights: 00:00:00 The danger of plain-text passwords 00:01:23 The problem with Docker Swarm and why choose Podman 00:03:16 What is a Secret in Podman, really? 00:04:22 Lifecycle: the creation and death of a secret 00:08:10 Practical implementation with MariaDB and Quadlets 00:12:04 Introducing Crypta: management with SOPS, Age, and Rust 00:19:40 Advantages of using secrets in rootless mode If you want your infrastructure to be truly secure and consistent, this episode is an essential roadmap. Learn to hide what should be hidden and sleep soundly knowing your API tokens aren't within anyone's reach. More information and links in the episode notes
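For readers following along: when a container is started with a secret, Podman exposes it inside the container as a read-only file under /run/secrets/<name> rather than as an environment variable. A minimal sketch of consuming one from application code, with the root path parameterized so it can be exercised outside a container (the secret name is illustrative):

```python
from pathlib import Path

def read_secret(name, root="/run/secrets"):
    # Podman mounts each secret as a file inside the container; reading it
    # at startup keeps the value out of `env` output and shell history.
    secret_file = Path(root) / name
    return secret_file.read_text(encoding="utf-8").strip()
```

An application would call, for example, `read_secret("db_password")` once at startup and pass the value to its database driver.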

El gato de Turing
180 – Homelabs

El gato de Turing

Play Episode Listen Later Feb 11, 2026 100:41


In the previous episode we talked at length about "homelabs", the home testing labs many of us run at home. We received a huge number of comments, and today we go over what each of you has at home and learn together about many of these tools. We also leave you a list of links to all these tools and hardware so you can start building your own version to learn and try new things: Tools Iban's guide for a transition to European alternatives Home Assistant (open-source home automation) Kopia (backups) Tailscale (VPN between your devices, open source with headscale) authentik (private identity provider) immich (photo manager) Komga (comic and book manager) plex (paid media manager) Jellyfin (media manager) Omoide (media manager) TeslaMate (manage your Tesla) Heimdall (landing page) Syncthing (file synchronization) Proxmox (virtualization) Adguard (ad blocking) Pi-hole (DNS with blocking of ads and other categories) Unbound (local DNS) Mealie (cooking recipe manager) Obsidian (note manager) K3S (lightweight Kubernetes) WireGuard (VPN) podman (containers) Docker (containers) Harbor (container registry) Verdaccio (NPM registry) Forgejo (Git repositories) Gitea (Git repositories) RustFS (S3 server) cert-manager (TLS certificates on Kubernetes) step-ca (local Let's Encrypt) TrueNAS (NAS OS) Kiwix (local copy of Wikipedia and other wikis) Prometheus (metrics and monitoring) Grafana (metric dashboards) ArgoCD (CI/CD) FluxCD (CI/CD) vLLM (local generative AI compatible with the OpenAI API) Open WebUI (web interface for generative AI) Hardware Switchbot (home automation) Shelly (relays and home automation) Aqara (home automation) Eve (home automation) Inels Wireless (home automation) Reolink (security cameras) GMKtec (mini PCs) EliteDesk (mini PCs) QNAP (NAS) Synology (NAS) Raspberry Pi (mini PCs) News IKEA launches 21 new smart home products Sánchez announces that Spain will ban social media access for under-16s Telegram's founder lashes out at Pedro Sánchez and alerts Spain with a mass message Episode music Intro: Safe and Warm in Hunter's Arms - Roller Genoa Outro: Inspiring Course Of Life - Alex Che You can find us on Mastodon and support us by listening to our podcast on Podimo or becoming a fan on iVoox. If you want a free month of iVoox Premium, click here.

Rails with Jason
309 - How I Built SaturnCI (Starring JP Camara)

Rails with Jason

Play Episode Listen Later Feb 10, 2026 77:08 Transcription Available


In this episode I talk with JP Camara about RubyConf 2026, submitting CFPs, and why everyone should give talks. JP shares his experience using SaturnCI on the Mastodon project, and we dig into SaturnCI's Docker-based setup, Kubernetes architecture, and test-focused UX philosophy. Links: jpcamara.com SaturnCI Nonsense Monthly

Python Bytes
#469 Commands, out of the terminal

Python Bytes

Play Episode Listen Later Feb 9, 2026 33:56 Transcription Available


Topics covered in this episode: Command Book App uvx.sh: Install Python tools without uv or Python Ending 15 years of subprocess polling monty: A minimal, secure Python interpreter written in Rust for use by AI Extras Joke Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training The Complete pytest Course Patreon Supporters Connect with the hosts Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky) Brian: @brianokken@fosstodon.org / @brianokken.bsky.social Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky) Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Michael #1: Command Book App New app from Michael Command Book App is a native macOS app for developers, data scientists, AI enthusiasts and more. This is a tool I've been using lately to help build Talk Python, Python Bytes, Talk Python Training, and many more applications. It's a bit like advanced terminal commands or complex shell aliases, but hosted outside of your terminal. This leaves the terminal there for interactive commands, exploration, short actions. Command Book manages commands like "tail this log while I'm developing the app", "Run the dev web server with true auto-reload", and even "Run MongoDB in Docker with exactly the settings I need" I'd love it if you gave it a look, shared it with your team, and send me feedback. Has a free version and paid version. 
Built with Swift and SwiftUI. Check it out at https://commandbookapp.com Brian #2: uvx.sh: Install Python tools without uv or Python Tim Hopper Michael #3: Ending 15 years of subprocess polling by Giampaolo Rodola The standard library's subprocess module has relied on a busy-loop polling approach since the timeout parameter was added to Popen.wait() in Python 3.3, around 15 years ago. The problems with busy-polling: CPU wake-ups: even with exponential backoff (starting at 0.1ms, capping at 40ms), the system constantly wakes up to check process status, wasting CPU cycles and draining batteries. Latency: there's always a gap between when a process actually terminates and when you detect it. Scalability: monitoring many processes simultaneously magnifies all of the above, plus L1/L2 CPU cache invalidations. It's interesting to note that waiting via poll() (or kqueue()) puts the process into the exact same sleeping state as a plain time.sleep() call. From the kernel's perspective, both are interruptible sleeps. Here is the merged PR for this change. Brian #4: monty: A minimal, secure Python interpreter written in Rust for use by AI Samuel Colvin and others at Pydantic Still experimental. "Monty avoids the cost, latency, complexity and general faff of using a full container based sandbox for running LLM generated code." "Instead, it lets you safely run Python code written by an LLM embedded in your agent, with startup times measured in single digit microseconds not hundreds of milliseconds." Extras Brian: Expertise is the art of ignoring - Kevin Renskers You don't need to master the language. You need to master your slice. Learning everything up front is wasted effort. Experience changes what you pay attention to.
I hate fish - Rands (Michael Lopp) Really about productivity systems And a nice process for dealing with email Michael: Talk Python now has a CLI New essay: It's not vibe coding - Agentic engineering GitHub is having a day Python 3.14.3 and 3.13.12 are available Wall Street just lost $285 billion because of 13 markdown files Joke: Silence, current side project!

Brad & Will Made a Tech Pod.
325: renderDEEZ128

Brad & Will Made a Tech Pod.

Play Episode Listen Later Feb 8, 2026 104:13


It's been a while since we did a deep dive on our home networking and server infrastructure (what some might call a "homelab"), so it's time for the 2026 check-in to run down what we're working with these days. By request, we spend a big chunk of the episode on Brad's plain Linux NAS/server, detailing components like Samba, Docker (or Podman), and Sanoid that you'd need to set up yourself to replicate the functionality of something like TrueNAS or Unraid. We also survey Will's more granular approach, once again pine longingly after Wildcat Lake, and more. Show notes with all the hardware and software we mentioned: https://tinyurl.com/techpod-325-homelab-update Support the Pod! Contribute to the Tech Pod Patreon and get access to our booming Discord, a monthly bonus episode, your name in the credits, and other great benefits! You can support the show at: https://patreon.com/techpod
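For listeners replicating a setup like Brad's: Sanoid, the ZFS snapshot tool mentioned above, is driven by a single policy file. A minimal sketch of /etc/sanoid/sanoid.conf follows (the dataset name and retention counts are illustrative, not the actual values from the episode):

```ini
; Snapshot and prune the tank/media dataset on a schedule.
[tank/media]
use_template = production

; Retention policy: how many snapshots of each interval to keep.
[template_production]
frequently = 0
hourly = 36
daily = 30
monthly = 3
autosnap = yes
autoprune = yes
```

A cron job or systemd timer then runs `sanoid --cron` periodically to take new snapshots and prune expired ones according to this policy.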

Cyber Briefing
February 05, 2026 - Cyber Briefing

Cyber Briefing

Play Episode Listen Later Feb 5, 2026 7:45


If you like what you hear, please subscribe, leave us a review and tell a friend!

Atareao con Linux
ATA 768 Quadlets. The secret Podman piece that will change your server forever

Atareao con Linux

Play Episode Listen Later Feb 5, 2026 22:43


Still using Docker because switching feels like too much effort? In this episode of Atareao con Linux I'm going to show you why Quadlets are the definitive reason to finally commit to Podman. If I already told you about Pods and you found them interesting, today we take container management to the next level: total integration with systemd. In episode 768, I explain how Quadlets let you manage your containers, volumes, and networks exactly as if they were native services of your operating system. Forget strange scripts or depending on external tools; everything is defined with simple configuration files (.container, .network, .volume) that systemd understands perfectly. I share my real experience migrating my current projects. I already have PostgreSQL databases running under this model and the stability is simply out of this world. We'll see how to bring up a full WordPress stack with MariaDB and Redis using this technology, managing the dependencies between them with systemd's 'After' and 'Requires' directives. No more containers trying to start before the database is ready! Episode chapters: 00:00:00 Introduction and the definitive goodbye to Docker 00:01:33 What is a Quadlet and why it revolutionizes Linux 00:03:22 The 6 types of Quadlets available 00:05:12 How to manage a container-type Quadlet 00:06:46 Defining networks and volumes as services 00:08:13 The workflow: Git, secrets, and portability 00:11:22 systemd integration: names and prefixes 00:13:42 Deploying a full stack: WordPress, MariaDB, and Redis 00:16:02 Modifying containers and reloading systemd (daemon-reload) 00:17:50 Logs with journalctl and simplified maintenance 00:19:33 Auto-update: forget Watchtower for good 00:20:33 Conclusions and next steps in the migration We also explore killer advantages such as version control with Git, centralized log management with journalctl, and native automatic updates that will make you forget Watchtower. If you want your Linux server to be more professional, robust, and easier to maintain, don't miss this audio. More information and links in the episode notes
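As a sketch of what the episode describes: a Quadlet file dropped into ~/.config/containers/systemd/ (rootless) is turned into a regular systemd service by Podman's generator. The file name, image tag, and sibling unit names below are illustrative, not taken from the episode:

```ini
# wordpress.container: run WordPress as a native systemd service via Quadlet.
[Unit]
Description=WordPress (Quadlet-managed container)
# Quadlet generates mariadb.service from a mariadb.container file, so
# ordinary systemd dependency ordering works between containers.
Requires=mariadb.service
After=mariadb.service

[Container]
Image=docker.io/library/wordpress:latest
# These reference sibling wordpress.network / wordpress.volume Quadlet files.
Network=wordpress.network
Volume=wordpress.volume:/var/www/html
# Replaces Watchtower: podman-auto-update pulls newer tags from the registry.
AutoUpdate=registry

[Install]
WantedBy=default.target
```

After `systemctl --user daemon-reload`, the container starts and stops with `systemctl --user start wordpress` and logs through journalctl like any other unit.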

The Changelog
Setting Docker Hardened Images free (Interview)

The Changelog

Play Episode Listen Later Feb 4, 2026 76:49


In May of 2025, Docker launched Hardened Images, a secure, minimal, production-ready set of images. In December, they made DHI freely available and open source to everyone who builds software. On this episode, we're joined by Tushar Jain, EVP of Engineering at Docker to learn all about it.

Cyber Security Today
Critical Cybersecurity Updates: Fortinet, Docker, and Android Malware

Cyber Security Today

Play Episode Listen Later Feb 4, 2026 10:24


In this episode of Cybersecurity Today, Jim Love covers major vulnerabilities and security threats, including the exposure of over 3 million Fortinet devices, a critical flaw in Docker's AI assistant, and a sophisticated Android malware campaign using Hugging Face repositories. Discover the latest updates on these critical issues and gain insights into the measures being taken to mitigate these threats. Sponsored by Meter, providing integrated networking solutions for performance and scale. Cybersecurity Today would like to thank Meter for their support in bringing you this podcast. Meter delivers a complete networking stack, wired, wireless and cellular in one integrated solution that's built for performance and scale. You can find them at Meter.com/cst 00:00 Introduction and Sponsor Message 00:43 Fortinet Devices Vulnerability 03:35 Docker AI Assistant Security Flaw 06:27 Hugging Face Android Malware Campaign 09:25 Conclusion and Sponsor Message

Ardan Labs Podcast
Hardware, Innovation, and The Space Safe with Oscar Hedaya

Ardan Labs Podcast

Play Episode Listen Later Feb 4, 2026 88:48


In this episode of the Ardan Labs Podcast, Ale Kennedy makes her hosting debut, sitting down with Oscar Hedaya, founder of SPACE, to discuss building startups, navigating uncertainty, and launching innovative products. Oscar shares his journey from New Jersey to Miami, the childhood financial challenges that shaped his work ethic, and the lessons learned from college, job searching, and early setbacks. The conversation explores what it takes to start a company, develop a physical product in a competitive market, and turn setbacks into momentum. Together, Ale and Oscar examine persistence, partnership dynamics, and how identifying gaps in the market led to the creation of The Space Safe. 00:00 Introduction and Background 02:13 Smart Safes and Security Innovation 07:14 Childhood and Early Influences 12:57 College Applications and Transitions 28:51 College Decisions and Academic Paths 42:15 Graduation and Job Market Reality 54:26 Starting a Business 59:43 Restarting the Entrepreneurial Journey 01:10:29 The Birth of The Space Safe 01:18:48 Product Development Challenges 01:23:49 Launching SpaceSafe Connect with Oscar: LinkedIn: https://www.linkedin.com/in/ohedaya/ Mentioned in this Episode: The Space Safe Website: https://www.thespacesafe.com Want more from Ardan Labs? You can learn Go, Kubernetes, Docker & more through our video training, live events, or through our blog! Online Courses: https://ardanlabs.com/education/ Live Events: https://www.ardanlabs.com/live-training-events/ Blog: https://www.ardanlabs.com/blog Github: https://github.com/ardanlabs

Changelog Master Feed
Setting Docker Hardened Images free (Changelog Interviews #675)

Changelog Master Feed

Play Episode Listen Later Feb 4, 2026 76:49


In May of 2025, Docker launched Hardened Images, a secure, minimal, production-ready set of images. In December, they made DHI freely available and open source to everyone who builds software. On this episode, we're joined by Tushar Jain, EVP of Engineering at Docker to learn all about it.

The Uptime Wind Energy Podcast
North Sea Summit, Vineyard Wind Back to Work

The Uptime Wind Energy Podcast

Play Episode Listen Later Feb 3, 2026 31:35


Allen, Joel, and Yolanda discuss the North Sea Summit where nine European countries committed to 100 gigawatts of offshore wind capacity and the massive economic impact that comes with it. They also break down the federal court ruling that allows Vineyard Wind to resume construction with a tight 45-day window before installation vessels leave. Plus GE Vernova's Q4 results show $600 million in wind losses and Wind Power Lab CEO Lene Helstern raises concerns about blade quality across the industry. Sign up now for Uptime Tech News, our weekly newsletter on all things wind technology. This episode is sponsored by Weather Guard Lightning Tech. Learn more about Weather Guard's StrikeTape Wind Turbine LPS retrofit. Follow the show on YouTube, LinkedIn and visit Weather Guard on the web. And subscribe to Rosemary's "Engineering with Rosie" YouTube channel here. Have a question we can answer on the show? Email us! The Uptime Wind Energy Podcast brought to you by StrikeTape, protecting thousands of wind turbines from lightning damage worldwide. Visit striketape.com. And now your hosts, Allen Hall, Rosemary Barnes, Joel Saxum, and Yolanda Padron. Speaker 2: Welcome to the Uptime Wind Energy Podcast. I'm your host, Allen Hall. I'm here with Yolanda Padron and Joel Saxum. Rosemary Barnes is snorkeling at the Great Barrier Reef this week. Uh, big news out of Northern Europe: the North Sea Summit, which happened in Hamburg, uh, about a week or so ago. Nine European countries are making a huge commitment for offshore wind. The countries involved are Britain, Belgium, Denmark, France, Germany, Iceland (question mark), Ireland, Luxembourg, Netherlands, and Norway. Together they want to develop [00:01:00] 100 gigawatts of offshore wind capacity in shared waters. Uh, that's enough to power about 85 million households, and the pact comes as Europe is trying to wean itself off natural gas from where they had it previously, and the United States.
Uh, so they, they would become electricity independent. Uh, and this is one way to do it. Two big happy, uh, companies at the moment, Vattenfall, who develops a lot offshore, and Siemens Gamesa of course, are really excited by the news. If you run the numbers and you, you, you have a hundred gigawatts out in the water and you're using 20 megawatt turbines, then you're talking about 5,000 turbines in the water total. That is a huge offshore wind order, and I, I think this would be great news for, obviously, Vestas and [00:02:00] Siemens Gamesa. Uh, the, the question is there's a lot of political maneuvering that is happening. It looks like Belgium, uh, as a country is not super active in offshore and is rethinking it and trying to figure out where they want to go. But I think the big names will stay, right? France and Germany, all in on offshore. Denmark will be, Britain already is. So the question really is, at the moment then, can Siemens get back into the wind game and start making money, because they have projected themselves to be very profitable coming into this year. This may be the, the stepping stone, Joel. Joel Saxum: Well, I think that, yeah, we talked about last week their 21 megawatt, or 21 and a half megawatt, I believe it is, big new flagship going to be ready to roll, uh, with the big auctions happening like AR7 in the UK. Uh, and you know, that's eight gigawatts, 8.4 gigawatts there. People are gonna be, the, the order book's gonna start to fill up, like [00:03:00] Siemens is, this is a possibility of a big turnaround. And to put some of these numbers in perspective, um, a hundred gigawatts of offshore wind. So what does that really mean? Right? Um, what it means is if you, if you take two of the big industrial powerhouses that are a part of this pact, the UK and Germany, and combine their total demand, that's a hundred gigawatts. That's what they, that's what their demand is, basically, on a, you know, today. Right?
So that’s gonna continue to grow, right? As, uh, we electrify a lot of things. And the, you know, the next Industrial Revolution 4.0, or whatever we’re calling it now, is happening. Um, that’s, that’s a possibility, right? So this hundred gigawatts of offshore wind is gonna drive jobs all over Europe. Right. This isn’t just jobs at the port in Rotterdam or wherever it may be. Right? This is manufacturing jobs, supply chain jobs, the same stuff we’ve been talking about on the podcast for a while here with [00:04:00] what the UK is doing with OWGP and ORE Catapult and all the monies that the Crown and other, uh, private entities are putting in there. This hundred gigawatts is really gonna look like building out that local supply chain, jobs, all these different things. ’Cause Allen, like you mentioned off air, if you look at a hundred gigawatts of offshore wind, that’s $200 billion, or to put it in euros, 170, 175 billion euros, just in turbine orders. Right. That doesn’t cover ships, lodging, food, like, you know, everything around the ports, like tools, PPE, all of the stuff that’s needed by this industry. I mean, there’s a, there’s a trillion dollar impact here.  Speaker 2: Oh, it’s close. Yeah. It’s at least 500 billion, I would say. And Yolanda, from the asset management side, have we seen anything of this scale to manage? It does seem like there’d be a lot of [00:05:00] turbines in the water. A whole bunch of moving pieces: ships, turbines, cables, transformers, substations, going different directions. What kind of infrastructure is that going to take?  Yolanda Padron: You know, a lot of the teams that are there, they’re used to doing this on a grand scale, but globally, right? And so having this be all at once in the UK is definitely gonna be interesting.
It’ll be a good opportunity for everybody to take all of the lessons learned to just try to make sure that they don’t come across any issues that they might have seen in the past, in other sites, in other countries. They just bring everything back home to their countries and then just make sure that everything’s fine. Um, from, like, development, construction, and operations.  Joel Saxum: I was thinking about that. Just thinking about development, construction, operations, right? So some of [00:06:00] these sites we’re thinking about, like, you know, that map of offshore wind in the Northern Atlantic, right? So if this is gonna go, and we’re talking about the countries involved here, Norway, Germany, Denmark, France, Belgium, you’re gonna have it all over. So into the Baltic Sea, around Denmark, into the Norwegian waters, UK, Ireland all the way over, and Iceland is there. I don’t think there’s gonna be any development there. I think maybe they’re just there as cheerleaders. Um, offtake, possibly, yes. Some cables running over there. But you’re going to need to repurpose some of the existing infrastructure, or not, you’re going to get the opportunity to, and this hasn’t happened in offshore wind yet, right? So basically repowering offshore wind. And you’re going to be able to look at, you know, you’re not doing, um, greenfield geotechnical work and greenfield, um, subsea mapping. Like, some of those things are done, right, or most of those things are done. So there, I know there’s a lot of, like, there’s two and [00:07:00] three and six and seven megawatt turbines all over the North Atlantic, so we’re gonna be able to pop some of those up, put some 15 and 20 megawatt machines in place there. I mean, of course you’re not gonna be able to reuse the same monopiles, but when it comes to, Yolanda, like you said, the lessons learned: hey, the vessel plans for this area are done.
How we change crews out here, the CTVs and SOVs into port and that stuff, those learnings are done. How do we maintain export cables and inter-array cables with the geotech here? You’re not in a greenfield, you’re in a brownfield. That work, a lot of those lessons learned, they’re done, right? You’ve stumbled through them, you’ve made those mistakes. You’ve had to learn on the fly and go ahead here. But when you go to the next phase of repowering an offshore wind farm, the devex cost is gonna go way down, in my opinion. Now, someone may fight back on that and say, well, we have to go do some demolition or something of that sort. I’m not sure, but [00:08:00] Yolanda Padron: Yeah. But I think, you know, we like to complain sometimes in the US about how some of the studies just aren’t catered toward us, right? And so we’ve seen it a lot, and a lot of the studies that are made are just made in Europe, where this is all taking place. So it’s gonna be really, really interesting to see such a massive growth where everything’s being developed and where the studies are localized, where you have this very niche area and they’ve studied it. They know exactly what’s going on there. And to your point, they’ve minimized the risk, like the environmental risks, as much as they could. Right. And so it’s going to be really, really interesting to have them  Joel Saxum: Insuring and financing these projects should be way easier  Speaker 2: Wind Europe is saying that the industry has pledged to cut costs by 30% between 2025 and 2040. So you would think that the turbine [00:09:00] costs and the installation costs would have to be really cost conscious on the supply chain and, uh, taking lessons learned from the previous generations of offshore wind. I think that makes sense.
30% is still a lot, and I, I think the feeling I’m getting from this is, hey, we’re making a hundred gigawatt commitment to this industry; you have to work really hard to deliver an efficient product, get the cost down so it’s not costing as much as, you know, it could if we did it today. And, from an offshore standpoint over in Europe, what generation are we in, in terms of turbines? Three? Are we going into four? A lot of lessons learned. Joel Saxum: Yeah. The, the new Siemens one’s probably generation four. Yeah. I would say generation four is the new one, because you went from the two and three megawatt machines, like there’s Vestas three megawatts all over the place, and then you went into the direct-drive [00:10:00] machines. You got into that seven and eight megawatt class, and then you got into where we’re at now, the 12 and 15 megawatt units, the Dogger Bank style stuff, and then I would say generation four is the, yeah, the Siemens 21 and a half machine. Um, that’s a good way to look at it, Allen. Four. We’re on the fourth generation of offshore wind, and so generation one is about ready to start being cycled. And some of these are easier, they’re nearer to shore. We’ll see who starts to take those projects on, ’cause that’s gonna be an undertaking too. Question on the 30%: uh, Wind Europe says industry has pledged to cut costs by 30% by 2040. Is that LCOE, or is it devex costs, or is it operational costs? Were they specific on it, or were they just kinda, like, cutting costs?  Speaker 2: My recollection when that first came about, which was six months ago, maybe a little longer, it was LCOE, [00:11:00] right? So they’re, they’re trying to drive down the, uh, dollars per, or euros per megawatt hour output. But the capital costs, if the governments can help with the capital costs.
On the interest rates, just posting bonds and keeping that down, keeping the interest rates low for these projects by funding them somehow or financing them, that will help a tremendous amount. ’Cause if interest rates remain high (I know Europe is much lower than it is in the United States at the minute), if the interest rates start to creep up, these projects will not happen. They’re marginal.  Joel Saxum: Because in Europe you have your central bank interest rates, but even the individual nation states will subsidize that. Right? Like if you go to buy a house in Denmark right now, you pay like 1.2% interest.  Speaker 2: Compared to what, six and a half right now in the States? Yeah, it’s low.  Speaker 4: Australia’s wind farms are [00:12:00] growing fast. But are your operations keeping up? Join us February 17th and 18th at Melbourne’s Pullman on the Park for Wind Energy O&M Australia 2026, where you’ll connect with the experts solving real problems in maintenance, asset management, and OEM relations. Walk away with practical strategies to cut costs and boost uptime that you can use the moment you’re back on site. Register now at WMA2026.com. Wind Energy O&M Australia is created by wind professionals for wind professionals, because this industry needs solutions, not speeches.  Speaker 2: As we all know, on December 22nd, the federal government issued a stop work order on all offshore wind that included Vineyard Wind up off the coast of Massachusetts. That’s a 62 turbine, $4.5 billion wind farm, uh, that’s being powered by some GE turbines. Uh, the government [00:13:00] has, uh, cited national security concerns, but Vineyard Wind went to court, and Federal Judge Brian Murphy ruled the administration failed to adequately explain or justify the decision to shut it down. Uh, the judge issued a stay, which is allowing Vineyard Wind to immediately resume work on the project now.
They’re close to being finished at Vineyard. There are 44 turbines that are up and running right now, creating power and delivering power on shore. There are 17 that are partially installed, uh, when the stop order came. The biggest issue at the moment, if they can’t get rolling again: there are 10 towers with nacelles on them, what they call hammerheads, that don’t have blades. And, uh, Vineyard Wind, last week as we were recording this, said you really don’t want hammerheads out in the water because they become a risk. They’re not assembled, completed [00:14:00] items. So lightning strikes and other things could happen, and you really don’t want them to be that way. You want to finish those turbines, so now they have an opportunity to do it. The window’s gonna be short. And Yolanda, listening to some GE discussions, they were announcing their Q4 results from last year: the ships are available till about the end of March, and then the ships are gonna finally go away and go work on another project. So they have about 45 days to get these turbines done. I guess my question is, can they get it done work-wise? And I, I guess the issue is they gotta get the turbines running, and if they do maintenance on it, that’s gonna be okay. So I’m wondering what they do with blade sets. Do they have a set of blades that are, maybe they passed QC, but they would like them to be better? Do they install ’em just to get a turbine operational, even temporarily, to get this project quote unquote completed so they can get paid?  Yolanda Padron: Yeah. If, if the risk is low [00:15:00] enough, it, it should be. I mean, a little bit tight, but what, what else can you do? Right? I mean, the vessel, like you might have a shot of getting the vessel back eventually, or being able to get something in so you can do some of the blade repairs. And blade repairs up-tower would require a different vessel than, like, bringing in a whole blade, right? And so just.
You have a very limited time scope to be able to do everything. So I don’t know that I would risk just not being able to pull this off altogether, and just risk the, you know, the rest of the tower by not having a complete, you know, LPS and everything on there just because not everything’s a hundred percent perfect. Joel Saxum: There’s a weird mix in technical and commercial risk here, right? Because, technically, we have these hammerheads out there, right? There’s a million things that can happen with those. Like I, I’ve [00:16:00] personally done RCAs where, um, you have a hammerhead on, this was onshore, right? But they, they will get, um, what’s called, uh, VIV, uh, vortex-induced vibration. So when they don’t have the full components out there, wind will go by and they’ll start to shake these things. I’ve seen it where they shook them so much, because they’re not designed to be up there like that, they shook them so much that, like, the bolts started loosening and concrete started cracking in the foundations, and, like, it destroyed the cable systems inside the tower ’cause they sat there and vibrated so violently. So, like, that kind of stuff is a possibility if you don’t have the right, you know, VIV protection on and those kinds of things, let alone lightning risk and some other things. So you have this technical risk of them sitting out there like that. But you also have the commercial risk, right? Because the, the banks, the financiers, the insurance companies, there’s the construction policies, and you gotta hit these certain timelines. It’s just like if you’re building a house, right? You’re building a house, you have to go by the loan that the bank gives you, in, you know, in micro [00:17:00] terms, to kind of think about that. That’s the same thing that happens with this project, except this project’s four and a half billion dollars and probably has 6, 8, 10 banks involved in it. Right?
So you have a lot of, there’s a lot of commercial risk. If you don’t, if you don’t move forward when you have the opportunity to, they won’t, they’ll frown on that. Right? But then you have to balance the technical side. So, so looking at the project as a whole, you’ve got 62 turbines, 44 are fully operational. So that leaves us with 18 that are not. Of those 18, you said, Allen?  Speaker 2: 10 need blades, and one still needs to be erected.  Joel Saxum: Okay, so what’s the other seven?  Speaker 2: They’re partially installed, so they, they haven’t completed the turbine. So everything’s put together, but they haven’t powered them up yet.  Joel Saxum: I was told that, basically, with the kit that they have out at Vineyard Wind, they can do one turbine a day, blades. Speaker 2: That would be, yeah, that would make sense to me.  Joel Saxum: But, but you also have to, you have 45 days of vessel time left. You said they’re gonna leave in March, but you also gotta think it’s fricking winter in the [00:18:00] Atlantic.  Speaker 2: They are using jack-ups. However, there’s big snow storms and, and low, uh, pressure storms that are rolling through just that area, ’cause they, they’ve kind of come to the Midwest and then shoot up the East Coast. That’s where you see New York City with a lot of snow. Boston had a lot of snow just recently. They’re supposed to get another storm like that. And then once it hits Boston, it kind of hits the water, which is where Vineyard is. So turbulent water for sure. Super cold this time of year out there.  Joel Saxum: But wind, you can’t sling blades in, in probably more than, what, six meters per second is probably your cutoff.  Speaker 2: Yeah. This is not the best time of year to be putting blade sets up offshore US.  Joel Saxum: Technically, if you had blue skies, yeah, this thing can get done and we can move. But with weather risk added in, you, you’ve got, there’s some wild cards there.
Speaker 2: It’s gonna be close.  Joel Saxum: Yeah. If we looked at the, the weather, it looks like even, I think this coming weekend, now, we’re recording in January here, and [00:19:00] this weekend’s the first week in February coming, there’s supposed to be another storm rolling up through there too. Speaker 2: It was pretty typical, having lived in Massachusetts almost 25 years. It will be stormy until April. So we’re talking about the time span in which GE and Vineyard want to be done. That’s a rough period for snow. And historically, uh, that timeframe is also when nor’easters happen, where the storms just sit there and cyclone off the shore around Vineyard and then dump the snow back on land. Those storms are really violent, and there’s no way they’re gonna be hanging anything out in the water, so I think it’s gonna be close. They’re gonna have to hope for good weather. Don’t let blade damage catch you off guard. eologix-ping sensors detect issues before they become expensive, time consuming problems. From ice buildup and lightning strikes to pitch misalignment and internal blade cracks, eologix-ping has you covered. The cutting edge sensors are easy to install, giving you [00:20:00] the power to stop damage before it’s too late. Visit eologix-ping.com and take control of your turbine’s health today. So while GE Vernova celebrated strong results in its Q4 report, in both its energy and electrification business, the company’s wind division told a different story. In the fourth quarter of 2025, wind revenue fell 24% to $2.37 billion, uh, driven primarily by offshore wind struggles: Vineyard Wind. Uh, the company recorded approximately $600 million in wind losses for the full year, up from earlier expectations of about $400 million. That’s what I remember from last summer. Uh, the, the culprit was all Vineyard Wind. They gotta get this project done, and with these work stoppages, it just keeps dragging on and on and on.
And I know GE has really wanted to wrap that up as [00:21:00] fast as they can. Uh, CEO Scott Strazik has said the company delivered strong financial results, which they clearly have, because their gas turbine business is taking orders out to roughly 2035, and I think the number on the back orders was gonna be somewhere in the realm of 150 billion dollars, which is an astronomical number for back orders. And because they had the back orders that far out, they’re raising prices, which improves margins, which makes everybody on the stock market happy. You would think, Joel? Except after the Q4 results today, GE Vernova stock is really flat.  Joel Saxum: Which is an odd thing, right? I talk about it all the time. Um, I’m always thinking they’re gonna drop, and they go up and they go up and they go up. But today was just kind of like, I don’t know how to take it. Yeah. And I don’t know if it’s a broader sentiment across what the market was doing today, because there was some other tech earnings and things of that sort, but it’s always something to watch, right? So, uh, there, [00:22:00] there’s some interesting stuff going on in the GE world. But one thing I want to touch on here, we’re talking like Vineyard Wind caused them these delays. Right, there is a larger call to understand why there were these delays, because it’s causing havoc across the industry. Right. But even, like, a lot of, like, uh, conservative lawmakers, like, there were some senators and stuff coming out saying, like, we need more transparency to understand these 90 day halts because of what it’s doing to the industry, right? Because to date there hasn’t been really any explanation, and the judges have been just kind of throwing ’em out. Um, but you can see what it’s done here to GE, recording $600 million in wind losses. I mean, and that is mostly all Vineyard Wind, right? But there’s a little bit of Dogger Bank stuff in there, I would imagine.  Speaker 2: A tiny bit. Really?
’Cause Dogger has been a lot less stressful to GE.  Joel Saxum: But it is, yeah, the, the uncertainty of the market. And that’s why we kind of said a little bit ago, like, when this thing is done, when Vineyard [00:23:00] Wind is like, when you can put the final nail in the coffin of construction on that, it is gonna be a sigh of relief over at GE’s offices, for sure.  Speaker 2: Our friend Lene Helstern appeared in EnergyWatch this week, and she’s spent a long time in the wind industry. She’s been in it 25 years, and, uh, she commented that she’s seeing some troubling things. Uh, she’s also the new CEO of Wind Power Lab over in Denmark, and they’re a consultancy firm on wind turbines and particularly blades. Uh, Lene says that she’s watched some really significant manufacturing errors and operational defects in wind turbine blades become more frequent. And in 2025 alone, Wind Power Lab analyzed and provided repair recommendations for over 700 blades globally. And I assume our Blade Whisperer Morten Handberg was involved in a number of those. Uh, the problem, she says, is that the market eagerly, uh, [00:24:00] demanded cheap turbines, which is true. And, uh, everything had to be done faster and with lower costs, and you end up with a product that reflects that. Uh, we’ve had Lene on the podcast a couple of times, super smart. Uh, she’s great to talk to, to get offline and understand what’s happening behind the scenes, and, uh, in some of these conference rooms between asset managers, operators, and OEMs. Those are sometimes tough discussions, but I, I think Lene’s pointing out something that the industry has been trying to deal with, and she’s raising it up sort of to a higher level, because she has that weight to do that. We have some issues with blades that we need to figure out pretty quickly. And Yolanda, you ran, uh, a large, uh, operator in the United States, dealing with more than a thousand turbines.
How locked in is Lene, uh, to [00:25:00] some of these issues? And are they purely driven just by the push to lower the cost of the blades, or was it more of a speed issue, that they’re making longer blades in the same amount of time? Where’s that balance, and, and what are we going to do about it going forward as we continue to make larger turbines?  Yolanda Padron: She’s great with, with her point, and I think it’s a little bit about, or equally about, the OEMs maybe not being aware of these issues as much, or not having the, the bandwidth to take care of these issues with limited staff. And just a lot of the people who are in charge of developing and constructing these projects in a very short amount of time, or at least having to wear so many hats, that they [00:26:00] don’t necessarily have the bandwidth to do a deep dive on what the potential risks could be in operations. And so I think, the way I’ve, I’ve seen it, I’ve experienced it, it’s almost like everybody’s running a marathon with their shoelaces untied, so they trip, and then they just kind of keep on running ’cause you’re behind, ’cause you tripped. And so it just keeps on, it’s, it’s a vicious cycle. Um. But, uh, we’ve also seen, just in our time together and everything, that there’s a lot of people that are noticing this and that are taking the time to just pause, you know, tie those shoelaces, and just talk to each other a little bit more of, hey, I’m the one engineer doing this for so many turbines. You have these turbines too. Are you seeing this issue? Yes? No? How are you tackling it? How have you tackled it in the past? How can we work together to use the data we have? Right?
That, I mean, if you’re not going to get a really great answer from your OEMs, or if you’re not going to get a lot of [00:27:00] easily available answers just from the dataset that you’re seeing from your turbine, it’s really easy now to reach out to other people within the industry and to be able to talk it over, which I think is something that Lene is definitely encouraging here.  Joel Saxum: Yeah. Yeah. It’s, I mean, she, she makes a statement about owners needing to be technically mature, ensure you have inspections, get your TSAs right. So these are, again, it’s lessons learned. It’s sharing knowledge within the market, because at the end of the day, this is not a new reality. This is the reality we’re living in. Right. It’s not new. Um, but, but we’re getting better at it. I think that’s the, the important thing here, right? If we take the collective group of operators in the world and say, like, you know, where were you two, three years ago and where are you today? I think we’re in a much better place, and that’s from knowledge sharing and, and understanding these issues. And, you know, we’re, we’re at the behest of good, fast, cheap, pick. [00:28:00] Right. And so that’s got us where we are today. But now we’re, we’re starting to get best practices, lessons learned, fix things for the next go-around. And you’re seeing efforts at the OEM level as well, uh, and some, some of these consultants coming out, um, to, to try to fix some of these manufacturing issues. You know, Allen, you and I have talked with DFS Composites, with Gulf Wind Technology. Like, there, there’s things here that we could possibly fix. You’re starting to see operators do internal inspections of the blades on the ground before they fly them. That’s huge. Right? Wind Power Lab has been talking about that since 2021. Right.
But the message is finally getting out to the industry that this is what you should be doing as a best practice to, you know, de-risk. ’Cause that’s the whole thing: de-risk, de-risk, de-risk. Uh, so I think Lene’s spot on, right? We know that these things are happening. We’re working with the OEMs to do them, but it takes a technically mature operator. And if you don’t have the staff to be technically mature, go grab a consultant, [00:29:00] go grab someone that is, to help you out. I think that’s an important, uh, thing to take from this as well. Those people are out there, those groups are out there, so go enlist that to make sure you’re de-risking this thing, because at the end of the day, if we’re de-risking turbines, it’s better for the whole industry.  Speaker 2: Yeah. You want to grab somebody that has seen a lot of blades, not a sole consultant on a particular turbine. I mean, you’re talking about, at this point in the development of the wind industry, you’re talking about Wind Power Lab, SkySpecs kind of companies that have seen thousands of turbines and have a broad reach, where they’ve done things globally, not just in Scandinavia or the US or Australia or somewhere else. They’ve, they’ve seen problems worldwide. Those people exist, and I, I don’t think we as an industry use them as much as we could, but it would get to the solutions faster, because having seen so many global [00:30:00] issues with the same turbine, the solution set does vary depending on where you are. But it’s been proven out already. So even though you as an asset manager may have never heard of this technique to make your performance better, to make your blades last longer, it’s probably been done at this point, unless it’s a brand new turbine. So a lot of the 2X machines and 3X machines, and now we’re talking about 6X machines, there’s answers out there, but you’re gonna have to reach out to somebody who has a global reach.
We’ve grown too big to do it small anymore.  Yolanda Padron: Which really should be a relief to all of the asset managers and operations people and everything out there, right? Like, you don’t have to use your turbines as guinea pigs anymore. You don’t have to struggle with this.  Speaker 2: That wraps up another episode of the Uptime Wind Energy Podcast, and if today’s discussion sparked any questions or ideas, we’d love to hear from you. Reach out to us on LinkedIn, and don’t forget to subscribe so you never miss an episode. [00:31:00] And if you found value in today’s conversation, please leave us a review. It really helps other wind energy professionals discover the show. For Rosie, Yolanda and Joel, I am Allen Hall, and we’ll see you here next week on the Uptime Wind Energy Podcast.

Desde el reloj
Dockge y cómo organizo mis contenedores Docker

Desde el reloj

Play Episode Listen Later Feb 2, 2026 14:48


Taking advantage of my NAS migration, I've changed how I bring up the Docker containers for my various self-hosted services. I no longer keep everything in a single docker-compose.yaml file, because I now use Dockge to organize it all much better. I explain what this tool is good for and how I've split everything up now.
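As a rough illustration of the kind of split the episode describes: Dockge manages one Compose stack per directory (its default stacks folder is /opt/stacks, though that is configurable). The service shown here is a hypothetical self-hosted app, not one named in the episode:

```yaml
# /opt/stacks/nextcloud/compose.yaml — one self-contained stack per
# directory, instead of one giant docker-compose.yaml for every service.
# "nextcloud" and the ports/paths are illustrative assumptions.
services:
  nextcloud:
    image: nextcloud:latest
    restart: unless-stopped
    ports:
      - "8080:80"
    volumes:
      - ./data:/var/www/html   # data lives next to the stack definition
```

Each stack can then be started and stopped independently, either from Dockge's web UI or with `docker compose up -d` inside its directory, so editing one service no longer means touching a file shared by all of them.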

Programming By Stealth
PBS Tidbit 17b — Simplifying Developer Setups with Docker

Programming By Stealth

Play Episode Listen Later Feb 1, 2026 82:19


This instalment is the second half of PBS Tidbit 17 in which Helma van der Linden is the instructor and Bart Busschots is the student. We pick up the plot right where Helma begins to teach how to reuse the Docker image created in the first half of the lesson. You can find Helma's fabulous tutorial shownotes and the audio podcast at pbs.bartificer.net Join the Conversation: allison@podfeet.com podfeet.com/slack Support the Show: Patreon Donation Apple Pay or Credit Card one-time donation PayPal one-time donation Podfeet Podcasts Mugs at Zazzle NosillaCast 20th Anniversary Shirts Referral Links: Setapp - 1 month free for you and me PETLIBRO - 30% off for you and me Parallels Toolbox - 3 months free for you and me Learn through MacSparky Field Guides - 15% off for you and me Backblaze - One free month for me and you Eufy - $40 for me if you spend $200. Sadly nothing in it for you. PIA VPN - One month added to Paid Accounts for both of us CleanShot X - Earns me 25%, sorry nothing in it for you but my gratitude
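For flavor, "reusing the Docker image" can look something like the following Compose sketch; the image tag `pbs-dev`, the mount point, and the dev-server command are hypothetical stand-ins, not details from the episode:

```yaml
# compose.yaml — reuse an image built earlier (e.g. with
# `docker build -t pbs-dev .`). Because there is no build: section,
# Compose runs the prebuilt image instead of rebuilding it each time.
services:
  dev:
    image: pbs-dev
    working_dir: /workspace
    volumes:
      - ./:/workspace        # source stays on the host; container is disposable
    ports:
      - "3000:3000"
    command: npm run serve   # hypothetical dev-server command
```

The appeal of this pattern for developer setups is that the container can be thrown away and recreated at will while the project files, bind-mounted from the host, survive untouched.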

DevZen Podcast
Здесь так принято — Episode 526

DevZen Podcast

Play Episode Listen Later Jan 29, 2026 104:09


In this episode: we drink coffee with taurine, hunt for side-channel vulnerabilities in Docker, replace Rust with Python, pester Gemini with silly questions, and discuss listener topics. [00:00:00] What we learned this week [00:19:48] Coffee and jitteriness/irritability [00:42:47] Fun-reliable side-channels for cross-container communication [00:57:03] How Rust lost to Python on speed https://eax.me/2026/2026-01-23-rust-vs-python.html https://www.reddit.com/r/Python/comments/1dv811q/flpc_probably_the_fastest_regex_library_for/ https://docs.rs/plotters/latest/plotters/ https://github.com/pola-rs/polars [01:11:08]… Read more →

Atareao con Linux
ATA 766 Adiós a Docker Compose. Cómo usar PODS en Podman. Paso a Paso

Atareao con Linux

Play Episode Listen Later Jan 29, 2026 27:55


Still using Docker Compose for everything? It's time to discover the real power of Podman: Pods. In this episode I walk you through migrating a complete WordPress, MariaDB and Redis stack so you can see how to radically simplify the management of your containers. You'll learn why the "pod" concept changes the rules of the game by letting your containers share the same network and IP address, talking directly to localhost. We cover everything from the technical workings of the infra container to professional automation with Quadlet and systemd. What is a Pod?: the origin of the name and why it's the ideal logical unit for your services. Goodbye networking problems: how to connect WordPress and the database without creating virtual networks, using simply 127.0.0.1. Security and sidecars: shielding services like Redis inside the same pod so they are unreachable from outside. Unified management: how to stop, start and monitor your whole stack with a single command. Persistence and automation: generating Kubernetes YAML files and turning them into native Linux services with .kube files. If you're looking for practical solutions for "anything you want to do with Linux", this episode gives you the tools to professionalize your infrastructure. Full show notes and the commands used: https://atareao.es/podcast/766
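The Quadlet step mentioned in the episode notes can be sketched as a small systemd-native unit that runs a pod from previously generated Kubernetes YAML. File names and the pod description here are hypothetical, not the episode's actual files:

```ini
# ~/.config/containers/systemd/wordpress.kube — a Quadlet .kube unit.
# The YAML it points at would be produced beforehand, e.g.:
#   podman kube generate wordpress-pod > wordpress-pod.yaml
[Unit]
Description=WordPress pod (WordPress + MariaDB + Redis sharing localhost)

[Kube]
# %h is systemd's specifier for the user's home directory
Yaml=%h/.config/containers/systemd/wordpress-pod.yaml

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After a `systemctl --user daemon-reload`, Quadlet generates a matching `wordpress.service`, so the whole pod starts, stops, and survives reboots like any other native systemd service.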

Programming By Stealth
PBS Tidbit 17a — Simplifying Developer Setups with Docker

Programming By Stealth

Play Episode Listen Later Jan 28, 2026 100:23


This very special episode of Programming By Stealth is a Tidbit written and taught by the lovely Helma van der Linden. Bart has wanted to understand Docker better, and Helma has some great use cases for how to use containers for developer setups, so it was a good opportunity for Bart to learn from Helma. The material is quite long, so the podcast was recorded in two segments, Tidbit 17a and b. Tidbit 17b will be along shortly, and picks up at the heading entitled "Reusing the Docker image". You can find Helma's fabulous tutorial shownotes and the audio podcast at pbs.bartificer.net Join the Conversation: allison@podfeet.com podfeet.com/slack Support the Show: Patreon Donation Apple Pay or Credit Card one-time donation PayPal one-time donation Podfeet Podcasts Mugs at Zazzle NosillaCast 20th Anniversary Shirts Referral Links: Setapp - 1 month free for you and me PETLIBRO - 30% off for you and me Parallels Toolbox - 3 months free for you and me Learn through MacSparky Field Guides - 15% off for you and me Backblaze - One free month for me and you Eufy - $40 for me if you spend $200. Sadly nothing in it for you. PIA VPN - One month added to Paid Accounts for both of us CleanShot X - Earns me 25%, sorry nothing in it for you but my gratitude
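To give a taste of the kind of developer-setup image such a tidbit builds, here is a minimal Dockerfile sketch. The base image and installed tools are illustrative assumptions; the episode's actual image will differ:

```dockerfile
# Dockerfile — a minimal, reusable dev-environment image.
# Base image and tooling choices are assumptions, not from the episode.
FROM node:22-bookworm-slim

# A couple of tools a typical web-dev setup wants beyond Node itself
RUN apt-get update \
    && apt-get install -y --no-install-recommends git ca-certificates \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace
EXPOSE 3000

# Drop into a shell by default; project code gets bind-mounted at run time
CMD ["bash"]
```

A one-time `docker build -t pbs-dev .` produces the image, after which any project can borrow the environment with something like `docker run -it --rm -v "$PWD":/workspace -p 3000:3000 pbs-dev`, keeping the host machine free of per-project toolchains.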

The Cloud Pod
340: Azure releases a new SQL AI Assistant… Jimmy Droptables

The Cloud Pod

Play Episode Listen Later Jan 27, 2026 73:07


Welcome to episode 340 of The Cloud Pod, where the forecast is always cloudy! It's a full house (eventually) with Justin, Jonathan, Ryan, and Matt all on board for today's episode. We've got a lot of announcements, from Gemini for Gov (no more CamoGPT!) to Route 52 and Claude. Let's get started!  Titles we almost went with this week Claude's Pricing Tiers: Free, Pro, and Maximum Overdrive GitHub Copilot Learns Database Schema: Finally an AI That Understands Your Joins SSMS Gets a Copilot: Your T-SQL Now Writes Itself While You Grab Coffee Too Many Cooks in the Cloud Kitchen: How 32 GPUs Outcooked the Big Tech Industrial Kitchens Uncle Sam Gets a Gemini Twin: Google's AI Goes Federal Route 53 Gets Domain of Its Own: .ai Joins the Party Thai One On: Google Cloud Plants Its Flag in Bangkok NAT So Fast: Azure's Gateway Gets a V2 Glow-Up Beware Azure's SQL Assistant doesn't smoke your joints. AI Is Going Great, Or How ML Makes Money   30:10 Announcing BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing | Databricks Blog Databricks released BlackIce, an open-source containerized toolkit that bundles 14 AI security testing tools into a single Docker image available on Docker Hub as databricksruntime/blackice:17.3-LTS.  The toolkit addresses common red teaming challenges, including conflicting dependencies, complex setup requirements, and the fragmented landscape of AI security tools, by providing a unified command-line interface similar to how Kali Linux works for traditional penetration testing. The toolkit includes tools covering three main categories: Responsible AI, Security testing, and classical adversarial ML, with capabilities mapped to MITRE ATLAS and the Databricks AI Security Framework.  
Tools are organized as either static (simple CLI-based with minimal programming needed) or dynamic (Python-based with customization options), with static tools isolated in separate virtual environments and dynamic tools in a global environment with managed dependencies. BlackIce integrates directly with Databricks Model Serving endpoints through custom patches applied to several tools, allowing security teams to test for vulnerabilities like prompt injections, data leakage, hallucination detection, jailbreak attacks, and supply chain security issues.  Users can deploy it via Databricks Container Services by specifying the Docker image URL when creating compute clusters. The release includes a demo notebook showing how to orchestrate multiple security tools in a single environment, with all build artifacts, tool documentation, and examples available in the GitHub repository.  The CAMLIS Red Paper provides additional technical details on tool selection criteria and the Docker image architecture. 04:30 Ryan – “It's very difficult to feel confident in your AI security practice or patterns. I feel like it's just bleeding edge, and I
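Getting the toolkit locally starts with the image name given above. A rough sketch; the image name comes from the announcement, but the interactive-shell invocation and in-container tool launchers are assumptions of mine, so check the GitHub repository for the actual commands:

```shell
# Pull the published image (name as stated in the announcement above).
docker pull databricksruntime/blackice:17.3-LTS

# Start an interactive shell inside the toolkit to explore the bundled tools.
# /bin/bash as the command is an assumption; the repo documents the real
# entrypoint and the launcher for each of the 14 bundled tools.
docker run --rm -it databricksruntime/blackice:17.3-LTS /bin/bash
```

For Databricks Container Services, the same image URL is instead supplied in the cluster-creation UI, as described above.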

LINUX Unplugged
651: Uptime Funk

LINUX Unplugged

Play Episode Listen Later Jan 26, 2026 62:43 Transcription Available


When your self-hosted services become infrastructure, breakage matters. We tackle monitoring that actually helps, alerts you won't ignore, and DNS for local and multi-mesh network setups. Sponsored By: Jupiter Party Annual Membership: Put your support on automatic with our annual plan, and get one month of membership for free! Managed Nebula: Meet Managed Nebula from Defined Networking. A decentralized VPN built on the open-source Nebula platform that we love. Support LINUX Unplugged. Links:

Atareao con Linux
ATA 765 Adiós a PASS y GPG. Por qué me pasé a Age y SOPS

Atareao con Linux

Play Episode Listen Later Jan 26, 2026 23:00


Software Defined Talk
Episode 556: This Conversation is Hardened

Software Defined Talk

Play Episode Listen Later Jan 23, 2026 67:35


This week, we discuss the end of Cloud 1.0, AI agents fixing old apps, and Chainguard vs. Docker images. Plus, the mystery of Dutch broth is finally solved. Watch the YouTube Live Recording of Episode 556 Runner-up Titles His overall deal Been there and done that been ignoring that shift key for years Cloud is just fine I'll be back in Bartertown The “F” Word Hardened-washing We'll never do this, but we should check back in in 3 months Libraries are the best Elves don't belong in space Rundown Are we at the end of cloud or cloud 1.0 It's the beginning of Cloud 2.0 Spec-driven development system for Claude Code Anthropic and App Modernization A meta-prompting, context engineering and spec-driven development What comes next, if Claude Code is as good as people say. Microsoft Spending on Anthropic Approaches $500 Million a Year Claude Code Won't Fix Your Life Coté and Tony contemplate day two AI-generated apps, and an excerpt. Why We've Tried to Replace Developers Every Decade Since 1969 Well, that escalated quickly: Zero CVEs, lots of vendors Relevant to your Interests Beijing tells Chinese firms to stop using US and Israeli cybersecurity software China blacklists VMware, Palo Alto Networks software over national security fears Kroger taps Google Gemini, announces more key AI moves Texas judge throws out second lawsuit over CrowdStrike outage Apple will pay billions for Gemini after OpenAI declined Dell wants £10m+ from VMware if Tesco case goes against it Tailscale: The Best Free App Most Mac Power Users Aren't Using How WhatsApp Took Over the Global Conversation Our approach to advertising and expanding access to ChatGPT OpenAI's ARR reached over $20 billion in 2025, CFO says Simon Willison's take on Our approach to advertising and ChatGPT The AI lab revolving door spins ever faster | TechCrunch How Markdown took over the world An Interview with United CEO Scott Kirby About Tech Transformation Conferences cfgmgmtcamp 2026, February 2nd to 4th, Ghent, BE. 
Coté speaking - anyone interested in being an SDI guest? DevOpsDayLA at SCALE23x, March 6th, Pasadena, CA Use code: DEVOP for 50% off. Devnexus 2026, March 4th to 6th, Atlanta, GA. Use this 30% off discount code from your pals at Tanzu: DN26VMWARE30. KubeCon EU, March 23rd to 26th, 2026 - Coté will be there on a media pass. VMware User Groups (VMUGs): Amsterdam (March 17-19, 2026) Minneapolis (April 7-9, 2026) Toronto (May 12-14, 2026) Dallas (June 9-11, 2026) Orlando (October 20-22, 2026) SDT News & Community Join our Slack community Email the show: questions@softwaredefinedtalk.com Free stickers: Email your address to stickers@softwaredefinedtalk.com Follow us on social media: Twitter, Threads, Mastodon, LinkedIn, BlueSky Watch us on: Twitch, YouTube, Instagram, TikTok Book offer: Use code SDT for $20 off "Digital WTF" by Coté Sponsor the show Recommendations Brandon: The Library will loan you a 5G hotspot Matt: Deep Rock Galactic: Survivor (rogue-like Vampire Hunters-type game) Coté: Streamyard shorts generation. Salesforce was inspired by dolphins.

Hanselminutes - Fresh Talk and Tech for Developers
Run your AI Agent in a Sandbox, with Docker President Mark Cavage

Hanselminutes - Fresh Talk and Tech for Developers

Play Episode Listen Later Jan 22, 2026 32:13


Sandboxing is having a moment. As agents move from chat windows into terminals, repos, and production-adjacent workflows, the question is no longer “What can AI generate?” but “Where can it safely run?” In this episode, Scott talks with Mark Cavage, President of Docker, about the resurgence of sandboxes as critical infrastructure for the agent era and the thinking behind Docker's newly released sandbox feature.They explore why isolation, reproducibility, and least-privilege execution are becoming table stakes for AI-assisted development. From protecting local machines to enabling trustworthy automation loops, Scott and Mark dig into how modern sandboxes differ from traditional containers, what developers should expect from secure agent runtimes, and why the future of “AI that does things” will depend as much on boundaries as it does on model capability.
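The isolation and least-privilege ideas Scott and Mark discuss can be approximated today with ordinary container flags. A minimal sketch using standard `docker run` options — this is not Docker's new sandbox feature, and the image name and command (`my-agent-image`, `agent.py`) are placeholders:

```shell
# Run an agent workload with tight boundaries:
#   --network none : no network access at all (swap for an allowlisted proxy in practice)
#   --read-only    : immutable root filesystem
#   --cap-drop ALL : drop every Linux capability
#   --pids-limit   : cap process count to contain fork bombs
# A writable tmpfs gives the agent scratch space without touching the host.
docker run --rm \
  --network none \
  --read-only \
  --cap-drop ALL \
  --pids-limit 128 \
  --tmpfs /tmp:rw,size=64m \
  my-agent-image:latest python agent.py
```

The point of the episode is that a purpose-built sandbox goes further than these flags (reproducibility, policy, auditability), but the flags show the least-privilege baseline containers already provide.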

Tank Talks
Building a Solo GP Fund with Timothy Chen of Essence VC

Tank Talks

Play Episode Listen Later Jan 22, 2026 64:42


In this episode of Tank Talks, Matt Cohen sits down with Timothy Chen, the sole General Partner at Essence VC. Tim shares his remarkable journey from being a “nerdy, geeky kid” who hacked open-source projects to becoming one of the most respected early-stage infrastructure investors, backing breakout companies like Tabular (acquired by Databricks for $2.2 billion). A former engineer at Microsoft and VMware, co-founder of Hyperpilot (acquired by Cloudera), and now a solo GP who quietly raised over $41 million for his latest fund, Tim offers a unique, no-BS perspective on spotting technical founders, navigating the idea maze, and rethinking sales and traction in the world of AI and infrastructure.

We dive deep into his unconventional path into VC, rejected by traditional Sand Hill Road firms, only to build a powerhouse reputation through sheer technical credibility and founder empathy. Tim reveals the patterns behind disruptive infra companies, why most VCs can't help with product-market fit, and how he leverages his engineering background to win competitive deals.

Whether you're a founder building the next foundational layer or an investor trying to understand the infra and AI boom, this conversation is packed with hard-won insights.

The Open Source Resume (00:03:44)
* How contributing to Apache projects (Drill, Cloud Foundry) built his career when a CS degree couldn't.
* The moment he realized open source was a path to industry influence, not just a hobby.
* Why the open source model is more “vertical than horizontal”, allowing deep contribution without corporate red tape.

From Engineer to Founder: The Hyperpilot Journey (00:13:24)
* Leaving Docker to start Hyperpilot and raising seed funding from NEA and Bessemer.
* The harsh reality of founder responsibility: “It's not about the effort hard, it's about all the other things that has to go right.”
* Learning from being “way too early to market” and the acquisition by Cloudera.

The Unlikely Path into Venture Capital (00:26:07)
* Rejected by top-tier VC firms for a job, then prompted to start his own fund via AngelList.
* Starting with a $1M “Tim Chen Angel Fund” focused solely on infrastructure.
* How Bain Capital's small anchor investment gave him the initial credibility.

Building a Brand Through Focus & Reputation (00:30:42)
* Why focusing exclusively on infrastructure was his “best blessing”, creating a standout identity in a sparse field.
* The reputation flywheel: Founders praising his help led to introductions from top-tier GPs and LPs.
* StepStone reaching out for a commitment before he even had fund documents ready.

The Essence VC Investment Philosophy (00:44:34)
* Pattern Recognition: What he learned from witnessing the early days of Confluent, Databricks, and Docker.
* Seeking Disruptors, Not Incrementalists: Backing founders who have a “non-common belief” that leads to a 10x better product (e.g., Modal Labs, Cursor, Warp).
* Rethinking Sales & Traction: Why revenue-first playbooks don't apply in early-stage infra; comfort comes from technical co-building and roadmap planning.
* The “Superpower”: Using his engineering background to pressure-test technical assumptions and timelines with founders.

The Future of Infra & AI (00:52:09)
* Infrastructure as an “enabler” for new application paradigms (real-time video, multimodal apps).
* The coming democratization of building complex systems (the “next Netflix” built by smaller teams).
* The shift from generalist backend engineers to specialists, enabled by new stacks and AI.

Solo GP Life & Staying Relevant (00:54:55)
* Why being a solo GP doesn't mean being a lone wolf; 20-30% of his time is spent syncing with other investors to learn.
* The importance of continuous learning and adaptation in a fast-moving tech landscape.
* His toolkit: Using portfolio company Clerky (a CRM) to manage workflow.

About Timothy Chen
Founder and Sole General Partner, Essence VC
Timothy Chen is the Sole General Partner at Essence VC, a fund focused on early-stage infrastructure, AI, and open-source innovation. A three-time founder with an exit, his journey from Microsoft engineer to sought-after investor is a masterclass in building credibility through technical depth and founder-centric support. He has backed companies like Tabular, Iteratively, and Warp, and his insights are shaped by hundreds of conversations at the bleeding edge of infrastructure.

Connect with Timothy Chen on LinkedIn: linkedin.com/in/timchen
Visit the Essence VC Website: https://www.essencevc.fund/
Connect with Matt Cohen on LinkedIn: https://ca.linkedin.com/in/matt-cohen1
Visit the Ripple Ventures website: https://www.rippleventures.com/

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit tanktalks.substack.com

Building Abundant Success!!© with Sabrina-Marie
Episode 2661: Peter Docker ~ NBC Universal, Frm. Royal Air Force Senior Officer talks Mentoring & Leadership by "Leading From the Jumpseat"

Building Abundant Success!!© with Sabrina-Marie

Play Episode Listen Later Jan 14, 2026 33:17


Accenture, American Express, ASOS, EY, Four Seasons Hotels, Google, and NBC Universal are among his clients. Peter Docker is a former Royal Air Force senior officer, a former international negotiator for the UK Government, and an executive coach. His first book, 'Find Your Why: A Practical Guide for Discovering Purpose for You and Your Team', was co-authored with Simon Sinek and David Mead. Peter gets up every day inspired to enable people to be extraordinary so that they can do extraordinary things. Collaborating with Simon Sinek for over 7 years, he was a founding Igniter and Implementation Specialist on the Start With Why team, teaching leaders and companies how to use the concept of Why. "The first step is to distinguish leadership from management. “Management is about handling complexity,” explains Docker, while “leadership is about creating simplicity. It's about cutting through the noise, identifying what's really important, making it personal for people, bringing them together and connecting them.” ~ Peter Docker in Venteur Magazine, January 2023. One of Peter's latest books is 'Leading from The Jumpseat: How to Create Extraordinary Opportunities by Handing Over Control'. Peter's commercial and industry experience has been at the most senior levels in sectors including oil & gas, construction, mining, pharmaceuticals, banking, television, film, media, manufacturing and services - across more than 90 countries. His career has spanned professional pilot; leading an aviation training and standards organisation; teaching post-graduates at an international college; and running multi-billion dollar procurement projects. A former Royal Air Force senior officer, he has been a Force Commander during combat flying operations and has seen service across the world.
He is a seasoned crisis manager, a former international negotiator for the UK Government, and an executive coach. © 2026 Building Abundant Success!! All Rights Reserved. Join Me on ~ iHeart Media @ https://tinyurl.com/iHeartBAS Spot Me on Spotify: https://tinyurl.com/yxuy23b Amazon Music ~ https://tinyurl.com/AmzBAS Audacy: https://tinyurl.com/BASAud

Software Defined Talk
Episode 554: The Alpha and The Omega

Software Defined Talk

Play Episode Listen Later Jan 9, 2026 72:05


This week, we discuss AI's impact on Stack Overflow, Docker's Hardened Images, and Nvidia buying Groq. Plus, thoughts on playing your own game and having fun. Watch the YouTube Live Recording of Episode (https://www.youtube.com/live/LQSxLbjvz3c?si=ao8f3hwxlCrmH1vX) 554 (https://www.youtube.com/live/LQSxLbjvz3c?si=ao8f3hwxlCrmH1vX) Please complete the Software Defined Talk Listener Survey! (https://docs.google.com/forms/d/e/1FAIpQLSfl7eHWQJwu2tBLa-FjZqHG2nr6p_Z3zQI3Pp1EyNWQ8Fu-SA/viewform?usp=header) Runner-up Titles It's all brisket after that. Exploring Fun Should I go build a snow man? Pets Innersourcing Two books Michael Lewis should write. Article IV is foundational. Freedom is options. Rundown Stack Overflow is dead. (https://x.com/rohanpaul_ai/status/2008007012920209674?s=20) Hardened Images for Everyone (https://www.docker.com/blog/docker-hardened-images-for-every-developer/) Tanzu's Bitnami stuff does this too (https://blogs.vmware.com/tanzu/what-good-software-supply-chain-security-looks-like-for-highly-regulated-industries/). 
OpenAI OpenAI's New Fundraising Round Could Value Startup at as Much as $830 Billion (https://www.wsj.com/tech/ai/openais-new-fundraising-round-could-value-startup-at-a[…]4238&segment_id=212500&user_id=c5a514ba8b7d9a954711959a6031a3fa) OpenAI Reportedly Planning to Make ChatGPT "Prioritize" Advertisers in Conversation (https://futurism.com/artificial-intelligence/openai-chatgpt-sponsored-ads) OpenAI bets big on audio as Silicon Valley declares war on screens (https://techcrunch.com/2026/01/01/openai-bets-big-on-audio-as-silicon-valley-declares-war-on-screens/) Sam Altman says: He has zero percent interest in remaining OpenAI CEO, once (https://timesofindia.indiatimes.com/technology/tech-news/sam-altman-says-he-has-zero-percent-interest-remaining-openai-ceo-once-/articleshow/126350602.cms) Nvidia buying AI chip startup Groq's assets for about $20 billion in its largest deal on record (https://www.cnbc.com/2025/12/24/nvidia-buying-ai-chip-startup-groq-for-about-20-billion-biggest-deal.html) Relevant to your Interests Broadcom IT uses Tanzu Platform to host MCP Servers (https://news.broadcom.com/app-dev/broadcom-tanzu-platform-agentic-business-transformation). 
A Brief History Of The Spreadsheet (https://hackaday.com/2025/12/15/a-brief-history-of-the-spreadsheet/) Databricks is raising over $4 billion in Series L funding at a $134 billion (https://x.com/exec_sum/status/2000971604449485132?s=20) Amazon's big AGI reorg decoded by Corey Quinn (https://www.theregister.com/2025/12/17/jassy_taps_peter_desantis_to_run_agi/) “They burned millions but got nothing.” (https://automaton-media.com/en/news/japanese-game-font-services-aggressive-price-hike-could-be-result-of-parent-companys-alleged-ai-failu/) X sues to protect Twitter brand Musk has been trying to kill (https://www.theregister.com/2025/12/17/x_twitter_brand_lawsuit/) Mozilla's new CEO says AI is coming to Firefox, but will remain a choice | TechCrunch (https://techcrunch.com/2025/12/17/mozillas-new-ceo-says-ai-is-coming-to-firefox-but-will-remain-a-choice/) Why Oracle keeps sparking AI-bubble fears (https://www.axios.com/2025/12/18/ai-oracle-stock-blue-owl) What's next for Threads (https://sources.news/p/whats-next-for-threads) Salesforce Executives Say Trust in Large Language Models Has Declined (https://www.theinformation.com/articles/salesforce-executives-say-trust-generative-ai-declined?rc=giqjaz) Akamai Technologies Announces Acquisition of Function-as-a-Service Company Fermyon (https://www.akamai.com/newsroom/press-release/akamai-announces-acquisition-of-function-as-a-service-company-fermyon) Google Rolling Out Gmail Address Change Feature: Here Is How It Works (https://finance.yahoo.com/news/google-rolling-gmail-address-change-033112607.html) The Enshittifinancial Crisis (https://www.wheresyoured.at/the-enshittifinancial-crisis/) MongoBleed: Critical MongoDB Vulnerability CVE-2025-14847 | Wiz Blog (https://www.wiz.io/blog/mongobleed-cve-2025-14847-exploited-in-the-wild-mongodb) Softbank to buy data center firm DigitalBridge for $4 billion in AI push (https://www.cnbc.com/amp/2025/12/29/digitalbridge-shares-jump-on-report-softbank-in-talks-to-acquire-firm.html) 
The best tech announced at CES 2026 so far (https://www.theverge.com/tech/854159/ces-2026-best-tech-gadgets-smartphones-appliances-robots-tvs-ai-smart-home) Who's who at X, the deepfake porn site formerly known as Twitter (https://www.ft.com/content/ad94db4c-95a0-4c65-bd8d-3b43e1251091?accessToken=zwAGR7kzep9gkdOtlNtMlaBMZdO9jTtD4SUQkQ.MEYCIQCdZajuC9uga-d9b5Z1t0HI2BIcnkVoq98loextLRpCTgIhAPL3rW72aTHBNL_lS7s1ONpM2vBgNlBNHDBeGbHkPkZj&sharetype=gift&token=a7473827-0799-4064-9008-bf22b3c99711) Manus Joins Meta for Next Era of Innovation (https://manus.im/blog/manus-joins-meta-for-next-era-of-innovation) The WELL: State of the World 2026 with Bruce Sterling and Jon Lebkowsky (https://people.well.com/conf/inkwell.vue/topics/561/State-of-the-World-2026-with-Bru-page01.html) Virtual machines still run the world (https://cote.io/2026/01/07/virtual-machines-still-run-the.html) Databases in 2025: A Year in Review (https://www.cs.cmu.edu/~pavlo/blog/2026/01/2025-databases-retrospective.html) Chat Platform Discord Files Confidentially for IPO (https://www.bloomberg.com/news/articles/2026-01-06/chat-platform-discord-is-said-to-file-confidentially-for-ipo?embedded-checkout=true) The DRAM shortage explained: AI, rising prices, and what's next (https://www.techradar.com/pro/why-is-ram-so-expensive-right-now-its-more-complicated-than-you-think) Nonsense Palantir CEO buys monastery in Old Snowmass for $120 million (https://www.denverpost.com/2025/12/17/palantir-alex-karp-snowmass-monastery/amp/) H-E-B gives free groceries to all customers after registers glitch today in Burleson, Texas. (https://www.reddit.com/r/interestingasfuck/s/ZEcblg7atP) Conferences cfgmgmtcamp 2026 (https://cfgmgmtcamp.org/ghent2026/), February 2nd to 4th, Ghent, BE. Coté speaking - anyone interested in being a SDI guest? DevOpsDayLA at SCALE23x (https://www.socallinuxexpo.org/scale/23x), March 6th, Pasadena, CA Use code: DEVOP for 50% off. Devnexus 2026 (https://devnexus.com), March 4th to 6th, Atlanta, GA. 
Coté has a discount code, but he's not sure if he can give it out. He's asking! Send him a DM in the meantime. KubeCon EU, March 23rd to 26th, 2026 - Coté will be there on a media pass. Whole bunch of VMUGs, mostly in the US. The CFPs are open (https://app.sessionboard.com/submit/vmug-call-for-content-2026/ae1c7013-8b85-427c-9c21-7d35f8701bbe?utm_campaign=5766542-VMUG%20Voice&utm_medium=email&_hsenc=p2ANqtz-_YREN7dr6p3KSQPYkFSN5K85A-pIVYZ03ZhKZOV0O3t3h0XHdDHethhx5O8gBFguyT5mZ3n3q-ZnPKvjllFXYfWV3thg&_hsmi=393690000&utm_content=393685389&utm_source=hs_email), go speak at them! Coté speaking in Amsterdam. Amsterdam (March 17-19, 2026), Minneapolis (April 7-9, 2026), Toronto (May 12-14, 2026), Dallas (June 9-11, 2026), Orlando (October 20-22, 2026) SDT News & Community Join our Slack community (https://softwaredefinedtalk.slack.com/join/shared_invite/zt-1hn55iv5d-UTfN7mVX1D9D5ExRt3ZJYQ#/shared-invite/email) Email the show: questions@softwaredefinedtalk.com (mailto:questions@softwaredefinedtalk.com) Free stickers: Email your address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) Follow us on social media: Twitter (https://twitter.com/softwaredeftalk), Threads (https://www.threads.net/@softwaredefinedtalk), Mastodon (https://hachyderm.io/@softwaredefinedtalk), LinkedIn (https://www.linkedin.com/company/software-defined-talk/), BlueSky (https://bsky.app/profile/softwaredefinedtalk.com) Watch us on: Twitch (https://www.twitch.tv/sdtpodcast), YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured), Instagram (https://www.instagram.com/softwaredefinedtalk/), TikTok (https://www.tiktok.com/@softwaredefinedtalk) Book offer: Use code SDT for $20 off "Digital WTF" by Coté (https://leanpub.com/digitalwtf/c/sdt) Sponsor the show (https://www.softwaredefinedtalk.com/ads): ads@softwaredefinedtalk.com (mailto:ads@softwaredefinedtalk.com) Recommendations Brandon: Why Data Doesn't Always Win, with a Philosopher of Art 
(https://podcasts.apple.com/us/podcast/the-points-you-shouldnt-score-a-new-years-resolution/id1685093486?i=1000743950053) (Apple Podcasts) Why Data Doesn't Always Win, with a Philosopher of Art (https://www.youtube.com/watch?v=7AdbePyGS2M&list=RD7AdbePyGS2M&start_radio=1) (YouTube) Coté: “Databases in 2025: A Year in Review.” (https://www.cs.cmu.edu/~pavlo/blog/2026/01/2025-databases-retrospective.html) Photo Credits Header (https://unsplash.com/photos/red-and-black-love-neon-light-signage-igJrA98cf4A)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Artificial Analysis: Independent LLM Evals as a Service — with George Cameron and Micah-Hill Smith

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jan 8, 2026 78:24


Happy New Year! You may have noticed that in 2025 we had moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They then were one of the few Nat Friedman and Daniel Gross' AIGrant companies to raise a full seed round from them and have now become the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.We have chatted with both Clementine Fourrier of HuggingFace's OpenLLM Leaderboard and (the freshly valued at $1.7B) Anastasios Angelopoulos of LMArena on their approaches to LLM evals and trendspotting, but Artificial Analysis have staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.George Cameron and Micah-Hill Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? 
And how open is “open” really?We discuss:* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs* Omissions Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding ”I don't know”), and Claude models lead with the lowest hallucination rates despite not always being the smartest* GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long 
context, multi-turn agents)* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omissions Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)* V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models)Links to Artificial Analysis* Website: https://artificialanalysis.ai* George Cameron on X: https://x.com/georgecameron* Micah-Hill Smith on X: https://x.com/micahhsmithFull Episode on YouTubeTimestamps* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins* 01:19 Business Model: Independence and Revenue Streams* 04:33 Origin Story: From Legal AI to Benchmarking Need* 16:22 AI Grant and Moving to San Francisco* 19:21 Intelligence Index Evolution: From V1 to V3* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology* 13:52 Mystery Shopper Policy and Maintaining Independence* 28:01 New Benchmarks: Omissions Index for Hallucination Detection* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning* 23:01 GDP Val AA: Agentic Benchmark for Real Work Tasks* 50:19 Stirrup Agent Harness: Open Source Agentic Framework* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses* 58:25 The Smiling Curve: Cost Falling While Spend Rising* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions* 1:16:50 Closing: The Insatiable Demand for 
IntelligenceTranscriptMicah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time artificial analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me. Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, this gem of a models and host comparison site was just launched. And then I put in a few screenshots, and I said, it's an independent third party. It clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing fireworks, and how do you have a model benchmarking thing without fireworks? But you had together, you had perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats on... It's been an amazing year. You guys have really come together to be the presumptive new gardener of AI, right? Which is something that...George [00:01:09]: Yeah, but you can't pay us for better results.swyx [00:01:12]: Yes, exactly.George [00:01:13]: Very important.Micah [00:01:14]: Start off with a spicy take.swyx [00:01:18]: Okay, how do I pay you?Micah [00:01:20]: Let's get right into that.swyx [00:01:21]: How do you make money?Micah [00:01:24]: Well, very happy to talk about that. So it's been a big journey the last couple of years. Artificial analysis is going to be two years old in January 2026. Which is pretty soon now. We first run the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, technologies across the AI stack for building stuff. We're very committed to doing that and tend to keep doing that. We have, along the way, built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups. 
So we want to be... We want to be who enterprises look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. So no one pays to be on the website. We've been very clear about that from the very start because there's no use doing what we do unless it's independent AI benchmarking. Yeah. But turns out a bunch of our stuff can be pretty useful to companies building AI stuff.
swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?
George [00:02:53]: So we have a benchmarking and insight subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. And so, for instance, one of the reports is a model deployment report: how to think about choosing between serverless inference, managed deployment solutions, or leasing chips and running inference yourself. That's an example kind of decision that big enterprises face, and it's hard to reason through, like this AI stuff is really new to everybody. And so we try and help companies navigate that with our reports and insight subscription. We also do custom private benchmarking. And so that's very different from the public benchmarking that we publicize, and there's no commercial model around that. For private benchmarking, we'll at times create benchmarks, run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking. Yeah.
So that's a piece mainly that we've developed through trying to support everybody publicly with our public benchmarks. Yeah.
swyx [00:04:09]: Let's talk about the tech stack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney? Yeah. Well, Sydney, Australia for me.
Micah [00:04:19]: George was in SF, but he's Australian, but he moved here already. Yeah.
swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting Artificial Analysis in the first place? You know, you started with public benchmarks. And so let's start there. We'll go to the private benchmark. Yeah.
George [00:04:33]: Why don't we even go back a little bit to like why we, you know, thought that it was needed? Yeah.
Micah [00:04:40]: The story kind of begins like in 2022, 2023, like both George and I have been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant. So it actually worked pretty well for its era, I would say. Yeah. Yeah. So I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. So had like this multistage algorithm thing, trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build that out, right? Like you're trying to think about accuracy, a bunch of other metrics and performance and cost. And mostly just no one was doing anything to independently evaluate all the models. And certainly not to look at the trade-offs for speed and cost. So we basically set out just to build a thing that developers could look at to see the trade-offs between all of those things measured independently across all the models and providers.
Honestly, it was probably meant to be a side project when we first started doing it.
swyx [00:05:49]: Like we didn't like get together and say like, hey, like we're going to stop working on all this stuff, like this is going to be our main thing. When I first called you, I think you hadn't decided on starting a company yet.
Micah [00:05:58]: That's actually true. I don't even think we'd paused, like, like George had an acquittance job. I didn't quit working on my legal AI thing. Like it was genuinely a side project.
George [00:06:05]: We built it because we needed it as people building in the space and thought, oh, other people might find it useful too. So we bought a domain and linked it to the Vercel deployment that we had and tweeted about it. And, but very quickly it started getting attention. Thank you, swyx, for, I think, doing an initial retweet and spotlighting it there, this project that we released. And then very quickly though, it was useful to others, but very quickly it became more useful as the number of models released accelerated. We had Mixtral 8x7B and it was a key. That's a fun one. Yeah. Like an open source model that really changed the landscape and opened up people's eyes to other serverless inference providers and thinking about speed, thinking about cost. And so that was a key. And so it became more useful quite quickly. Yeah.
swyx [00:07:02]: What I love talking to people like you who sit across the ecosystem is, well, I have theories about what people want, but you have data and that's obviously more relevant. But I want to stay on the origin story a little bit more. When you started out, I would say, I think the status quo at the time was every paper would come out and they would report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone has some knowledge.
I think there's some version of an Excel sheet or a Google sheet where you just like copy and paste the numbers from every paper and just post it up there. And then sometimes they don't line up because they're independently run. And so your numbers are going to look better than... Your reproductions of other people's numbers are going to look worse because you don't hold their models correctly or whatever the excuse is. I think then Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. The way that if I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval harness. Yup.
Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, it's like if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. And I mean, back when we started the website. Yeah. Yeah. Like one of the reasons why we realized that we had to run the evals ourselves and couldn't just take results from the labs was just that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get- You can put the answer into the model. Yeah. That in the extreme. And like you get crazy cases like back when Google had Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4, and like constructed, I think never published, like chain of thought examples, 32 of them in every topic in MMLU, to run it, to get the score. Like there are so many things that you- They never shipped Ultra, right? That's the one that never made it out. Not widely. Yeah. Yeah. Yeah. I mean, I'm sure it existed, but yeah.
So we were pretty sure that we needed to run them ourselves and just run them in the same way across all the models. Yeah. And we were, we were also certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.
swyx [00:09:24]: Okay. A couple of technical questions. I mean, so obviously I also thought about this and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know. No. Well, we definitely weren't at the start.
Micah [00:09:36]: So like, I mean, we're paying for it personally at the start. That's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of like hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. Yeah. It was like kind of fine. Yeah. Yeah. These days that's gone up an enormous amount for a bunch of reasons that we can talk about. But yeah, it wasn't that bad because you can also remember that like the number of models we were dealing with was hardly any and the complexity of the stuff that we wanted to do to evaluate them was a lot less. Like we were just asking some Q&A type questions, and then one specific thing was, for a lot of evals initially, we were just like sampling an answer. You know, like, what's the answer for this? Like, we'd just go to the answer directly without letting the models think. We weren't even doing chain of thought stuff initially. And that was the most useful way to get some results initially. Yeah.
swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right?
Like because sometimes the models, the models can answer any way they see fit, and sometimes they actually do have the right answer, but they just returned the wrong format and they will get a zero for that unless you work it into your parser. And that involves more work. And so, I mean, but there's an open question whether you should give it points for not following your instructions on the format.
Micah [00:11:00]: It depends what you're looking at, right? Because you can, if you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test it on its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's answered. But these days, it's mostly less of a problem. Like, if you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do, like, a simple regex.
swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, sometimes if you have a multiple choice question, sometimes there's a bias towards the first answer, so you have to randomize the responses. All these nuances, like, once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's such dark magic.
Micah [00:11:47]: You've also got, like… You've got, like, the different degrees of variance in different benchmarks, right? Yeah. So, if you run four-option multi-choice on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see on a four-option multi-choice eval is pretty enormous if you only do a single run of it, and it has a small number of questions, especially. So, like, one of the things that we do is run an enormous number of all of our evals when we're developing new ones and doing upgrades to our intelligence index to bring in new things.
Yeah. So, that we can dial in the right number of repeats so that we can get to the 95% confidence intervals that we're comfortable with, so that when we pull that together, we can be confident in Intelligence Index to at least as tight as, like, a plus or minus one at a 95% confidence. Yeah.
swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.
George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that's assuming one repeat in terms of how we report it, because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there because of the repeats.
swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking, you don't have any special deals with the labs. They don't discount it. You just pay out of pocket or out of your sort of customer funds. Oh, there is a mix. So, the issue is that sometimes they may give you a special endpoint, which is… Ah, 100%.
Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser focus, like, on everything we do on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true for, like, the one you bring up, like, right here of the fact that if we're working with a lab, if they're giving us a private endpoint to evaluate a model, that it is totally possible that what's sitting behind that black box is not the same as they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy.
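(Stepping back to the repeats discussion a moment ago: the arithmetic behind that plus-or-minus-one target can be sketched like this. The standard error of a mean shrinks as the square root of the number of repeats, so the repeats needed grow with the square of the per-run spread. The example standard deviation is illustrative, not AA's measured figure.)

```python
import math

def repeats_needed(run_stddev: float, half_width: float = 1.0, z: float = 1.96) -> int:
    """Repeats needed so the 95% CI on the mean score is +/- half_width.

    The standard error of the mean is stddev / sqrt(n), so we solve
    z * stddev / sqrt(n) <= half_width for n and round up.
    """
    return math.ceil((z * run_stddev / half_width) ** 2)

# If a single run of a small multi-choice eval swings ~3 points run to run,
# hitting +/-1 at 95% confidence takes about 35 repeats:
print(repeats_needed(3.0))       # 35
# Loosening the target to +/-2 cuts that to 9:
print(repeats_needed(3.0, 2.0))  # 9
```

This is the "straight multiple to the cost" swyx points out: halving the confidence-interval width quadruples the number of runs.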
And so, and we're totally transparent with all the labs we work with about this, that we will register accounts not on our own domain and run both intelligence evals and performance benchmarks… Yeah, that's the job. …without them being able to identify it. And no one's ever had a problem with that. Because, like, a thing that turns out to actually be quite a good factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.
swyx [00:14:23]: That's true. I never thought about that. I've been in the database data industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.
Micah [00:14:36]: I mean, okay, the biggest one, like, that I'll bring up, like, is more of a conceptual one, actually, than, like, direct shenanigans. It's that the things that get measured become things that get targeted by the labs in what they're trying to build, right? Exactly. So that doesn't mean anything that we should really call shenanigans. Like, I'm not talking about training on test set. But if you know that you're going to be graded on a particular thing, if you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing that preferably are going to be helpful for a wide range of how actual users want to use the thing that you're building. But will not necessarily work. Will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to, like, how we might use modern coding agents and stuff. But it's clearly not one for one.
So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without there being a reflection of the overall generalized intelligence of these models getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.
swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. Like, you used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier. You've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant that you guys decided to join and move here. What was it like? I think you were in, like, batch two? Batch four. Batch four. Okay.
Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies and were extremely aligned with the mission of what we were trying to do. Like, we're not quite typical of, like, a lot of the other AI startups that they've invested in.
swyx [00:16:53]: And they were very much here for the mission of what we want to do. Did they say any advice that really affected you in some way or, like, were one of the events very impactful? That's an interesting question.
Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.
swyx [00:17:09]: Which is also, like, a crazy list. Yeah.
George [00:17:11]: Oh, totally. Yeah, yeah, yeah.
There was something about, you know, speaking to Nat and Daniel about the challenges of working through a startup and just working through the questions that don't have, like, clear answers and how to work through those kind of methodically and just, like, work through the hard decisions. And they've been great mentors to us as we've built Artificial Analysis. Another benefit for us was that other companies in the batch and other companies in AI Grant are pushing the capabilities. Yeah. And I think that's a big part of what AI can do at this time. And so being in contact with them, making sure that Artificial Analysis is useful to them, has been fantastic for supporting us in working out how we should build out Artificial Analysis to continue being useful to those, like, you know, building on AI.
swyx [00:17:59]: I think to some extent, I'm of mixed opinion on that one, because to some extent, your target audience is not people in AI Grant who are obviously at the frontier. Yeah. Do you disagree?
Micah [00:18:09]: To some extent. To some extent. But then, so a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do across the entire stack for building great applications, which actually makes some of them pretty archetypical power users of Artificial Analysis. Some of the people with the strongest opinions about what we're doing well and what we're not doing well and what they want to see next from us. Yeah. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently for different models and different parts of your application to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So for many of them, no, they're like not commercial customers of ours, like we don't charge for all our data on the website. Yeah.
They are absolutely some of our power users.
swyx [00:19:07]: So let's talk about just the evals as well. So you start out from the general like MMLU and GPQA stuff. What's next? How do you sort of build up to the overall index? What was in V1 and how did you evolve it? Okay.
Micah [00:19:22]: So first, just like background, like we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric that we pull together currently from 10 different eval data sets to give what we're pretty confident is the best single number to look at for how smart the models are. Obviously, it doesn't tell the whole story. That's why we published the whole website of all the charts to dive into every part of it and look at the trade-offs. But best single number. So right now, it's got a bunch of Q&A type data sets that have been very important to the industry, like a couple that you just mentioned. It's also got a couple of agentic data sets. It's got our own long context reasoning data set and some other use case focused stuff. As time goes on, the things that we're most interested in, that are going to be important to the capabilities that are becoming more important for AI, what developers are caring about, are going to be first around agentic capabilities. So surprise, surprise. We're all loving our coding agents, and how the model is going to perform like that and then do similar things for different types of work are really important to us. The linking to economically valuable use cases is extremely important to us. And then we've got some of these things that the models still struggle with, like working really well over long contexts, that are not going to go away as specific capabilities and use cases that we need to keep evaluating.
swyx [00:20:46]: But I guess one thing I was driving at was like the V1 versus the V2 and how bad it was over time.
Micah [00:20:53]: Like how we've changed the index to where we are.
swyx [00:20:55]: And I think that reflects on the change in the industry. Right. So that's a nice way to tell that story.
Micah [00:21:00]: Well, V1 would be completely saturated right now. Almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, I think how much progress has been made in the last two years. Like we obviously play the game constantly of like the today's version versus last week's version and the week before and all of the small changes in the horse race between the current frontier and who has the best like smaller-than-10B model like right now this week. Right. And that's very important to a lot of developers and people, and especially in this particular city of San Francisco. But when you zoom out a couple of years ago, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence. We can talk about more in a bit. So V1, V2, V3, we made things harder. We covered a wider range of use cases. And we tried to get closer to things developers care about as opposed to like just the Q&A type stuff that MMLU and GPQA represented. Yeah.
swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark and like looking around and asking questions about it. Yeah.
Micah [00:22:21]: Let's do it. Okay.
This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.
George [00:22:26]: And I think a little bit about the direction that we want to take it. And we want to push benchmarks. Currently, the Intelligence Index and evals focus a lot on kind of raw intelligence. But we kind of want to diversify how we think about intelligence. And we can talk about it. But kind of new evals that we've kind of built and partnered on focus on topics like hallucination. And we've got a lot of topics that I think are not covered by the current eval set that should be. And so we want to bring that forth. But before we get into that.
swyx [00:23:01]: And so for listeners, just as a timestamp, right now, number one is Gemini 3 Pro High. Then followed by Claude Opus at 70, GPT-5.1 high. You don't have 5.2 yet. And Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. Yeah. I mean, I love it. I love it. No, no. 100%. Look back this time next year and go, how cute. Yep.
George [00:23:25]: Totally. A quick view of that is, okay, there's a lot. I love it. I love this chart. Yeah.
Micah [00:23:30]: This is such a favorite, right? Yeah. And almost every talk that George or I give at conferences and stuff, we always put this one up first to just talk about situating where we are in this moment in history. This, I think, is the visual version of what I was saying before about the zooming out and remembering how much progress there's been. If we go back to just over a year ago, before o1, before Claude Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, OpenAI was untouchable for well over a year.
And, I mean, you would remember that time period well, of there being very open questions about whether or not AI was going to be competitive, like full stop, whether or not OpenAI would just run away with it, whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.
George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There's so many dots on it, but I think it reflects a little bit what we felt, like how crazy it's been.
swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got ServiceNow in there that are less traditional names. Yeah.
George [00:25:01]: It's models that we're kind of highlighting by default in our charts, in our Intelligence Index. Okay.
swyx [00:25:07]: You just have a manually curated list of stuff.
George [00:25:10]: Yeah, that's right. But something that I actually don't think every Artificial Analysis user knows is that you can customize our charts and choose what models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.
swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the o1 jump. Look at that. September 2024. And the DeepSeek jump. Yeah.
George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.
Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, a couple of weeks. It was Boxing Day in New Zealand when DeepSeek V3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less known over the second half of 2024 and had run evals on the earlier ones and stuff.
I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas and stuff, running the evals and getting back result by result on DeepSeek V3. So this was the first of their V3 architecture, the 671B MoE.
Micah [00:26:19]: And we were very, very impressed. That was the moment where we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a thing. The world really noticed when they followed that up with the RL working on top of V3 and R1 succeeding a few weeks later. But the groundwork for that absolutely was laid with just an extremely strong base model, completely open weights, that we had as the best open weights model. So, yeah, that's the thing that you really see in the graph. But I think that we got a lot of good feedback on Boxing Day last year.
George [00:26:48]: Boxing Day is the day after Christmas for those not familiar.
swyx [00:26:54]: I'm from Singapore.
swyx [00:26:55]: A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.
Micah [00:27:11]: I don't know. I'm not used to it. Once upon a time, we did call it Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.
George [00:27:20]: There's been a few naming changes. We added hardware benchmarking to the site, and so benchmarks at a kind of system level. And so then we changed our throughput metric to, we now call it output speed, and then
swyx [00:27:32]: throughput makes sense at a system level, so we took that name. Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.
Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into.
Maybe so we can hit past all the, like, we have lots and lots of evals and stuff. The interesting ones to talk about today that would be great to bring up are a few of our recent things, I think, that probably not many people will be familiar with yet. So first one of those is our Omniscience Index. So this one is a little bit different to most of the intelligence evals that we've run. We built it specifically to look at the embedded knowledge in the models and to test hallucination by looking at, when the model doesn't know the answer, so it's not able to get it correct, what's its probability of saying, I don't know, or giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we're simply taking off a point if you give an incorrect answer to the question. We're pretty convinced that this is an example of where it makes most sense to do that, because it's strictly more helpful to say, I don't know, instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models and the labs creating them to get higher scores. And almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything. There's no incentive to say, I don't know. So we did that for this one here.
swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah. Yeah.
George [00:29:31]: On that. And one reason that we didn't do that, or put that into this index, is that we think that the, the way to do that is not to ask the models how confident they are.
swyx [00:29:43]: I don't know. Maybe it might be though. You put it like a JSON field, say, say confidence and maybe it spits out something. Yeah.
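(A sketch of the scoring scheme Micah just described: a point for a correct answer, a point off for an incorrect one, nothing for "I don't know", scaled to the -100..+100 range, plus the hallucination-rate idea of "when it's not correct, how often does it guess wrong". This is my reading of the description here, not AA's actual implementation.)

```python
def omniscience_score(results: list[str]) -> float:
    """Score a run on a -100..+100 scale.

    Each result is 'correct' (+1), 'abstain' (0, the model said "I don't
    know"), or 'incorrect' (-1); the mean is scaled to -100..+100, so
    abstaining is strictly better than guessing wrong.
    """
    points = {"correct": 1, "abstain": 0, "incorrect": -1}
    return 100 * sum(points[r] for r in results) / len(results)

def hallucination_rate(results: list[str]) -> float:
    """Of the questions the model didn't get right, the fraction where it
    gave a wrong answer instead of saying "I don't know"."""
    not_correct = len(results) - results.count("correct")
    return results.count("incorrect") / not_correct if not_correct else 0.0

print(omniscience_score(["correct"] * 4))                                 # 100.0
print(omniscience_score(["correct", "abstain", "incorrect", "abstain"]))  # 0.0
print(omniscience_score(["incorrect"] * 4))                               # -100.0
```

Note how the mixed run scores zero even at 25% accuracy: the wrong guess cancels the right one, which is exactly the incentive shift away from "take a shot at everything."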
You know, we have done a few evals podcasts over the, over the years. And when we did one with Clementine of Hugging Face, who maintains the open source leaderboard, this was one of her top requests, which is some kind of hallucination slash lack of confidence calibration thing. And so, hey, this is one of them.
Micah [00:30:05]: And I mean, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But yeah, it's pretty useful and has some interesting results. Like one of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not really measured vibes stuff that people like about some of the Claude models. Is the dataset public or what's, is it, is there a held-out set? There's a held-out set for this one. So we have published a public test set, but we've only published 10% of it. The reason is that for this one here specifically, it would be very, very easy to like have data contamination because it is just factual knowledge questions. We'll update it over time to also prevent that, but we've, yeah, kept most of it held out so that we can keep it reliable for a long time. It leads us to a bunch of really cool things, including breaking down quite granularly by topic. And so we've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.
swyx [00:31:23]: I would be interested. Let's, let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, which hallucinates less than Opus. And yeah. Would that be the other way around in a normal capability environment? I don't know.
What's, what do you make of that?
George [00:31:37]: One interesting aspect is that we've found there's not a strong correlation between intelligence and hallucination. That's to say, how smart the models are in a general sense isn't correlated with their ability, when they don't know something, to say that they don't know. It's interesting that Gemini 3 Pro Preview was a big leap over Gemini 2.5 Flash and 2.5 Pro, and if I add Pro quickly here.
swyx [00:32:07]: I bet Pro's really good. Uh, actually no, I meant the GPT Pros.
George [00:32:12]: Oh yeah.
swyx [00:32:13]: Because the GPT Pros are rumored, we don't know for a fact, to be like eight runs and then an LLM judge on top. Yeah.
George [00:32:20]: So we saw a big jump in, this is accuracy. So this is just the percent that they get correct, and Gemini 3 Pro knew a lot more than the other models. So, a big jump in accuracy, but relatively no change between the Google Gemini models, between releases, in the hallucination rate. Exactly. And so the Claude difference is likely due to just a kind of different post-training recipe for the Claude models. Yeah.
Micah [00:32:45]: Um, that's what's driven this. Yeah. You can partially blame us and how we define intelligence, having until now not defined hallucination as a negative in the way that we think about intelligence.
swyx [00:32:56]: And so that's what we're changing. Uh, I know many smart people who are confidently incorrect.
George [00:33:02]: Look at that. That is very human. Very true. And there's a time and a place for that. I think our view is that hallucination rate makes sense in this context, where it's around knowledge, but in many cases people want the models to hallucinate, to have a go. Often that's the case in coding, or when you're trying to generate newer ideas.
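The hallucination-rate metric discussed above can be sketched as the share of not-known questions that drew a wrong answer rather than an abstention. This is an assumed definition, reconstructed from the description in the episode, not Artificial Analysis's published formula:

```python
def hallucination_rate(graded_answers):
    """Among questions the model did not know (answered incorrectly
    or abstained on), the percentage it answered incorrectly instead
    of saying "I don't know"."""
    not_known = [a for a in graded_answers if a != "correct"]
    if not not_known:
        return 0.0
    return 100 * not_known.count("incorrect") / len(not_known)
```

Note that this rate is independent of accuracy: a model can know very few facts and still have a low hallucination rate, which is why the two metrics decorrelate in the discussion that follows.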
One eval that we added to Artificial Analysis is Critical Point, and it's really hard, uh, physics problems. Okay.
swyx [00:33:32]: And is it sort of like a HumanEval type or something different, or like a FrontierMath type?
George [00:33:37]: It's not dissimilar to FrontierMath. So these are kind of research questions that academics in the physics world would be able to answer, but models really struggle to answer. So the top score here is only around 9%.
swyx [00:33:51]: And the people that created this, like Minway and, and actually Ofir, who was kind of behind SWE-bench, and what organization is this? Oh, is this, it's Princeton.
George [00:34:01]: A range of academics from different academic institutions, really smart people. They talked about how they turn the models up in terms of temperature, as high a temperature as they can, when they're trying to explore kind of new ideas in physics with the model as a thought partner, just because they want the models to hallucinate. Um, yeah, sometimes it's something new. Yeah, exactly.
swyx [00:34:21]: Um, so not right in every situation, but, um, I think it makes sense, you know, to test hallucination in scenarios where it makes sense. Also, the obvious question is, uh, this is one of many, in that every lab has a system card that shows some kind of hallucination number, and you've chosen not to endorse that and you've made your own. And I think that's a choice. Um, totally. In some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun. You provide it as a service here. You have to fight the, well, who are we to do this?
And your, your answer is that we have a lot of customers and, you know. But like, I guess, how do you convince the individual?
Micah [00:35:08]: I mean, I think for hallucinations specifically, there are a bunch of different things that you might reasonably care about, and that you'd measure quite differently. Like, we've called this the Omniscience hallucination rate, not trying to declare that it's, like, humanity's last hallucination eval. You could have some interesting naming conventions and all this stuff. Um, the biggest-picture answer to that is something that I actually wanted to mention just as George was explaining Critical Point as well: as we go forward, we are building evals internally, and we're partnering with academia and with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not obsessed with the idea that everything we do has to be done entirely within our own team. Critical Point is a cool example, where we were a launch partner for it, working with academia. We've got some partnerships coming up with a couple of leading companies. Those ones, obviously, we have to be careful with on some of the independence stuff, but with the right disclosure, we're completely comfortable with that. A lot of the labs have released great datasets in the past that we've used to great success independently. And so between all of those techniques, we're going to be releasing more stuff in the future. Cool.
swyx [00:36:26]: Let's cover the last couple. And then I want to talk about your trends analysis stuff, you know? Totally.
Micah [00:36:31]: So actually, I have one little factoid on Omniscience.
If you go back up to accuracy on Omniscience, an interesting thing about this accuracy metric is that it tracks, more closely than anything else that we measure, the total parameter count of models. That makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff, which we think is much more about how the models are trained. This is just: what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.
swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed by any official source, just rumors. But rumors do fly around. Rumors. I hear all sorts of numbers. I don't know what to trust.
Micah [00:37:17]: So if you draw the line on Omniscience accuracy versus total parameters, we've got all the open weights models, and you can squint and see that likely the leading frontier models right now are quite a lot bigger than the one trillion parameters that the open weights models we're looking at here cap out at. There's an interesting extra data point that Elon Musk revealed recently about xAI: around three trillion parameters for Grok 3 and 4, and six trillion for Grok 5, but that's not out yet. Take those together, have a look. You might reasonably form a view that there's a pretty good chance that Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameter range. To be clear, I have absolutely no idea, but just based on this chart, that's where you would land if you have a look at it. Yeah.
swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much, because what does it really matter? As long as they can serve it at a sustainable cost, that's about it.
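The kind of extrapolation Micah describes, reading a closed model's rough size off a trend line fitted to open-weights models, can be sketched as a least-squares fit of accuracy against log parameter count. The numbers in the test are made up for illustration, not Artificial Analysis data:

```python
import math

def fit_log_linear(params, accuracies):
    """Ordinary least squares of accuracy against log10(parameter count)."""
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def implied_params(accuracy, slope, intercept):
    """Invert the fit: the parameter count the trend line implies
    for an observed accuracy."""
    return 10 ** ((accuracy - intercept) / slope)
```

It is only suggestive, of course: sparsity, data quality, and training recipe all move individual models off the line.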
Like, yeah, totally.
George [00:38:17]: They've also got different incentives in play compared to, like, open weights models, who are thinking about supporting others in self-deployment. For the labs doing inference at scale, I think it's less about total parameters in many cases when thinking about inference costs, and more about the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.
Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously, if you're a developer or a company using these things, exactly as you say, it doesn't matter. You should be looking at all the different ways that we measure intelligence, at cost to run the index, and at the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.
swyx [00:38:56]: It's not as good for the content-creator rumor mill, where I can say, oh, GPT-4 is this small circle, look at GPT-5, it's this big circle. That used to be a thing for a while. Yeah.
Micah [00:39:07]: But that is, on its own, actually a very interesting one, right? Chances are the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in total model size, especially with the upcoming hardware generations. Yes.
swyx [00:39:29]: So, you know, taking off my shitposting face for a minute. Yes. Yes. At the same time, I do feel, especially coming back from Europe, people do feel like Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale out, and therefore we need to start exploring at least a different path. GDPVal, I think, is only like a month or so old. I was also very positive when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool.
And you have your own version.
George [00:39:59]: It's a fantastic dataset. Yeah.
swyx [00:40:01]: And maybe we'll recap for people who are still out of it. It's like 44 tasks based on some kind of GDP cutoff, meant to represent broad white-collar work that is not just coding. Yeah.
Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions, and input files for a lot of them. The 44 divide into around 220 subtasks, and those are the level that we run through the agent. And yeah, they're really interesting. I will say that it doesn't necessarily capture all the stuff that people do at work. No eval is perfect; there are always going to be more things to look at, largely because in order to define the tasks well enough that you can run them, they need to have only a handful of input files and very specific instructions. And so I think the easiest way to think about them is as quite hard take-home exam tasks that you might do in an interview process.
swyx [00:40:56]: Yeah, for listeners, it is no longer like a long prompt. It is like, well, here's a zip file with a spreadsheet or a PowerPoint deck or a PDF; go nuts and answer this question.
George [00:41:06]: OpenAI released a great dataset, and they released a good paper which looks at performance across the different web chatbots on the dataset. It's a great paper; I encourage people to read it. What we've done is taken that dataset and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the dataset, and then we developed an evaluator approach to compare outputs. It's AI-enabled, so it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned to human preferences.
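The evaluator step George describes, giving a judge model the task criteria plus two candidate outputs and asking it to pick one, might be shaped something like this. The prompt wording and the `Verdict:` format are hypothetical, for illustration only, not Artificial Analysis's actual grader:

```python
def build_grading_prompt(criteria, output_a, output_b):
    """Assemble a pairwise grading prompt: the judge sees the task
    criteria and two candidate outputs, and must pick one."""
    return (
        "You are grading two attempts at the same task.\n"
        f"Task criteria:\n{criteria}\n\n"
        f"--- Output A ---\n{output_a}\n\n"
        f"--- Output B ---\n{output_b}\n\n"
        "Which output better meets the criteria? "
        "Reply with 'Verdict: A' or 'Verdict: B'."
    )

def parse_verdict(judge_reply):
    """Pull the winner out of the judge model's reply."""
    for line in judge_reply.splitlines():
        line = line.strip()
        if line.startswith("Verdict:"):
            choice = line.split(":", 1)[1].strip().upper()
            if choice in ("A", "B"):
                return choice
    raise ValueError("no verdict found in judge reply")
```

Grading against explicit criteria, rather than zero-shot "which is better", is the design choice Micah credits for the judge matching human preference.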
One data point there is that even with Gemini 3 Pro as the evaluator, Gemini 3 Pro itself, interestingly, doesn't actually do that well. So that's kind of a good example of what we've done in GDPVal AA.
swyx [00:42:01]: Yeah, the thing that you have to watch out for with LLM-as-judge is self-preference, that models usually prefer their own output, and in this case, it was not so. Totally.
Micah [00:42:08]: I think the places where it makes sense to use an LLM-as-judge approach now are quite different to some of the early LLM-as-judge stuff a couple of years ago, because some of that (and MT-Bench was a great project that was a good example of this a while ago) was about judging conversations and a lot of style-type stuff. Here, the task that the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search and the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading, we're running the outputs through a pipeline to extract visual and text versions of the files to provide to Gemini, and we're providing the criteria for the task and getting it to pick which of the two potential outputs more effectively meets those criteria. It turns out that it's just very, very good at getting that right; it matched human preference a lot of the time. I think that's because it's got the raw intelligence, combined with the correct representation of the outputs, the fact that the outputs were created with an agentic task that is quite different to the way the grading model works, and that we're comparing against criteria, not just zero-shot asking the model to pick which one is better.
swyx [00:43:26]: Got it. Why is this an ELO?
And not a percentage, like GDPVal?
George [00:43:31]: So the outputs look like documents, and there are video outputs or audio outputs from some of the tasks. It has to make a video? Yeah, for some of the tasks. Some of the tasks.
swyx [00:43:43]: What task is that?
George [00:43:45]: I mean, it's in the dataset. Like, be a YouTuber? It's a marketing video.
Micah [00:43:49]: Oh, wow. What? Like, the model has to go find clips on the internet and try to put it together. The models are not that good at doing that one, for now, to be clear. It's pretty hard to do that with a code editor. I mean, the computer-use stuff doesn't work quite well enough, and so on, but yeah.
George [00:44:02]: And so there's no ground truth, necessarily, to compare against to work out percentage correct. It's hard to come up with correct or incorrect there. And so it's on a relative basis, and we use an ELO approach to compare outputs from each of the models on each task.
swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task, and then give it an ELO, so you have a human in there. I think what's helpful about GDPVal, the OpenAI one, is that 50% is meant to be a normal human, and maybe a domain expert is higher than that, but 50% was the bar: if you've crossed 50, you are superhuman. Yeah.
Micah [00:44:47]: So we haven't grounded this score in that, exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models. It's one of the reasons that presenting it as ELO is quite helpful; it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky comparing these exact tasks against human performance, because the way that you would go about it as a human is quite different to how the models would go about it. Yeah.
swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there.
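Pairwise judge verdicts like those can be folded into ratings with a standard Elo update. A minimal sketch, where the K-factor and starting ratings are arbitrary choices for illustration, not Artificial Analysis's actual parameters:

```python
def elo_update(ratings, winner, loser, k=32):
    """Apply one in-place Elo update after a pairwise comparison won
    by `winner`. The expected score uses the standard logistic
    formula with a 400-point scale."""
    r_w, r_l = ratings[winner], ratings[loser]
    expected_w = 1 / (1 + 10 ** ((r_l - r_w) / 400))
    ratings[winner] = r_w + k * (1 - expected_w)
    ratings[loser] = r_l - k * (1 - expected_w)
```

Because every comparison is relative, new models can be slotted in later by grading them against the existing pool, which is why the relative presentation "stays relevant for quite a long time."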
Is that like just one last, like...
Micah [00:45:20]: Well, no, no, no, it is the best model released by Meta. And so it makes it into the homepage default set, still, for now.
George [00:45:31]: Another inclusion that's quite interesting is that we also ran it across the latest versions of the web chatbots. And so we have...
swyx [00:45:39]: Oh, that's right.
George [00:45:40]: Oh, sorry.
swyx [00:45:41]: I, yeah, I completely missed that. Okay.
George [00:45:43]: No, not at all. So that's the one with the checkered pattern. So that is their harness, not yours, is what you're saying. Exactly. And what's really interesting is that if you compare, for instance, Claude Opus 4.5 using the Claude web chatbot, it performs worse than the same model in our agentic harness. And in every case, the model performs better in our agentic harness than its web chatbot counterpart, the harness that they created.
swyx [00:46:13]: Oh, my backwards explanation for that would be that, well, it's meant for consumer use cases, and here you're pushing it for something else.
Micah [00:46:19]: The constraints are different, and the amount of freedom that you can give the model is different. Also, they have a cost goal; we let the models work as long as they want, basically. Yeah. Do you copy-paste manually into the chatbot? Yeah. That was how we got the chatbot reference. We're not going to be keeping those updated at quite the same scale as the hundreds of models.
swyx [00:46:38]: Well, I don't know, talk to Browserbase. They'll automate it for you. You know, I have thought about, like, well, we should turn these chatbot versions into an API, because they are legitimately different agents in themselves. Yes. Right. Yeah.
Micah [00:46:53]: And that's grown a huge amount over the last year, right? Like the tools.
The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the number of data sources that you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.
swyx [00:47:10]: What tools and what data connections come to mind when you say that? What's notable work that people have done?
Micah [00:47:15]: Oh, okay. So my favorite example of this is that until very recently, I would argue that it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's pulling something from wherever you store stuff at work. So for me, Google Drive, OneDrive, um, our Supabase databases if we need to do some analysis on some data or something. Preferably the model can be plugged into all of those things and can go do some useful work based on them. The things that I find most impressive currently, that I am somewhat surprised work really well in late 2025, are that I can have models use the Supabase MCP to query, read-only of course, and run a whole bunch of SQL queries to do pretty significant data analysis and make charts and stuff, and that they can read my Gmail and my Notion. And okay. You actually use that. That's good. Is that a Claude thing? To varying degrees both, ChatGPT and Claude. Right now, I would say that this stuff barely works, in fairness. Like.
George [00:48:33]: Because people are actually going to try this after they hear it.
If you get an email from Micah, odds are it wasn't written by a chatbot.
Micah [00:48:38]: So, yeah, I think it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.
swyx [00:48:46]: Um, and so you can feel it coming, right? And this time next year, we'll come back and see where it's going. Totally. Um, Supabase, shout out, another famous Kiwi. Uh, I don't know if you've had any conversations with him about anything in particular on AI building and AI infra.
George [00:49:03]: We have had Twitter DMs with him, because we're quite big Supabase users and power users. And we probably do some things more manually than we should, and the Supabase support line is super friendly about it. One extra point regarding GDPVal AA is that, on the basis of the overperformance of the models compared to the chatbots, we realized that, oh, the reference harness that we built actually works quite well on generalist agentic tasks. This proves it, in a sense. And the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code, and all that we give it is context management capabilities, a web search and web browsing tool, and a code execution environment. Anything else?
Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. For GDPVal we give it a tool to view an image specifically, because the models, you know, can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context, we had to give them a custom tool. But yeah, exactly. Um, you can explain it. No.
George [00:50:21]: So it turned out that we had created a good generalist agentic harness, and so we released it on GitHub yesterday. It's called Stirrup.
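The minimalist pattern George describes, where the loop does nothing but execute whichever tool the model asks for, can be sketched generically like this. This is an illustration of the idea, not Stirrup's actual code or API, and the action format is invented:

```python
def run_agent(model_call, tools, task, max_turns=20):
    """Minimal agent loop: each turn the model returns either a tool
    call ({"tool": name, "args": {...}}) or a final answer
    ({"final": text}). The loop executes the tool and appends the
    result to the transcript; all control flow belongs to the model."""
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = model_call(transcript)
        if "final" in action:
            return action["final"]
        result = tools[action["tool"]](**action["args"])
        transcript.append({"role": "tool", "content": str(result)})
    return None  # turn budget exhausted
```

The framework dictates nothing about the order of steps; whether the model searches, writes code, or answers immediately is entirely its own decision each turn.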
So if people want to check it out, it's a great base for building a generalist agent for more specific tasks.
Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code, and the coding agents can work with it super well.
swyx [00:50:51]: Well, that's nice for the community to explore and share and hack on. I think maybe in other similar environments, the Terminal-Bench guys have done, uh, sort of Harbor. And so it's a bundle of: we need our minimal harness, which for them is Terminus, and we also need the RL environments or Docker deployment thing to run independently. So I don't know if you've looked at Harbor at all. Is that like a standard that people want to adopt?
George [00:51:19]: Yeah, we've looked at it from an evals perspective; we love Terminal-Bench and host Terminal-Bench benchmarks on Artificial Analysis. Um, we've looked at it from a coding agent perspective, but could see it being a great basis for any kind of agent. I think where we're getting to is that these models have gotten smart enough, and have gotten better at tools, that they can perform better when just given a minimalist set of tools and left to run; let the model control the agentic workflow, rather than using another framework that's a bit more built out and tries to dictate the flow. Awesome.
swyx [00:51:56]: Let's cover the Openness Index, and then let's go into the report stuff. Uh, so that's the last of the proprietary AA numbers, I guess. I don't know how you classify all these. Yeah.
Micah [00:52:07]: Let's call it the last of the three new things that we're talking about from the last few weeks.
Um, because, I mean, we do a mix of stuff: things where we're using open source, things where we open source what we do, and, um, proprietary stuff that we don't always open source. The long context reasoning dataset last year, we did open source. And then for all of the work on performance benchmarks across the site, some of them we're looking to open source, but some of them we're constantly iterating on. So there's a huge mix, I would say, of stuff that is and isn't open source across the site. So that's LCR, for people. Yeah, yeah.
swyx [00:52:41]: Uh, but let's talk about openness.
Micah [00:52:42]: Let's talk about the Openness Index. This here is, call it, a new way to think about how open models are. We have for a long time tracked whether the models are open weights and what the licenses on them are. That's pretty useful; it tells you what you're allowed to do with the weights of a model. But there is this whole other dimension to how open models are, pretty important, that we haven't tracked until now, and that's how much is disclosed about how a model was made. So transparency about data, pre-training data and post-training data, and whether you're allowed to use that data, plus transparency about methodology and training code. Basically, those are the components. We bring them together to score an Openness Index for models, so that you can get, in one place, this full picture of how open models are.
swyx [00:53:32]: I feel like I've seen a couple of other people try to do this, but they're not maintained. I do think this does matter. I don't know what the numbers mean, apart from: is there a max number? Is this out of 20?
George [00:53:44]: It's out of 18, currently. We've got an Openness Index page, but essentially these are points: you get points for being more open across these different categories, and the maximum you can achieve is 18.
So AI2, with their extremely open OLMo 3 32B Think model, is the leader, in a sense.
swyx [00:54:04]: What about Hugging Face?
George [00:54:05]: Oh, with their smaller model. It's coming soon. I think we need to run the intelligence benchmarks first to get it on the site.
swyx [00:54:12]: You can't have an openness index without Hugging Face. We can't not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, the RefinedWeb and all that stuff. It's amazing. Or is it called FineWeb? FineWeb. FineWeb.
Micah [00:54:23]: Yeah, yeah, no, totally. Yep. One of the reasons this is cool, right, is that if you're trying to understand the holistic picture of the models and what you can do with all the stuff the company is contributing, this gives you that picture. And so we are going to keep it up to date alongside all the models that we run the Intelligence Index on, on the site. It's just an extra view to understand.
swyx [00:54:43]: Can you scroll down to this? The trade-offs chart. Yeah, yeah. That one. Yeah. This really matters, right? Obviously, because you can b

Python Bytes
#464 Malicious Package? No Build For You!

Play Episode Listen Later Jan 5, 2026 30:18 Transcription Available


Topics covered in this episode:

- ty: An extremely fast Python type checker and LSP
- Python Supply Chain Security Made Easy
- typing_extensions
- MI6 chief: We'll be as fluent in Python as we are in Russian
- Extras
- Joke

Watch on YouTube

About the show

Connect with the hosts:

- Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
- Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
- Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list; we'll never share it.

Brian #1: ty: An extremely fast Python type checker and LSP

- Charlie Marsh announced the Beta release of ty on Dec 16, "designed as an alternative to tools like mypy, Pyright, and Pylance."
- Extremely fast, even from the first run
- Successive runs are incremental, only rerunning necessary computations as a user edits a file or function. This allows live updates.
- Includes nice visual diagnostics, much like color-enhanced tracebacks
- Extensive configuration control; nice if you want to gradually fix warnings from ty for a project
- Also released a nice VS Code (or Cursor) extension
- Check the docs. There are lots of features.
- Also a note about disabling the default language server (or disabling ty's language server) so you don't have 2 running

Michael #2: Python Supply Chain Security Made Easy

- We know about supply chain security issues, but what can you do?
  - Typosquatting (not great)
  - GitHub/PyPI account take-overs (very bad)
- Enter pip-audit. Run it in two ways:
  - Against your installed dependencies in the current venv
  - As a proper unit test (so when running pytest or CI/CD)
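One way to wire pip-audit into a test suite is to parse its JSON report and assert that it is empty. The helper below assumes pip-audit's documented `-f json` output shape (a top-level `"dependencies"` list whose entries carry a `"vulns"` list); you'd run `pip-audit -f json` in CI and feed its output through this:

```python
import json

def vulnerable_packages(audit_json):
    """Return (package, vulnerability id) pairs from a
    `pip-audit -f json` report; an empty list means a clean audit."""
    report = json.loads(audit_json)
    return [
        (dep["name"], vuln["id"])
        for dep in report.get("dependencies", [])
        for vuln in dep.get("vulns", [])
    ]
```

In pytest, `assert not vulnerable_packages(report_text)` then fails the build with the offending package and advisory IDs in the failure message.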
- Let others find out first; wait a week on all dependency updates: uv pip compile requirements.piptools --upgrade --output-file requirements.txt --exclude-newer "1 week"
- Follow-up article: DevOps Python Supply Chain Security
  - Create a dedicated Docker image for testing dependencies with pip-audit in isolation before installing them into your venv.
  - Run pip-compile / uv lock --upgrade to generate the new lock file
  - Test in an ephemeral, pip-audit-optimized Docker container
  - Only then, if things pass, uv pip install / uv sync
- Add a dedicated Docker image build step that fails the docker build if a vulnerable package is found.

Brian #3: typing_extensions

- Kind of a follow-up on the deprecation warning topic we were talking about in December.
- prioinv on Mastodon notified us that the typing-extensions project includes it as part of the backport set.
- The warnings.deprecated decorator is new in Python 3.13, but with typing-extensions, you can use it in previous versions.
- But typing_extensions is way cooler than just that. The module serves 2 purposes:
  - Enable use of new type system features on older Python versions.
  - Enable experimentation with type system features proposed in new PEPs before they are accepted and added to the typing module.
- So cool. There are a lot of features here. I'm hoping it allows someone to use the latest typing syntax across multiple Python versions.
- I'm "tentatively" excited. But I'm bracing for someone to tell me why it's not a silver bullet.

Michael #4: MI6 chief: We'll be as fluent in Python as we are in Russian

- "Advances in artificial intelligence, biotechnology and quantum computing are not only revolutionizing economies but rewriting the reality of conflict, as they 'converge' to create science fiction-like tools," said new MI6 chief Blaise Metreweli.
- She focused mainly on threats from Russia; the country is "testing us in the grey zone with tactics that are just below the threshold of war."
- This demands what she called "mastery of technology" across the service, with officers required to become "as comfortable with lines of code as we are with human sources, as fluent in Python as we are in multiple other languages."
- Recruitment will target linguists, data scientists, engineers, and technologists alike.

Extras

Brian:

- Next chapter of Lean TDD being released today, Finding Waste in TDD
- Still going to attempt a Jan 31 deadline for the first draft of the book. That really doesn't seem like enough time, but I'm optimistic.
- Steam Deck is not helping me find time to write
  - But I very much appreciate the gift from my fam
  - Send me game suggestions on Mastodon or Bluesky. I'd love to hear what you all are playing.

Michael:

- Astral has announced the Beta release of ty, which they say they are "ready to recommend to motivated users for production use."
  - Blog post
  - Release page
- Reuven Lerner has a video series on Pandas 3

Joke: Error Handling in the age of AI

- Play on the inversion of JavaScript the Good Parts
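The `deprecated` decorator backport from Brian's #3 topic can be used with a version-tolerant import. The fallback shim at the end is my own addition, purely so this sketch runs anywhere; the real decorator (stdlib on 3.13+, typing-extensions before that) also attaches metadata that static type checkers understand:

```python
import warnings

try:  # Python 3.13+: the decorator lives in the stdlib
    from warnings import deprecated
except ImportError:
    try:  # older Pythons: the typing-extensions backport
        from typing_extensions import deprecated
    except ImportError:
        # last-resort shim so this sketch runs anywhere; it only
        # reproduces the runtime warning, not the type-checker hints
        import functools

        def deprecated(message):
            def wrap(func):
                @functools.wraps(func)
                def inner(*args, **kwargs):
                    warnings.warn(message, DeprecationWarning, stacklevel=2)
                    return func(*args, **kwargs)
                return inner
            return wrap

# `new_double` is hypothetical; the point is the warning on call
@deprecated("old_double() is deprecated; use new_double() instead")
def old_double(x):
    return x * 2
```

Calling `old_double(3)` still returns 6, but each call emits a DeprecationWarning, and type checkers that understand PEP 702 will flag call sites statically.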

Syntax - Tasty Web Development Treats
962: The Home Server / Synology Show

Play Episode Listen Later Dec 10, 2025 35:20


Wes and Scott talk about their evolving home-server setups: Synology rigs, Mac minis, Docker vs. VMs, media servers, backups, Cloudflare Tunnels, and the real-world pros and cons of running your own hardware.

Show Notes

- 00:00 Welcome to Syntax!
- 01:35 Why use a home server?
- 07:29 Apps for home servers
- 16:23 Home server hardware
- 18:27 Brought to you by Sentry.io
- 20:45 VMs vs containers and choosing the right software
- 25:53 How to expose services to the internet safely
- 30:38 Securing access to your server

Hit us up on Socials!

- Syntax: X Instagram Tiktok LinkedIn Threads
- Wes: X Instagram Tiktok LinkedIn Threads
- Scott: X Instagram Tiktok LinkedIn Threads
- Randy: X Instagram YouTube Threads