Hosting service for software projects using Git
We found the best way for a Linux user to manage Windows: keep it remote, keep it contained, and touch the desktop as little as possible. Sponsored By: Webroot: Webroot is cloud-based antivirus, engineered to stay out of your way. For a limited time, you can save sixty percent. Jupiter Party Annual Membership: Put your support on automatic with our annual plan, and get one month of membership for free! Managed Nebula: Meet Managed Nebula from Defined Networking. A decentralized VPN built on the open-source Nebula platform that we love. Support LINUX Unplugged
In this episode of The Cybersecurity Defenders Podcast, we discuss some intel being shared in the LimaCharlie community. DepthFirst reported that its autonomous security agent discovered 21 previously unknown vulnerabilities in FFmpeg, a widely deployed multimedia framework used across browsers, streaming infrastructure, and other systems that process media. Bundler 4.0.13 introduces a new security feature called cooldown, aimed at reducing the impact of software supply chain attacks in the Ruby ecosystem. A new variant of the Shai-Hulud supply chain worm, known as Miasma, briefly disrupted Microsoft's software development ecosystem after compromising dozens of GitHub repositories. Meta says approximately 20,000 Instagram accounts may have been compromised through the abuse of an AI-powered account recovery support system. Support our show by sharing your favorite episodes with a friend, subscribing, giving us a rating, or leaving a comment on your podcast platform. This podcast is brought to you by LimaCharlie, maker of the SecOps Cloud Platform, infrastructure for SecOps where everything is built API first. Scale with confidence as your business grows. Start today for free at limacharlie.io.
Should you convert your website into Markdown to help Large Language Models (LLMs) understand your content better? Is "llms.txt" worth the effort for SEO? In this episode of Search Off the Record, Martin Splitt and John Mueller from the Google Search Relations team dive deep into the history of Markdown, its rise in the AI era, and whether it holds any real weight for search engine discovery. In this episode, you'll learn: The Origins of Markdown: From John Gruber and Aaron Swartz to its status as the "language of GitHub." Markdown vs. HTML: Why the "cleanliness" of Markdown is tempting for developers but potentially risky for site structure. LLMs & Markdown: Do AI crawlers actually prefer Markdown, or are they already experts at parsing HTML? The "Parallel Version" Trap: Why creating a separate text/Markdown version of your site for AI can lead to the same maintenance nightmares as dynamic rendering. Use Cases that Make Sense: When Markdown is actually superior (like developer documentation) and when it's totally unnecessary (like your shoe catalog). Key Takeaways for SEOs & Developers: Crawlers are built for the "messy" web: Google and other engines have decades of experience parsing HTML. Don't sacrifice discovery: Headers, footers, and sidebars in HTML provide critical context for site structure that a raw Markdown file might lack. Maintenance is king: Avoid the complexity of maintaining two versions of the same content. Chapters 0:00 - Introduction: Should we all be using Markdown? 3:45 - The history and purpose of Markdown. 7:15 - Why developers love it: Separation of style and content. 11:20 - Do crawlers need Markdown to understand your site? 14:50 - The danger of "parallel versions" and dynamic rendering lessons. 17:30 - Discussing the "llms.txt" proposal and AI agents. 21:00 - Where Markdown actually makes sense (Developer Docs). 24:00 - Final verdict: Stick to HTML for the web. 
Resources Mentioned: Google Search Central: https://developers.google.com/search Are you using Markdown for your site's frontend or just as a backend source? Let us know in the comments! Episode transcript → https://goo.gle/sotr111-transcript Listen to more Search Off the Record → https://goo.gle/sotr-yt Subscribe to Google Search Channel → https://goo.gle/SearchCentral Search Off the Record is a podcast series that takes you behind the scenes of Google Search with the Search Relations team. #SOTRpodcast #SEO #GoogleSearch Speakers: Martin Splitt, John Mueller
CISA directs agencies to “patch smarter, not harder.” The House fails to extend FISA. Europol pulls over AudiA6. GitHub announces npm security updates. Anthropic rejects Fable 5 jailbreak claims. CISA gives feds three days to patch a critical Ivanti Sentry vulnerability. Google confirms ShinyHunters exploited a critical Oracle PeopleSoft vulnerability. FancyBear shifts part of its infrastructure to compromised edge devices. Pundits push for CyberCorps scholarship budgets. Our guest is Dr. Renée Burton, VP of Threat Intelligence at Infoblox, to discuss scams targeting the World Cup. Amazon drivers sweat through a software update. Remember to leave us a 5-star rating and review in your favorite podcast app. Miss an episode? Sign-up for our daily intelligence roundup, Daily Briefing, and you'll never miss a beat. And be sure to follow CyberWire Daily on LinkedIn. CyberWire Guest Today we are joined by Dr. Renée Burton, VP of Threat Intelligence at Infoblox, to discuss the World Cup and fans possibly getting caught out if they use SuperBox to view it. Selected Reading CISA directive orders agencies to prioritize vulnerability patching in a new way (CyberScoop) House votes against extending controversial wiretapping law set to lapse Friday (The Washington Post) Ransomware gangs cut off from EUR 336 million ‘AudiA6' crypto laundering pipeline - Europol analysis links the criminal service to over 15 international cybercrime investigations (Europol) GitHub to Update npm to Thwart Software Supply Chain Attacks (Infosecurity Magazine) Anthropic Disputes Fable 5 AI Jailbreak (SecurityWeek) CISA orders feds to patch actively exploited Ivanti flaw by Sunday (Bleeping Computer) Google Confirms Exploitation of Oracle PeopleSoft Zero-Day by ShinyHunters (SecurityWeek) GRU-Linked APT28 Uses MooBot Botnet and Compromised EdgeRouters for Cyber Operations (GB Hackers) CyberCorps is adapting to AI. The budget isn't keeping up. 
(CyberScoop) Software Update Automatically Turns off Amazon Delivery Drivers' AC During Dangerous Summer Heat (404 Media) Share your feedback. What do you think about CyberWire Daily? Please take a few minutes to share your thoughts with us by completing our brief listener survey. Thank you for helping us continue to improve our show. Want to hear your company in the show? N2K CyberWire helps you reach the industry's most influential leaders and operators, while building visibility, authority, and connectivity across the cybersecurity community. Learn more at sponsor.thecyberwire.com. The CyberWire is a production of N2K Networks, your source for strategic workforce intelligence. © N2K Networks, Inc. Learn more about your ad choices. Visit megaphone.fm/adchoices
Live from Microsoft Build, Corey Noles sits down with Scott Hanselman for a hands-on Neuron LIVE episode about AI-augmented software development, how it differs from just "vibe coding", and the surprisingly practical things people can now build with tools like GitHub Copilot and more. Scott is one of the best technical explainers in software: a longtime Microsoft and GitHub developer, teacher, speaker, author, blogger, and podcaster who has helped millions of developers understand new technology without making it feel impossible to learn. This episode turned into a live demo tour of what AI coding can already do, led by Scott's own use cases. Corey and Scott walked through a series of examples showing how AI can help people build useful apps, prototypes, workflows, and small tools from everyday ideas, including Scott's own vibe-coded tools Baby Smash (https://www.babysmash.com/), which lets babies press random buttons for fun shapes and sounds, and Tiny Tool Town (https://www.tinytooltown.com/), which showcases random, cool tools Scott found around the web. But in the coolest demo of all, Scott shows how to take an open source tool and create a personal blood sugar tracking app for his own diabetes management. If that doesn't get your idea blood flowing for what you can do with AI, we don't know what will! https://www.theneuron.ai/
Having recently moved house, Gary wonders how to reconfigure his homelab and network setup. Plus, Shane is fed up with GitHub’s outages and formulates a plan to move away… somewhere… Support us on Patreon and get an ad-free RSS feed, sometimes with early episodes. Subscribe to the RSS feed.
[This episode from February 2024 was never published and was recently discovered.] In today's episode, Andrew kicks things off with a rant about tackling developer experience tasks at Podia, wrestling with GitHub Actions, and Heroku deployment woes. Then the conversation turns to the importance of debugging, the power of bash scripting, and the challenges of naming in programming, with Chris mentioning DHH's insights from a live stream. They discuss Chris's travel plans for RubyConf in Australia and other upcoming conferences, and reminisce about their childhood love for trains and Thomas the Tank Engine. The episode wraps up with Chris and Andrew sharing advice and tips on writing conference proposals (CFPs) and the value of diverse speaking styles and personalities for engaging an audience. Tune in now to hear more! Links: ONCE/Campfire; debug.rb; GitHub Copilot; RubyConf Australia - April 11-12, 2024; RailsConf 2024 - May 7-9, 2024 - Detroit, MI; Sarah Mei - “What Your Conference Proposal is Missing”; Ruby for All Podcast - Episode 50: The Art of Conference Speaking with Kevin Murphy; [SFM] We like to party (YouTube); Ultimate Skyrim (YouTube); RailsConf 2023 - Teaching Capybara Testing - An Illustrated Adventure by Brandon Weaver (YouTube); Chris Oliver X/Twitter; Andrew Mason X/Twitter; Jason Charnes X/Twitter
Fortinet patches a new critical FortiSandbox flaw. GitHub to disable npm install scripts by default to stop supply chain attacks. Nottingham University announces data breach. Get the show notes here: https://cisoseries.com/cybersecurity-news-fortinet-patches-fortisandbox-github-disables-npm-scripts-nottingham-university-breach/ Thanks to our episode sponsor, Doppel. Social engineering attacks look trustworthy: a routine request, an internal email, a familiar face on a call. But Doppel sees through the disguise. Our AI-native platform detects and disrupts attacks across every channel, while training employees to recognize deepfakes and deception. We fight relentlessly to protect your business, brand, and people. Doppel. Outpacing what's next in social engineering. Learn more at doppel.com.
In this sponsored episode, James Wilson chats with SpecterOps CTO Jared Atkinson about the central role that GitHub has played in recent supply chain compromises. GitHub is where code gets built, tested, and shipped to devices, cloud, and on-prem environments. Understanding the paths an attacker can use to get into GitHub, and where they can pivot to from there, is essential to securing your GitHub repos and CI/CD pipelines.
Hey folks, Alex here, and welcome to a BIG MODEL week! We finally got Mythos (well, almost)! Let me catch you up! This week started with WWDC26 from Apple, and Max Weinbach, who was in the room at Apple Park and actually has access to some of the new features, including an all-new Siri AI, joined us to break down what could be the most used AI in the world very soon. At first I was skeptical, but he convinced me that the new Siri is actually good! Then, we saw the ultimate model drop: Anthropic finally shipped Mythos (X, my system card thread, benchmarks). Same weights, two names: Mythos 5 is the unrestricted version that only Project Glasswing partners get, Fable 5 is what the rest of us get, wrapped in the heaviest guardrails I've ever seen ship on a frontier model. It's state of the art on nearly every benchmark. The model that was “too dangerous to release” is now... well, released, but with the heaviest guardrails we've seen. More on this later. Peter Gostev from Arena.ai joined us to break down the new model. Last but definitely not least, Google released a real-time translation model that our friend Thor Schaeff from DeepMind demoed live, while we all spoke in different languages and it translated us in REAL TIME. It was really cool, definitely check that out. There are quite a few more things: Loop Engineering Alpha, Swyx came by to talk about FrontierCode, OpenAI confirmed our suspicions that the anti-datacenter social media posts could be a concerted effort by groups linked to the Chinese government, and much more. Let's dive in! ThursdAI - Let me catch you up, every week!
Chad talks with guest Andy Budd, Design Leadership Coach & Venture Partner at Seedcamp, as they look back over Andy's time at Clearleft, the company he co-founded back in 2005. Andy discusses employee trust ownership, how it both benefits and protects your employees, and how it has the potential to keep your business going for generations to come rather than living and dying by the founders' interest. Chad also announces that thoughtbot is moving into a Purpose Trust model. Our guest for this episode has been Andy Budd. If you'd like to get in touch with Andy, or to keep up to date with his work, you can do so through BlueSky, LinkedIn, or through his website. If you are a Medium user, you can also follow Andy at The Design VC. See Andy's panels at Evolve at the Brighton Centre, 26th June. Your host for this episode has been Chad Pytel. You can find Chad all over social media as @cpytel, or over on LinkedIn. If you would like to support the show, head over to our GitHub page, or check out our website. Got a question or comment about the show? Why not write to our hosts: hosts@giantrobots.fm This has been a thoughtbot podcast. Stay up to date by following us on social media: LinkedIn, Mastodon, YouTube, Bluesky. © 2026 thoughtbot, inc.
As AI agents become more capable and autonomous, they also introduce new security challenges. In this 'Fully Connected' episode, Dan and Chris unpack Anthropic's Zero Trust for AI Agents security framework and what it means for organizations deploying agentic systems. They examine the key security risks facing agentic systems and discuss how organizations can apply Zero Trust principles to deploy AI agents safely. Along the way, they break down practical security controls and discuss how traditional cybersecurity principles must evolve for the age of AI agents. Featuring: Chris Benson – Website, LinkedIn, Bluesky, GitHub, X; Daniel Whitenack – Website, GitHub, X. Links: Zero Trust for AI Agents; OWASP GenAI Project. Sponsors: Prediction Guard: A self-hosted AI control plane for running agents in high impact environments. predictionguard.com/practicalai Upcoming Events: Register for upcoming webinars here! Midwest AI Summit 2026
If you have been listening to the latest episodes, you will have noticed that I have been fully immersed in the fascinating (and sometimes overwhelming) world of Artificial Intelligence. Every now and then my mind begs for a break. And for me, taking a break means going back to my roots: tinkering in the terminal and writing code in Rust. In today's episode I want to change tack completely. I'll tell you about my experience over the last few weeks stepping out of my comfort zone with a modal text editor that has won me over on my servers, and I'll introduce four tools I have written in Rust to solve small everyday problems right at the command line. So get comfortable while you cook, commute, or go for a walk, because we're getting straight to the point! The great terminal dilemma: why do I use Helix on my servers if I'm loyal to NeoVim? Those of you who have followed me for a while know that my go-to editor on my everyday work machine is NeoVim. I have spent many years polishing my configuration, and today I have more than a hundred plugins installed that make my environment spectacular: instant autocompletion, a great status bar, a file explorer sidebar, and a formidable code analysis system. But what happens when I connect over SSH to my production servers? These servers usually run long-term-support Ubuntu releases with older packages, so my modern NeoVim configuration starts failing badly. Installing and maintaining more than a hundred plugins on every server I manage is an unmanageable headache. To solve this without giving up the agility of a modal terminal editor, I decided to give Helix a try. Wrestling with muscle memory: I have to confess that adapting to Helix has been hard work for my fingers. When you have spent years internalizing Vim commands, your brain automates editing. My homemade tools written in Rust, covered here in detail: 1. mkdr (Markdown Reader/Render): Since all my articles on atareao.es and my personal notes are stored in Markdown, I needed a powerful renderer to read them comfortably from the command line. 2. id3cli: Automating the metadata of this podcast's episodes is crucial for me. 3. rustled: So that my AI assistant, Cloe, could talk to me by voice, I needed a flexible text-to-speech tool. 4. ssrs: If at some point I have no internet connection, or I prefer texts to be processed with complete privacy, I turn to susurros. Chapters: 00:00:00 Introduction and a break from Artificial Intelligence. 00:00:56 What is Helix and why did I struggle with it at first? 00:02:27 The problem of bringing NeoVim (and its plugins) to servers. 00:06:23 First steps with Helix: the tutor and the differences from Vim. 00:09:34 Split screen, multicursor, and extreme speed. 00:10:54 Themes, built-in syntax highlighting, and commands. 00:15:12 My own tools: rendering Markdown in the terminal with mkdr. 00:18:40 Wiki-style navigation and other advantages of mkdr. 00:20:18 id3cli: managing MP3 tags without depending on third parties. 00:21:52 Giving Cloe a voice: rustled and the Microsoft Edge TTS API. 00:24:35 susurros: 100% local voice generation with Rust. 00:26:55 The future: ssrs (Whisper in Rust) and conclusions. 00:28:35 Podcast recommendation: Legalmente Productivos and sign-off. More information and links in the episode notes.
An AI support bot that hands over Instagram accounts without proper checks, the Silent Ransom Group stealing data and extorting victims without encrypting any files, and a GitHub issue that lets an AI agent endanger its own repository. Ronald, Marco, and Jelle open with three stories in which trust is handed out dangerously freely. Ronald then dives into YellowKey. With a specially prepared USB stick, an attacker can trick Windows Recovery and bypass the default BitLocker protection in Windows 11. At least as interesting is the row surrounding it: researcher Nightmare-Eclipse says they are publishing multiple zero-days out of frustration with Microsoft, which set off a public conflict over disclosure, responsibility, and the power of a large vendor. Marco then discusses a proof of concept for adaptive AI worms. Instead of a single hardcoded attack path, this worm uses local AI agents to devise a strategy per target, recover from errors, and share knowledge with other infected machines. It is still lab research, but it is an uncomfortable preview of malware that can also reason. Finally, Jelle plays some old-fashioned Shodan bingo with automatic tank gauges: small systems that measure fuel and liquid tanks and are sometimes still connected directly to the internet. Cyber-physical misery doesn't have to start at a power plant; a forgotten measuring box with hardcoded credentials is sometimes enough.
*Sources* Meta AI support and Instagram - 404 Media: https://www.404media.co/hackers-simply-asked-meta-ai-to-give-them-access-to-high-profile-instagram-accounts-it-worked/ - TechCrunch: https://techcrunch.com/2026/06/01/hackers-hijacked-instagram-accounts-by-tricking-meta-ai-support-chatbot-into-granting-access/ Silent Ransom Group and DNS fast flux - Resecurity: https://www.resecurity.com/blog/article/silent-ransom-group-srg-uncovering-dns-fast-flux-infrastructure - FBI: https://www.fbi.gov/file-repository/cyber-alerts/silent-ransom-group-targeting-law-firms-052325.pdf Claude Code GitHub Action - GMO Flatt Security: https://flatt.tech/research/posts/poisoning-claude-code-one-github-issue-to-break-the-supply-chain/ YellowKey and Microsoft - Ars Technica: https://arstechnica.com/security/2026/05/zero-day-exploit-completely-defeats-default-windows-11-bitlocker-protections/ - Windows Central: https://www.windowscentral.com/microsoft/microsoft-backs-off-legal-threats-against-windows-security-researchers Adaptive AI worms - Paper, AI Agents Enable Adaptive Computer Worms: https://arxiv.org/abs/2606.03811 Automatic tank gauges - NSA: https://www.nsa.gov/Press-Room/Press-Releases-Statements/Press-Release-View/Article/4507204/nsa-joins-cisa-and-partners-to-release-guidance-on-hardening-automatic-tank-gau/ - BleepingComputer: https://www.bleepingcomputer.com/news/security/cisa-warns-of-cyberattacks-targeting-fuel-tank-monitoring-systems/
VAR 2.0: the 2026 World Cup will have 3D avatars of the players to help settle disputed calls. Brazil's Receita Federal (Federal Revenue Service) denies a leak of data on 248 million Brazilians. CazéTV and iFood launch a World Cup prediction pool with R$ 3.5 million in prizes; see how to take part. Miasma: virus source code leaks on GitHub and becomes a cybercrime kit. Instagram 'mistakenly handed over' the location of Brazilian users; learn how it works and how to turn it off. AI has been put on a forced diet. Anatel wants to use the Civil Defense alert system to find missing people.
On this week's show, special guest co-host Chris Wade, the founder of Corellium turned Cellebrite CTO, joins Patrick Gray and James Wilson to discuss the week's cybersecurity news. They cover: Microsoft has repos owned, GitHub tokens popped, and a new 0day dropped on them; meanwhile, researchers are choosing full disclosure instead of engaging MSRC; Meta's AI support agent allowed a staggering 20,000 accounts to be stolen; Apple pulls Russia's MAX messenger from the App Store and disables notifications; Anthropic gives the public our first Mythos-class model but it won't do cybersecurity work; Stripe and Google Tag Manager used in eCommerce website hack campaign; and much, much more! This week's show is brought to you by runZero. HD Moore, runZero's founder, drops by in this week's sponsor interview to talk about the AI vibe shift. Everyone is very worried about getting owned all of a sudden, and it's really changing the cybersecurity business. This episode is also available on YouTube. Show notes: Microsoft Hacked to Deliver Malware to Claude and Gemini Users | 404.feed.press Researcher publishes GitHub token-stealing exploit, blames Microsoft's disclosure process | therecord.media Microsoft Defender 'RoguePlanet' zero-day grants SYSTEM privileges | BleepingComputer Microsoft breaks Patch Tuesday record with 206 vulnerabilities | CyberScoop chompie1337 | X WhatsApp says NSO targeted users with spearphishing attacks in violation of court order | therecord.media Over 20,000 Instagram accounts stolen in Meta AI support hack | BleepingComputer New Apple feature automatically changes your compromised passwords | BleepingComputer Apple removes Russia's state-backed messaging app Max from its store | therecord.media Exclusive: Anthropic's Mythos can exploit new flaws in hours | Anthropic's new model is Mythos on a leash | CyberScoop Anthropic Offers Mythos Upgrade for Cyber Partners and a ‘Safe' Version for the Rest of You | wired.com OpenClaw AI agent found falling for phishing 
attacks, spills user data | BleepingComputer OpenAI unveils Lockdown Mode to protect sensitive data from prompt injection attacks | TechCrunch Security Hands on with Intelligent Terminal, an AI-powered Windows Terminal | BleepingComputer Seeking Counsel: Ongoing Targeted Campaign Against US Law Firms | Mandiant Check Point warns of zero-day flaw targeted by ransomware affiliate | Cybersecurity Dive ServiceNow discloses security incident exposing customer data | BleepingComputer Credit card theft campaign abuses Stripe to host stolen payment info | BleepingComputer CrowdStrike, Palo Alto Networks defy estimates as AI fuels cyber demand | Cybersecurity Dive The U.S. Military Quietly Turned GPS Into a Global ‘Numbers Station,' Evidence Suggests | 404.feed.press New 'HTTP/2 Bomb' DoS attack crashes web servers in under a minute | BleepingComputer Google has quietly cut staff across its Cloud business | businessinsider.com
AI Ready: Ahmad Ghabboun. Ahmad Ghabboun built a Demo Day–winning AI product during his MSIS program, after arriving with no plans to work in AI at all. He breaks down how his mindset shifted, how his design background made him a stronger prompter, and how to build AI fluency that actually holds up in interviews. Useful for students and early-career professionals trying to get AI-ready without faking it. Ahmad Ghabboun is a Master of Science in Information Systems (MSIS) 2026 Graduate at the UW Foster School of Business. Before Foster, he spent roughly fifteen years in UX and product design, building web applications for startups. At Foster he built several generative-AI tools in his coursework, including Synapse, which won Best Business and Tech Product at the MSIS Demo Day. He is targeting product management and technical product roles. What you'll learn: why naming the specific AI model you use, and justifying it, matters more in interviews than saying "I use AI"; how a design background translates into sharper, more technical prompts; how to keep a human in the loop so AI assists your judgment instead of replacing it; why AI's tendency to agree with you makes human and second-model pushback essential; how to stay current with fast-moving tools without trying to learn everything; the difference between a productivity mindset and a learning mindset in school. Key moments: the third-quarter AI classes that moved AI from "not on my list" to his career focus; the origin of Synapse: manually juggling answers across Gemini, Claude, and a third model; how Synapse runs a dual-model validation and a judge step to flag gaps for technical PMs; why interview proctoring now detects AI use, and what a "perfect" AI answer signals to interviewers; Ethan Mollick's "jagged edge" and why it shifts with every model release. Resources mentioned: Lovable; Replit; Gemini; Claude; ChatGPT; Jira; Azure DevOps; GitHub; Ethan Mollick's "jagged frontier" of AI capability.
Instagram AI Support Hack Hits 20,225 Accounts; AI Worm 'Hades' Lies to Security Tools; Chrome Zero-Day Patch. Host David Shipley reports Meta says 20,225 Instagram accounts were hijacked after an AI support tool was tricked into sending reset links to attacker-controlled emails, with only MFA-protected accounts resisting. Step Security details a new Miasma-derived worm wave called Hades that targets config files for 14 AI coding tools, can inject instructions to hijack assistants, lies to AI security tools, and includes a "dead man switch" wipe if stolen GitHub tokens are revoked; Microsoft also removed some GitHub repos after 73 open-source projects were compromised to inject an info stealer. University of Toronto and Vector Institute researchers demonstrated an AI worm using a free local model that spread across a simulated network via known flaws and misconfigurations. Google issued an emergency Chrome patch for actively exploited CVE-2026-11645 in V8, and insurers are tightening claims scrutiny and increasingly excluding AI-related liabilities. Chapters: 00:00 Instagram AI Hack Fallout; 01:36 AI Worm Hades Evolves; 02:55 Microsoft Repo Compromise; 03:54 Lab Built AI Worm Demo; 05:27 Emergency Chrome Zero Day; 07:07 Cyber Insurance Tightens Up; 08:02 AI Liability Coverage Shrinks; 09:16 Wrap Up and Sign Off
In this episode of the Ardan Labs Podcast, Ale Kennedy talks with François Bitouzet, Managing Director of Viva Technology, about the forces shaping the future of technology and innovation. François shares his journey from studying in France to leading one of the world's largest technology and startup events, connecting entrepreneurs, investors, and industry leaders from around the globe. Chapters: 00:00 Introduction; 02:58 Education and Early Influences; 08:53 Early Career and Communication; 17:47 Communication in a Changing World; 32:25 Innovation and Technology; 42:40 Creativity and Marketing; 49:38 Leadership and Career Growth; 54:54 Adapting to Technological Change; 59:42 The Future of Events; 01:06:54 AI and Society; 01:11:20 Startups and Innovation; 01:15:35 Deep Tech and the Future. Connect with François: LinkedIn: https://www.linkedin.com/in/fran%C3%A7ois-bitouzet-180a89/ Mentioned in this Episode: Viva Technology: https://vivatechnology.com Want more from Ardan Labs? You can learn Go, Kubernetes, Docker & more through our video training, live events, or through our blog! Online Courses: https://ardanlabs.com/education/ Live Events: https://www.ardanlabs.com/live-training-events/ Blog: https://www.ardanlabs.com/blog GitHub: https://github.com/ardanlabs
Anthropic released Claude Fable 5, a guardrailed Mythos-class model, to the public and Mythos 5 to trusted partners. OpenAI confidentially filed for an IPO. Hackers injected credential-stealing malware into 70+ Microsoft GitHub repos, and Apple details its new Gemini-based foundation models. Anthropic releases Claude Fable 5, a "safe" Mythos-class model it says can't be used for cyberattacks, to the public, and Claude Mythos 5 to trusted orgs (Wired) OpenAI confidentially files for an IPO, says it has "not decided on timing yet", as "there are things we want to do that are likely easier as a private company" (CNBC) Microsoft disabled 70+ of its repos on GitHub, including Azure-related tools like azure-functions-host, after hackers added credential-stealing malware to them (TechCrunch) MG Siegler: after being left for dead in AI, Apple is set to win at the consumer level — the power of the default, superior product instincts, and no real competition (Spyglass) Ben Thompson: the iPhone is the true core of Siri AI, and Apple is the only company positioned to work across apps with personal context — as long as it's not vaporware (Stratechery) Learn more about your ad choices. Visit megaphone.fm/adchoices
What happens when you strip away decades of engineering abstractions and let AI navigate the wild west between your initial intent and the final outcome? This week on Dev Interrupted, Anush Elangovan, VP of AI Software at AMD, returns to unpack the rapid shift toward an agentic software development lifecycle. Anush introduces the concept of "Agentic IO," a workflow where engineers focus strictly on high-level goals while AI handles the complex implementation. The conversation also highlights the expanding productivity wingspan of modern developers, the power of local open source models, and why speed remains the ultimate competitive moat. Learn why: LinearB is a Leader in the 2026 Gartner® Magic Quadrant™ for Developer Productivity Insight Platforms. Follow the show: Subscribe to our Substack; Follow us on LinkedIn; Subscribe to our YouTube Channel; Leave us a Review. Follow the hosts: Follow Andrew; Follow Ben; Follow Dan. Follow today's guest: AMD ROCm: Learn more about AMD's open-source software stack for AI at rocm.docs.amd.com and on GitHub. AMD Advancing AI 2026: Register for AMD's flagship global AI event taking place July 22-23 in San Francisco at amd.com/advancing-ai. Follow Anush on LinkedIn: Anush Elangovan | AMD blog. OFFERS: Start Free Trial: Get started with LinearB's AI productivity platform for free. Book a Demo: Learn how you can ship faster, improve DevEx, and lead with confidence in the AI era. LEARN ABOUT LINEARB: AI Code Reviews: Automate reviews to catch bugs, security risks, and performance issues before they hit production. AI & Productivity Insights: Go beyond DORA with AI-powered recommendations and dashboards to measure and improve performance. AI-Powered Workflow Automations: Use AI-generated PR descriptions, smart routing, and other automations to reduce developer toil. MCP Server: Interact with your engineering data using natural language to build custom reports and get answers on the fly.
Rizel Scarlett is a Principal Developer Advocate at Entire and a software engineer and community builder. She previously worked in developer advocacy roles at Block and GitHub and shares content about open source and AI agents. You can find Rizel on the following sites: X, GitHub, LinkedIn, Mastodon, Blog, Twitch. PLEASE SUBSCRIBE TO THE PODCAST: Spotify, Apple Podcasts, YouTube Music, Amazon Music, RSS Feed. You can check out more episodes of Coffee and Open Source on https://www.coffeeandopensource.com. Coffee and Open Source is hosted by Isaac Levin.
The Legend of Zelda: Ocarina of Time is getting a remake, and Kingdom Hearts 4 gets a new trailer. Come check out the recap of the June Nintendo Direct that took place this morning. ANPD opens proceedings against Claro and Serasa over data sharing. Instagram finally lets you rearrange your profile's post grid; find out how. Siri AI, Liquid Glass and more: 10 big highlights of Apple's iOS 27. Microsoft disables 73 repositories on GitHub after a cybercriminal attack. Older iPhones will lose WhatsApp support; check the models, and more! And I'm Amanda Fleure, your company tonight on Hoje no TecMundo, your daily technology show, which starts right after the catchy opening jingle the editor is going to drop in for us!
Apple tries to revive Siri with contextual AI while OpenAI prepares its third phase and a possible stock market debut. Microsoft disables more than 70 repositories on GitHub over credential-stealing malware, Nvidia wants to bring agents to the PC with RTX Spark, and a new cryogenic silicon carbide platform aims to improve control of quantum computing. You can follow us on YouTube at https://youtube.com/olivernabani and join the Mashain Discord at https://olivernabani.com/discord
Scott and Wes sit down with Ben Vinegar, former Syntax GM and founder of Modem.dev, to geek out over terminal-maxxing, from SSH-based development and tmux workflows to AI-powered coding agents. Ben also demos two of his open source tools: Hunk, a slick terminal code reviewer with 4k+ GitHub stars, and TermDraw, a terminal-based diagramming tool that posts directly to your agent. Show Notes 00:00 Welcome to Syntax! 00:49 Introduction to Modem and AI Project Management 01:40 Exploring Terminal Usage and Productivity 04:26 Setting Up Remote Development Environments 08:38 The Power of TMUX in Development 11:20 What makes TMUX splitting different? 12:46 Integrating AI with Terminal Workflows 14:56 The Future of Terminal Applications 17:31 Balancing GUIs and Terminal Interfaces getfresh.dev Ben's talk at AI Engineer Miami 24:39 Navigating Development Tools and Environments 26:44 The Balance of Security and Convenience in Coding 30:27 Cautionary Tales: The Risks of YOLO Mode 33:53 Innovative Tools for Enhanced Coding Experience 34:09 Hunk: Terminal code review. 41:39 TermDraw: A New Way to Visualize Code and Ideas 46:22 The Dynamics of Open Source Contributions 48:31 Visualizing Code: Tools and Techniques 50:54 Podcasting and Editing Processes State of Agentic Coding. Podguy: Agent-driven post-production workflow for video podcasts 56:23 Introducing Modem: A Product Intelligence Platform 01:01:39 Connecting Feedback to Product Development 01:03:15 Sick Picks Sick Picks Ben: Nirvanna: The Band - The Show - The Movie, Timecrimes Shameless Plugs Ben: https://modem.dev/ Hit us up on Socials! Syntax: X Instagram Tiktok LinkedIn Threads Wes: X Instagram Tiktok LinkedIn Threads Scott: X Instagram Tiktok LinkedIn Threads Randy: X Instagram YouTube Threads
The weightless era of software is over. This week the AI buildout slammed into the physical world: concrete, copper, electricity, water, and capital. We map the paradox of record wealth at the top of the stack and intense friction everywhere else. Alphabet announced an $80 billion equity raise, its first major stock sale since the 2004 IPO, to fund an estimated $180 to $190 billion in AI compute capex for 2026, with Berkshire Hathaway taking a $10 billion private placement. Broadcom posted a record fiscal Q2 of $22.19 billion, AI chip revenue up 143%, and Marvell shipped the first 102.4 Tbps switch that Jensen Huang called the next trillion-dollar company. SoftBank overtook Toyota to become Japan's most valuable company after pledging 75 billion euros for 5 gigawatts of AI data centers in France. The bill for the combined ~$700 billion buildout is landing on workers: 2026 tech layoffs have reached roughly 142,000, and employment for developers under 26 has dropped nearly 20% since 2024. GitHub Copilot switched to token-based billing, with power-user bills jumping from about $29 to $750 and outliers hitting $3,000. NVIDIA and Microsoft launched the RTX Spark to run 120-billion-parameter models locally, Anthropic filed confidentially for a roughly $1 trillion IPO, and Ohio suspended its data-center tax break as a citizen petition aims to ban hyperscale data centers. Community consent, water, and energy are the real bottlenecks. If you want a prize, send us a DM: instagram.com/rickerandbon tiktok.com/@rickerandbon youtube.com/@rickerandbon
Today in the business of podcasting: Bumper has opened its podcast analytics dashboard to creators of every size, introducing a free tier alongside new Pro and Enterprise plans and making its independent Bumper Score available across the board, while Enterprise subscribers also gain access to a new Bumper MCP server connecting podcast data directly to AI tools. Two notable proposals are stirring debate in the open podcasting community on GitHub, one calling for a standardized way to disclose AI generated content in RSS feeds and another suggesting a method for verifying which apps are actually downloading episodes, an idea that could reshape how podcast apps get paid. The UK's Competition and Markets Authority has ordered Google to let publishers opt out of having their content used to power AI search features, giving the company nine months to roll out the changes and requiring it to publish regular compliance reports. New data from streaming measurement firm Digital i shows YouTube overtaking Netflix in daily audience attention, with Australia ranking among the top viewing markets worldwide and Netflix's own YouTube channel pulling in significant reach. A new analysis argues that short form video isn't killing long form content but reshaping its role instead, with short clips driving discovery and habit formation while podcasts and longer series build deeper audience attachment over time. To find links to these, and every article covered in today's episode, click here. You can also subscribe to The Download's newsletter to receive the full issue straight to your email inbox every day.
Photo by Mikey Frost on Unsplash Published 8 June 2026 e556 with Michael, Andy and Michael – Michael R's annual edu-cation, chatbot trickery, data center data visualization, LeRobot Humanoid open source robotics, LEGO and a whole lot more! Michael, Andy and Michael get things started with Michael R's annual edu-cation, Apple's World Wide Developer Conference. Then the team turns to a news story about tricking Meta's support chatbot into granting access to Instagram accounts. This is not a new tactic – check out e429 for a link to try this out for yourself with lakera.ai's Gandalf game. Next up is a data visualization for data centers across the United States. And then, a solution for the energy needs of a data center: CrankGPT. And harkening back to the earlier chatbot trickery, there's a GitHub repo to get Chipotle's chatbot Pepper to write Python code and more. Then, the team considers an article from The Atlantic that spells out a contrarian view that this is in fact the best time for a computer science degree. A new robotics story captured the team's attention – the LeRobot Humanoid. Hugging Face developed this robotic set of legs as an accessible, low cost, open humanoid (well, humanoid legs) robot. Another intriguing maker project is a 3D book that has printed on its pages the machine code needed to fabricate itself. Wrapping up the episode, the team takes a look at some of the newest LEGO sets featuring Gaudi's architecture and SmartPlay sets featuring Nintendo's Pokemon. What would you like to have your LeRobot do? Have your bots
Forget about asking ChatGPT generic questions; today we'll look at how to get real, practical value from technology to solve everyday problems and shake off daily decision fatigue. You probably know the story: post-its on the fridge, spreadsheets that fall out of date, and the classic "what's for dinner tonight?" that ends in improvisation or a disorganized shopping trip. To avoid this, I've designed an ecosystem of agents based on four toolboxes that we call MCPs (Model Context Protocol). These protocols let the AI not just answer questions but interact directly with my data and external applications. I'll explain in very simple terms the pieces that make up this system: Semantic RAG for the recipes: I have a vector database with about 1,700 recipes loaded into PostgreSQL using pgvector. The key is that I don't search for dishes by exact word matching. If I say I want "something quick and light with vegetables", the system performs a semantic search, understands what I'm after, and suggests the best options. All of this is processed cheaply through OpenRouter, with no need for a powerful local GPU. Skills and SQLite: The "Skills" define the exact processes the model must follow. I've set some simple guidelines: Mediterranean one-dish meals for lunch and light dinners. All this information is managed in a very lightweight SQLite database. Fuzzy logic in the shopping list: The assistant can group similar ingredients. If two recipes ask for tomatoes in different formats (for example, "loose tomatoes" and "100 g of tomatoes"), the fuzzy logic unifies them under a single concept to avoid duplicates in the shopping list, and also organizes the products by aisle or section (such as produce or the butcher's counter). Typst for exporting to PDF: To view the menu on a tablet or print it for the fridge, I use Typst, a modern alternative to LaTeX that generates impeccable PDF documents in a matter of seconds. I also explain how you can set all this up locally for free with Ollama, and I take the chance to update you on my return to pure Linux tinkering: from my recent experiences with the Helix editor and "mkdr" (my Markdown renderer for the terminal) to "podcli", a small utility for getting the most out of podcast feeds from the console. I hope you enjoy this episode as much as I enjoyed putting this whole setup together. Happy tinkering!
Episode chapters: 00:00:00 AI agents that really make our lives easier 00:01:42 The practical example: automating our weekly menu 00:03:51 Decision fatigue and why human discipline fails 00:05:38 My toolbox: 4 MCPs (Model Context Protocol) 00:06:58 Finding food with AI: semantic RAG over 1,700 recipes 00:08:45 Hybrid search and cheap embeddings without a local GPU 00:10:00 Simplifying meals: the role of "Skills" 00:11:58 Organizing the database simply with SQLite 00:13:31 Fuzzy logic: avoiding duplicates in the shopping list 00:15:23 Creating nice PDFs with Typst (the modern alternative to LaTeX) 00:17:03 Live demo: generating the week's menu 00:19:12 Full automation: automatic menu generation with Cron 00:20:19 Reviewing the menu, the recipes, and the local alternative with Ollama 00:23:12 Back to Linux tinkering: Helix, mkdr and Podcli 00:24:51 Upcoming episodes: installing Hermes from scratch to production 00:25:38 Farewell and episode wrap-up
More information and links in the episode notes
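The shopping-list step described in the episode above (merging entries such as loose tomatoes and 100 g of tomatoes under one concept) could be sketched roughly as follows. The show notes do not say how the host's fuzzy logic is implemented, so this is only an illustration under stated assumptions: it strips quantities and packaging words, then uses plain string similarity from Python's standard `difflib` module as a stand-in for whatever matching the real system uses. `NOISE_WORDS`, `normalize`, `merge_shopping_list`, and the 0.8 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of unit and packaging words to ignore when comparing items.
NOISE_WORDS = {"g", "kg", "ml", "l", "of", "loose"}


def normalize(item):
    """Reduce a free-text ingredient line to a bare concept, e.g. 'tomatoes'."""
    words = re.findall(r"[a-z]+", item.lower())  # drops digits and punctuation
    return " ".join(w for w in words if w not in NOISE_WORDS)


def merge_shopping_list(items, threshold=0.8):
    """Group ingredient lines whose normalized names are similar enough.

    Assumption: string similarity stands in for the episode's fuzzy logic.
    """
    groups = {}
    for item in items:
        key = normalize(item)
        # Reuse an existing group if its name is close to this one.
        match = next(
            (k for k in groups
             if SequenceMatcher(None, k, key).ratio() >= threshold),
            None,
        )
        groups.setdefault(match if match is not None else key, []).append(item)
    return groups


if __name__ == "__main__":
    print(merge_shopping_list(["loose tomatoes", "100 g of tomatoes", "2 onions"]))
```

A lookup table of known ingredients would likely be more predictable than a similarity threshold; the threshold approach is shown only because it needs no extra data.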
Barb WB2CBA is a civil engineer by trade and an open source QRP designer by passion. Every circuit he's created has gone straight to GitHub with no paywalls and no kits he controls exclusively, and builders around the world have taken notice. In this episode Barb walks us through his major designs from the uSDX collaboration to the Pebble HF, talks about his experience at Four Days in May, and shares what it feels like to see strangers around the world building circuits he designed and released for free. If you're into QRP, digital modes, or homebrew radio, this one is worth your time. GitHub: github.com/WB2CBA Pebble HF: pebblehf.com Join us as we explore how you can get involved in portable radio, QRP, and more in this episode of the All Portable Discussion Zone (AP/DZ). Every aspect of portable operations is covered in this biweekly podcast, from news and gear to achievements, the workbench, contests, awards, and beyond. **DISCORD INVITE**: https://discord.gg/WVE3vVveWU #apdz #HamRadio #QRP #Workbench #Electronics #homebrewradio #DIYradio #testequipment #RFprojects #amateurradio #hamradiopodcast #scratchbuild #HFtransceiver #opensource #FT8 #digitalmodes #ADX #PebbleHF #Si5351 #QRPp #WSPR #FourDaysinMay #FDIM
Microsoft's Build conference was a firehose: in-house AI models, agent-first devices, new coding tools, and a Copilot "super app" that got teased but never shown. Todd Bishop and Mary Jo Foley sort through what's real and what's not quite fully baked, from Project Solara and the Scout agentic assistant to Microsoft's push for AI self-sufficiency and the mounting pressure on GitHub. Related Stories: Inside Microsoft’s Project Solara: A new platform for devices that run AI agents instead of apps Microsoft unveils seven homegrown AI models in new bid for ‘long term self-sufficiency’ Mary Jo Foley: No Copilot ‘Super App’ at Microsoft Build, but plenty of agentic fodder Microsoft’s OpenClaw team takes on the personal assistant challenge Edited by Curt Milton. See omnystudio.com/listener for privacy information.
You're tired of hearing “just build a SaaS” like it's easy, especially when you don't code, don't have a team, and still want something real that can actually make money. It can feel like everyone else has access to some secret playbook while you're stuck trying to figure out where to even begin. In this episode, Omar completely removes the gatekeeping and shows you what it actually looks like to build a real software business in a ridiculously short timeframe using AI. Nothing is hidden. He walks you through the exact tools, decisions, and steps he takes so you're not left guessing or piecing things together on your own. It's clear, practical, and designed to make you feel like this isn't some exclusive club, it's something you can dive into right now. If you've been waiting for proof that you can pull off your own AI-powered software build in a matter of hours, this is it. Click play at the top of the page and see how you can turn your idea into a real product faster than you thought possible. MBA2790 How To Build A Software Business With AI This Weekend. Zero Coding Skills Required. Must-Have Stack to Build Your Own AI App 1. Supabase 2. GitHub 3. Windsurf 4. Vercel 5. Claude 6. GoDaddy 7. Stripe 8. Kit Helper / Optional Tools to support your workflow 1. Wispr Flow 2. Google Forms 3. Chrome DevTools (Inspect Element) Recommended episode to explore: Can You Build A Profitable SaaS In 7 Days With Just AI? My Experiment With Proof! Watch the episodes on YouTube: https://lm.fm/GgRPPHi SUBSCRIBE YouTube | Apple Podcast | Spotify | Podcast Feed Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
✅ New autonomous agents. ✅ Canva designs made for you. ✅ Codex upgrades to make your business move. If you had your head down in spreadsheets this week, you missed some MAJOR AI upgrades that are available now. We track what's hot and what's not and break it all down on Fridays with our Friday Features. Autonomous Copilot agents, new Codex tools, GitHub Copilot app and 7 more AI updates you should be using — An Everyday AI Chat with Jordan Wilson. Newsletter: Sign up for our free daily newsletter. More on this Episode: Episode Page. Today's Episode on LinkedIn: Thoughts on this? Join the convo on LinkedIn and connect with other AI leaders. Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup. Website: YourEverydayAI.com. Email The Show: info@youreverydayai.com. Connect with Jordan on LinkedIn. Topics Covered in This Episode: OpenAI Codex Role-Specific Plugins Launch, Microsoft Build Conference AI Feature Releases, ChatGPT Memory and Business Account Upgrades, Microsoft Flash Image Model for PowerPoint, Canva Integrated with ChatGPT and Codex, GitHub Copilot Standalone Desktop App Preview, Microsoft Autopilot Always-On Work Agents, OpenAI Models Now Available on AWS Bedrock, Codex Sites: AI-Built Internal Web Apps. Timestamps: 00:00 OpenAI's big money moves 03:47 Explaining role-specific plugins 09:02 Microsoft's new image model release 11:09 Microsoft's AI strategy and Canva update 14:23 Canva integration with ChatGPT 16:56 GitHub Copilot's new canvas feature 20:46 AI token subscription changes 24:42 AWS adds OpenAI models to Bedrock 28:25 Introducing OpenAI's Codex Sites Feature 32:07 Launch of OpenAI's New Plug-in 34:16 Overview of podcast structure. Keywords: Autonomous copilot agents, Codex tools, GitHub Copilot app, OpenAI Codex, ChatGPT business accounts, OpenAI enterprise, Microsoft Build conference, Microsoft always-on agents, AWS AI updates, Canva plugin, ChatGPT memory upgrade, Windows Codex integration, Microsoft Flash model, Enterprise apps integration, Role-specific plugins, 
Sales data analytics, Product design AI, Creative production AI, Investment banking plugin, Public equity investing, Data analytics plugin, Workspace admins, App permissions, Role-aware work agent, Financial research automation, Microsoft image generation model, PowerPoint AI integration, OneDrive AI features, Visual design creation, Canva app for ChatGPT, Canva MCP server, Agentic context carry, Full screen design preview, GitHub Copilot desktop app, GitHub Copilot Canvas, Agent-native command center, Parallel agent work tree, Code app interface, Model options in GitHub, Token usage limits, Subscription token subsidizing, Anthropic token efficiency, Amazon Bedrock, GPT-4, GPT-4.5, Small language models, Token reckoning, Security governance, Inference engine, Code app sidebar, Codex Sites, Internal dashboards, Project trackers, Interactive web apps, Shareable AI apps, Enterprise data connectors, ChatGPT Canvas, Automated workflow, Workplace authentication, Creative briefs repository. Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info) Start Here ▶️ Not sure where to start when it comes to AI? Start with our Start Here Series. You can listen to the first drop -- Episode 691 -- or get free access to our Inner Circle community and all episodes: StartHereSeries.com Also, here's a link to the entire series on a Spotify playlist.
This week I'm talking with Max Stoiber, currently working on ChatGPT's plugin directory and app platform at OpenAI. We discuss the hundreds of open source projects nobody remembers alongside the big ones like react-boilerplate and styled-components, how Spectrum became part of GitHub and eventually helped shape GitHub Discussions, the founder growth that came from building Stellate, the GraphQL cache that turned into a dual acquisition by Shopify and The Guild, and why ChatGPT apps feel like a new surface for software.
Hey friends! Backups are not as cool as pentesting, but boy do they matter when things go sideways. This week I'm sharing how a Proxmox backup disk space meltdown led me to a completely overhauled — and honestly pretty bulletproof — backup setup for both home and work. Claude played a big role in helping me sort it all out. Here's what we get into: The backup history tour — I've been through CrashPlan, Dropbox, Backblaze (which saved my bacon after my house fire in 2019!), and a mystery one that may or may not have had "Panda" in the name. These days I'm settled on ARQ for personal backups — dead simple, backs up to just about everything (Dropbox, OneDrive, Google Drive, even their own ARQ Cloud for ~$80/year), and all data is encrypted at rest. Not a sponsor, but they should be. The 3-2-1 rule — I actually asked Siri mid-episode, and she initially thought it was a grounding/anxiety technique. (Valid, I guess?) The real answer: three copies, two different media, one offline. I've got a local copy plus OneDrive, Google Drive, and Dropbox — so I think I'm covered. The work side: Proxmox + PBS — My "data center" is a beefy Hetzner Proxmox box with about a dozen VMs. I had Proxmox Backup Server (PBS) set up on a secondary Hetzner box, happily cranking away… until it ran out of disk space and started yelling at me every night. Claude to the rescue — I spun up a Claude project, fed it terminal output and retention configs, and it gave me a straight-up honest assessment: either gut your retention policy (risky) or get more disk. It then walked me through Hetzner's auctions page — which I didn't even know existed — to find a storage-heavy, low-horsepower box. Ended up with two mirrored 8TB drives plus a 14TB drive for around $40/month. Not cheap, but totally worth it as a business expense. The new setup — PBS is now on its own dedicated Hetzner box. VMs from both my data center and my home NUC Proxmox box back up there nightly. 
Claude also suggested using that 14TB drive as an SFTP target for ARQ, giving me yet another redundant copy of all my personal data. It'll take a few weeks to fully sync, but I'm running some flavor of the 4-3-2-1 rule now (I made that up). Proxmox forever — Someone wrote in asking if I'd go back to ESXi now that Broadcom brought back the free version. Hard no. I've fallen in love with Proxmox and I'm not going back. 7MinSec wiki scripts repo — Head over to 7MinSec.wiki and click the Scripts button to find a new GitHub repo where I'm publishing pentesting scripts. First one up: a push-button Exegol installer. More to come — and I'll probably tease new scripts first over at 7MinSec.club on TuesdayTOOLSday! Have a backup horror story — or a setup you're proud of? Hit us up! And if you need assessments, pentesting, training, or other security goodness, find us at 7MinSec.com.
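The 3-2-1 rule as stated in the backup episode above (three copies, two different media, one offline) can be written down as a small check. This is a minimal sketch, not the host's tooling: `BackupCopy`, `satisfies_3_2_1`, and the sample inventory below are hypothetical names and data chosen only for illustration, and they do not describe the setup from the episode.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    name: str      # hypothetical label for the copy
    medium: str    # e.g. "ssd", "cloud", "hdd"
    offline: bool  # True if the copy is disconnected from any network


def satisfies_3_2_1(copies):
    """Return True if there are >= 3 copies, on >= 2 media, with >= 1 offline."""
    distinct_media = {c.medium for c in copies}
    return (
        len(copies) >= 3
        and len(distinct_media) >= 2
        and any(c.offline for c in copies)
    )


if __name__ == "__main__":
    # Hypothetical inventory: three copies on two media, none offline yet.
    inventory = [
        BackupCopy("laptop", "ssd", False),
        BackupCopy("cloud-a", "cloud", False),
        BackupCopy("cloud-b", "cloud", False),
    ]
    print(satisfies_3_2_1(inventory))
```

Note that counting several cloud providers as one medium is a judgment call; a stricter or looser definition of "medium" would change the result.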
Introducing Russell Aaron I didn't learn WordPress at a fancy college or career academy. I graduated from the University of YouTube. My internship was the Las Vegas WordPress Meetup and WordCamp Vegas. The rest I learned building mortgage company platforms, working for casinos, inside managed WordPress hosts, and at some of the best WordPress development and support shops on the planet. Show Notes For more on Russell, check out his website: https://russellenvy.com Transcript: Topher DeRosia: All right. Here we go. Hey folks. Russell Aaron: And three, two, one. Topher DeRosia: Hey folks. Welcome to Hallway Chats. I’m Topher, and I’m here with Russell Aaron. I assume I pronounced that right, because it’s not that hard, but you never know. Russell Aaron: You know, so many people call me Aaron. They’ll tag me and they go, “Thanks, Aaron.” And I’m like, “You know, it’s Russell, but it’s cool.” Topher DeRosia: Yeah, nice. All right. Well, I saw a post on LinkedIn the other day from you talking about podcasts having the same people on episodes all the time. I thought, “Oh, I gotta have that guy on my podcast.” Because then you can’t go on any other ever again, because then you’ll be that guy. Russell Aaron: Maybe. Topher DeRosia: So, I snooped a little. You live much closer to me than I expected. Have we met? Did we meet at a WordCamp? Russell Aaron: I think we met at WordCamp Ann Arbor one year. Topher DeRosia: Oh, okay. I went to a whole bunch of those. Russell Aaron: Yeah. I think I spoke 2018, something like that. Topher DeRosia: Yeah. I was probably there. Russell Aaron: Yeah. Topher DeRosia: All right. So tell me where you live, what you do, all that kind of stuff. Russell Aaron: I currently reside in Indianapolis, Indiana, and I am just freelancing as of right now. You know, I live in a pretty small town where it’s kind of old school WordPress, if you will. 
Anyone who is worth their salt keys will remember a day when websites were not responsive or a business has a cousin of a friend of a brother who builds websites and, “Hey, he’s working on it,” and three years later, there’s still no new website. I kind of live in a town where I’m kind of getting back to my grassroots, where I stay up late at night with my insomnia, and I will roll up to a business and I will say, “Your new website can look like this today. If you pay me this much money, I will install it today, and this is your new website.” And it’s got your updated menu, and it’s responsive, and it works on mobile, and we can connect it to AppPresser and make it an app and stuff like that. So I’m kind of reliving the glory days of what I remember WordPress to be. Topher DeRosia: I’m also freelancing right now, sort of by choice, sort of not by choice. Somebody I’m married to would rather I had regular pay and insurance. Russell Aaron: Heard that. Topher DeRosia: Are you in the same boat, or did you do this on purpose? Russell Aaron: I did this on purpose. I was not working for the man, but I was working with some people. I’m over the tiny little granular things that somebody can fire you over. Like they’re watching if your mouse moves or they’re watching if you haven’t logged in. There’s just no more trust, I feel like, in so many cases. And so I know that I can do things better on my own, and I’m going to. Topher DeRosia: I have to admit, I love the freelance life. It is pretty special. Russell Aaron: Right. It’s almost like… what’s that movie? The 40-Year-Old Virgin, where they are making a website and they’re like, “Hey, Spider-Man 3’s on in five minutes. Let’s go watch it.” Like they totally ignore their job and they just go watch this movie now. It’s kind of like that. Topher DeRosia: Yeah. Yeah. For me, it’s doing stuff with my wife. She has a day job, but it has kind of chaotic hours and not specific days of the week. 
And so I work when she does, which sometimes is Saturday and Sunday, and then I just don’t on Tuesday and Thursday. That’s pretty great. Russell Aaron: I’m kind of in the same boat. My wife has a wonderful job, and she is with a great group, and she does global advocacy. I mean, she just deals with people that are happy with the product, and she keeps them happy. She does lots of stuff like that. I’m kind of the same thing, where their company is now starting to get into AI, and they have so many questions, and I’m over here building things with AI and doing things like that. So I’m not exactly consulting, but my ideas are going into their company through my wife. Topher DeRosia: My wife works at a grocery store, and they have a cash machine they use in the back office that runs Linux. Russell Aaron: Oh, wow Topher DeRosia: And the IT guys had to come in and do some work on it, and she saw the screen and she’s like, “Oh, is that Linux?” And I’m like, “Who are you, and what do you know?” Super nerd. So what’s your company name? Do you have one, or is it just WP Pro Support? Russell Aaron: WP Pro Support. Topher DeRosia: WP Pro Support. Okay. Do you concentrate more on support, or do you build more? Russell Aaron: I have been doing support since 2011. I formed my very first support company, and I launched it the same day that Shane Sanderson launched Maintainn. My buddy, who you might know, John Hawkins, I was at the Vegas WordPress Meetup Group, and I had the idea in Vegas WordPress Meetup Group where there’s 70 people sitting right here behind me and they all want help. And I was like, “How do I do this?” So I built my first thing where I gave everybody free-for-life support, and they were my test group, if you will. And they helped me work out my bugs and tickets, and they helped me work out how I actually operate and do stuff like that. 
Then when I launched it, literally that day, John goes, “Wait, have you seen this?” And we had no idea about each other, but we literally launched them the same day. Fast forward three years down the road, I ended up working for Maintainn when it was owned by WebDevStudios. But everything I’ve done in WordPress has been support, whether I’ve worked for a mortgage company, a casino in Vegas, hosting with Liquid Web, doing stuff with NerdPress or AppPresser. Everything I’ve done is support. That’s really where my passion is because I remember what it’s like being a first timer. I think that there is a huge market potential here of people are always going to be new. I don’t care who you are. There’s always somebody new walking in the door, and there has to be a person who will sit down and say, “Come here, I’ll hold your hand.” And I am that person. I always try to look at WordPress from that lens is if a new person is looking at this today, are they going to be happy? Are they going to be confused? And I go from there. So currently today I’m transitioning away from support as we know it, where you write a ticket and then somebody on the other end is like, “Hey, I fixed your site,” or whatever. And I’m transitioning to a new product that I’m working on. So I’m going to be getting away from traditional support, but I’m still going to be doing things in the support space, if that makes sense. Topher DeRosia: Yeah, that makes sense. When I first got into WordPress, it was 2010, and custom post types were brand new. Russell Aaron: Right? Topher DeRosia: And I was out of my element with WordPress. I did not know what I was doing, but I did know PHP, and no one else knew post types yet. So when it comes to that, I was on an equal footing, and that was my way in. That was my leverage. I made a lot of money in the early days just building custom post types. Russell Aaron: Custom post types and single-posttype.php or whatever. Yeah. 
Topher DeRosia: So I was a competent PHP guy who didn’t know WordPress. And I feel like we’re in kind of the same transition space right now with AI, where we have tons of competent WordPressers who don’t really know AI yet. I think there’s a great space for that, teaching our friends, teaching everybody we’ve known for 10 years in WordPress. You know what I mean? Russell Aaron: I do. That’s one of the things that I really love about WordPress is that… let’s take the new 7.0 that just came out, I think it re-leveled the playing field. Before this came out, there were people that were ahead of others when it comes to patterns or blocks or the command palette and stuff like that. But now I think with this, we’re back to an even playing field because every… I mean, not exactly. There’s still some people who know AI a lot better than others, but you’re always five minutes ahead of somebody and five minutes behind somebody else. Topher DeRosia: Oh, yeah. Russell Aaron: But I do think that with 7.0, a new level playing field has come out. And now is the time to start learning, or you got to wait until 7.1 comes out where that new level playing field comes out. But that’s what I love about WordPress is that it continues to happen. Like you said, CPTs. I still love CPTs. I think they’re one of my favorite things. I look at all of these features, you know, page builders, another time when the playing field was leveled again. Now you learn page builders and then shortcodes and then this and then that. I think that’s the one gift that WordPress keeps giving is that you might be out of date six months from now, but then 7.1 comes out and you’re caught right back up. Topher DeRosia: Right. Yeah. And while you’re five minutes ahead, you quick do a WordCamp talk. Russell Aaron: Yes. Yeah. Topher DeRosia: For that long, you know more than other people, right? Russell Aaron: At least it’s on video, right? Topher DeRosia: Right. I was an expert for a minute and a half. 
Russell Aaron: That was my 15 minutes of fame. Topher DeRosia: What is your WordCamp life like these days? When was the last one you went to? Russell Aaron: The last one I went to was in Vegas, 2018. It was at the Plaza Hotel, which I worked at. When John was putting that together, in Vegas we had a wonderful space, and it was called The Innevation Center, and it was at a data facility called Switch. And they donated so much to us, and we are so grateful to them. And then they kind of had a change in their policy where they weren’t doing things, and then they overpriced how much it would cost to hold events and stuff like that. I was working at a hotel, and so we had this giant convention space, if you will. And so because I was able to pull some strings, we got a great, great discount, all food paid for. I mean, all of it. So that was my last WordCamp. The after party was on top of a pool deck, and there was pickleball courts, and there was a pool, and there was an open bar. I mean, it was rad. That was my last one. I have kids now. My kids are seven and eight and so my WordPress travels have slowed. No, I’m sorry. I take it back. WordCamp US last year was my last one, where we went scorched earth. That’s what I call it. I call it WordCamp scorched earth. Topher DeRosia: I was there for that one. I used to go to a lot every year. Go to- Russell Aaron: Five, six? Topher DeRosia: Five and 10. But since COVID, I think maybe just US every year. It’s weird to just go to one. Russell Aaron: It is. And just US, it’s almost like we used to have what I used to call regional events, where I lived in Vegas, I would hit up WordCamp Orange County, then I’d hit up San Diego, then we’d hit up LA, and then we’d make our way up to Portland, and then maybe if San Francisco did one, and then Phoenix. I did all my regional stuff. And then every once in a while I would venture… I mean, I love WordCamp Minneapolis. Love the people up there. Love so much about that event. 
Used to do that a lot. What’s the one in Ohio that I used to go to? Topher DeRosia: In the teens, there were five in Ohio. And being in Michigan, I used to just cruise down there. Russell Aaron: It’s a three-hour, three-and-a-half-hour drive, huh? Topher DeRosia: Yeah. Russell Aaron: About that. Yeah. Topher DeRosia: At the time, I was working for a company that was paying me to go to WordCamps. I had to make the case for each one, but it was a really simple case for all the Ohio ones because I didn’t need a plane ticket. I just drive over there. It’s like five in Ohio. There was Ann Arbor, there was Detroit, there was Grand Rapids, there was Chicago. I mean, there was almost 10 WordCamps within a three-hour drive of me. Russell Aaron: That’s beautiful. Topher DeRosia: It’s just not there anymore. Russell Aaron: I was very fortunate to work for companies like WebDevStudios, where I could tell them, “Hey, I got into WordCamp Minneapolis. I’m going to speak there.” And because I’m speaking there, they would reimburse me X amount of dollars for something, and then they would sponsor the WordCamp, and then they would make a thing out of it. I mean, I was very fortunate in being able to do that. Then I worked with a really great company called NerdPress, and they are a fantastic group of people that do the same thing. And then I ventured out into different straits, and it was very much different. I’ll say that much. Topher DeRosia: Yeah. Those are good times. Russell Aaron: It’s almost like… the way that I put it is it’s like we all graduated. We all did our four years of college, we all graduated, and now we went to our temp jobs or we went to our internships. Like the band broke up. Topher DeRosia: Yep. Yeah, it is a lot like that. I have seen generations of WordPressers. There was all the crew before 2010 that were downloading zip files and hacking themes to even get them to run. Then there was after 2010, and custom post types were new and stuff. 
And then there’s the whole Gutenberg generation that never experienced all that crazy theme stuff. Russell Aaron: I mean, you tell people that child themes were so new that people didn’t even grasp the concept of a child theme, and today it’s so baked in. It’s not even something that people think about. It’s just you install this and the child theme, and it’s a thing. But I remember writing those by hand. Topher DeRosia: Yeah. No kidding. Then to a certain extent, not even having child themes anymore because nothing is stored on the file system. Russell Aaron: I love it. I love it. In my very first WordCamp talk in Vegas 2012, I made a prediction that everything was powered by the theme. Everything used to… I mean, that’s as far as I go back is every template was the same. It was left column, right sidebar, header, and every page, whether you liked it or not, looked like a blog post. And it wasn’t full-width, responsive. I remember a lot of that. And then corporate themes came out, and then cupcake themes came out, then lawn company themes came out, and then the rise of Envato and stuff like that. That’s a good name for a band, The Rise of Envato. Topher DeRosia: I’d go see them. Russell Aaron: But all that stuff comes out. And then you look at it now and it’s like, that seems so far away. I still remember the day that I learned about child themes, and I’ve never forgotten that. And I think, coming back full circle, that’s why I stay in this beginner support space because I’m kind of keeping that nostalgia around, I guess. Topher DeRosia: Yeah. There’s a lot of joy in watching people’s eyes light up when they get it. Russell Aaron: That’s the best part is just telling people what’s possible. When they’re frustrated with something and you go, “Oh, hey, Gravity Forms can do that.” And they’re like, “Wait, what?” And I’m like, “Yeah.” And they can also do… And I just start naming stuff. 
And I show all 50 extensions that they have and they’re just like, “Wait, what?” And I’m like, “Yeah.” I’m like, “This starts getting radical when you’re into it.” Topher DeRosia: There’s something I miss from old WordPress that I don’t see in modern WordPress. It might not be a thing. And that is dramatic new styling with a theme the instant you install it. My wife is not a computer person and does not care about computers. She loves design stuff. There was a time we used Winamp. Russell Aaron: Wow. Topher DeRosia: And she loved getting skins for Winamp. And she would download 30 in a day and try them all out. And then when I set her up for the blog the first time and showed her the theme repo on .org, this is in 2011, she would literally spend a day just downloading theme after theme after theme. Russell Aaron: Same way. Topher DeRosia: And you just install it and poof, your site looks amazingly different. These days, I mean, you install something like Kadence or GeneratePress or Ollie or any of them, really, and it’s kind of a blank canvas. Russell Aaron: It’s very minimalist. It’s very minimalist. Topher DeRosia: I miss the ability to say, “I feel like making a change today,” and two minutes later, your site looks completely different because you’re using… Russell Aaron: Couldn’t agree more. Couldn’t agree more. I mean, I look back at old pictures from when I would host the meetup group in Vegas, and there’s pictures of me talking, and then on the screen behind me is my old site, and it was this old layout. I bought the theme from Envato because I was just fascinated with it. It was everything that I wanted it to look like. But same thing is now when you change your theme from this one to that one, that dark grunge kind of thing is gone, and now you’ve got this bootstrap-looking thing or whatever. I agree with you. I think that comes from my days of being in MySpace. That’s how I got started with all this. 
So you could change your MySpace template like that, and I think that’s where it comes from, at least for me. Topher DeRosia: I haven’t even looked into it. Can you make a Gutenberg-based blog theme that has a very striking look and just release it? And then, I don’t know, just release a whole bunch of them like in the old days? Theme shops had 35 themes for sale, and they all looked different because they were all totally different themes. Russell Aaron: I remember there was a day on Envato where it was the same theme, it was just rebranded. So it was like theme name 1.0, and it was called Atlas. And then it’s the same theme but in orange, and now it’s 1.2, and it’s called Dungeon or something. And then we have 1.3 again. Same theme, same framework, but each version was named something different. It made that developer look like they had five different products instead of just one over and over. Now you look at something like a page builder, and it’s like, “We’ve got 500 different templates in one thing.” I can’t do that. I think that’s too much for me. Topher DeRosia: It’s like the days of the CSS Zen Garden. Russell Aaron: Right. Topher DeRosia: HTML is the same, CSS changes. Before I used WordPress, I built my own blog system. Russell Aaron: Oh, wow. Topher DeRosia: It never got super advanced, but I used it for 10 years. One of the things you can do in your HTML is register alternate stylesheets. It’s the same tag, it’s just an alternate word in there. And then in Firefox, at least, you can go under “view Page Style”, and they would all be listed there, and you can just choose different themes. I figured out the JavaScript, even though I didn’t know JavaScript. I figured out the JavaScript to make a little dropdown box in my sidebar so my visitors could say, “Oh, I want to change my theme here.” I never figured out how to do that in WordPress because everything was so tied to style.css. I didn’t know how to make a different one be the main one. 
But that’s something else I miss in WordPress: the ability to just so dramatically and dynamically change your design because your content is structured so well. Russell Aaron: You know, not only that, but I really liked the websites where there was a demo, and then it gave you a basic username. The username was demo, the password was demo. But then the one thing I never figured out was how every 24 hours the site would just reset. So somebody could go in there and they could do whatever they wanted to do. They could create their own pages. They could create their own blog posts. And for 24 hours, there was a page called Russell’s Awesome. But then after 24 hours, it would just reset. I always thought that was so cool, but I could never figure out how to do that. Topher DeRosia: Oh, yeah. And everybody was editing all at the same time, within that 24-hour period. Russell Aaron: I have since restructured my website. I use the block theme from WebDevStudios. I kind of feel like that’s where I got my education from. I was somebody who kind of dabbled around in WordPress, and then when I went to go work with them for three years, they had a set of standards that I couldn’t even fathom to begin with. But then as we built things and I saw how their machine works, how their business revolves, I was like, “You know, for me, this is the way that I like to do things, is the way that they like to do things.” And so my new website… I mean, not new website, but it’s my new theme, I actually had AI build it for me. I had Claude. I was using… It’s by ThemeIsle. Neve. I was using Neve, one of my favorite themes. Love them. So I was using that, and then my site was kind of all over the place. It was an “I’ll teach you how to do this.” That’s kind of the main focus of my site is I will jump on a call with you, and whatever questions you have, I’ll sit here for five hours with you if you want. I will teach you until you get it. 
But then I also had this section about band names that were just… earlier when we were talking about the rise of Envato, you know, like I would have a section on my blog where you could create a new band name and then I had all these random blog posts. And so my website was kind of like this potluck, if you will, just like this random stuff. And I was like, you know, I want to be doing something else. I think my website needs to change. And I have those old blog posts still, but they’re hidden. So now with my new theme, I had AI look at my old site and say, this is what I think we should do. I picked out some colors and over like five days, I had it build me five different HTML pages, like completely different, you know? And then I started giving AI… and I said, like, “Okay, I want it to look like this.” And then I was like, “Well, okay, I like this and I like this, but I also like this from this other site.” So I started feeding it information and like when the HTML came out, I had 12 different templates. I had my blog posts, I had my archive, but I had everything built in HTML. And the cool thing about the WDS block theme is that it serves everything as an HTML page. So I literally just took AI and said, “Take these HTML pages, bake them into how this theme does it,” and bam, my site came up. I had it done in maybe two days. Topher DeRosia: Wow. Russell Aaron: And then after that, I had it take all of those HTML pages and create me patterns. So now I can go in, and when I go into my full site editor, I can go to patterns, I have all my homepage patterns, my blog patterns, I sliced everything up, and they’re all WordPress native blocks. So I can literally go in and change the coloring on any page I want instead of having to edit the HTML or anything. And now that I have that, I feel this sense of freedom where I’m not worrying about an update coming tomorrow, if my update is gonna break or I don’t have to read a changelog that is not specific anymore. 
I can’t stress how much I love not having to read changelogs or the lack of changelogs. I mean, I’m fully happy with how things have come out. And over time, I’m gonna keep fine-tuning it, but I’m pretty much where I’m at right now. With all of this new technology that’s come out, I’ve really kind of found my love again for WordPress. I was kind of in a slump where I just wasn’t really doing anything. Now I take my son and we’ll drive down to Louisville, Kentucky. He rides BMX. So while he’s racing, I will literally have Claude Code open on my computer and I will log into the Claude app on my phone and I can keep sitting there having the same conversation. So this new thing that I’m building, I can still do it while I’m sitting there watching him race or while I’m doing something else. I was just like, this is fantastic. And then my wife will drive home and I’ll just sit there and I talk into my phone, I literally put the microphone on and I’ll be like, “You know, I don’t like that. And here’s my thoughts about this.” And you know, my phone dictates all of that and then I send it to my computer through the app and it just keeps spinning things up. Then by the time I get home, I have a new version that I can demo or I have a new version that I can test. I mean, I am just so fascinated by it. Topher DeRosia: That’s cool. Were we at WebDev at the same time? Russell Aaron: I don’t think so. Topher DeRosia: I was there just over three years ago. Russell Aaron: I was there 2015 through 2018. Topher DeRosia: Oh, yeah. I came much later. I was only there for like two months. Russell Aaron: Oh, wow. Sometimes that’s the way it goes. Topher DeRosia: Yeah. They were gonna get a big contract, so they hired a bunch of people, and two months later they didn’t get the contract and let us all go. 
Russell Aaron: As much as I hate that, that also taught me that the people that do great work or the people that show up every day and are putting in more than they’re getting out, those are usually the people that stay in companies like that. That really changed my work ethic. I used to be somebody who wanted to be not lazy, but I didn’t wanna be pressed for time or having to go, go, go and having to be on all the time. Now, I’m the opposite. Now, I’m like, now that I’ve done that, I kind of yearn for that stretch for a little bit. I mean, you were just saying that how you’ve transitioned to where you are. I was watching a Barstool Sports interview with a guy who runs a pizza shop in… it’s either New Jersey or New York. The guy’s only open Wednesday, Thursday, Friday, Saturday. And he’s only open nine to six or something like that. And he built that business… well, it’s been in his family for like 60 years or something. He has one of the last original pizza ovens ever. But anyways, the point is, is that he lives at the pizza place, that’s where his entire life is, but he built the business around his life. I’m doing the same thing where if I wanna literally go jump on my bike right now and go for a two-mile ride, I’m gonna go do that. And I don’t have to feel like, hey, you’re not logged in and we’re not tracking your mouse. Like what’s happening? How come you’re not on Slack? You know what I mean? I’m not tied down to that. And I can’t stress that enough of like, that is where I wanna be. Topher DeRosia: Yeah. Yeah, it is a good life. We are at about the time to wrap it up. Okay. So I’m gonna do that. Where do you hang out online? Russell Aaron: Where do I hang out online? Topher DeRosia: Are you in any common WordPress Slacks? Russell Aaron: I’m on the main WordPress Slack sometimes. I tend to watch more than I do involve anymore. A long time ago, I used to be very vocal and I used to be not afraid to walk into a room guns blazing. 
With the big cultural shift that happened in WordPress, I tend to just sit back now and be more self-reserved. So I post on my website, russellenvy.com. I’m on LinkedIn. I’ve been utilizing Reddit a lot too. I think for me, Reddit is a place where I kind of disagree with the fact that you can hide behind a pseudonym, but I do like the brutal honesty that people will have because they are hiding behind something and they will say, “Dude, this flat out sucks.” Or they’ll be like, “Hey, this is great, but it would be cool if…” or somebody can be like, “Hey, that already exists. You’re not doing anything new.” I do like that. Because it kind of not puts me in my place, but it shows me either how connected or disconnected I am to what I think I’m doing. And so Reddit is a very great place. I mean, everything is russellenvy.com except for Twitter or X, whatever you want to call it. Topher DeRosia: All right, cool. Russell Aaron: Where do you hang out at? Topher DeRosia: I am in probably 40 Slacks, but the vast majority of them, I don’t look at. I’m there so that someone can ping me. I’m in a couple of Slacks in India. Okay. I’m in the WordPress Italian community Slack. Russell Aaron: That’s interesting. Topher DeRosia: Post Status, Make, of course. There’s a Heropress Slack. I have my own company Slack, my local meetup has a Slack. There’s just a lot of them. I wouldn’t say I’m super active on any of them. I just occasionally interact with somebody. I use my own company Slack to invite my clients in when we talk there. Russell Aaron: Right. Do you find yourself reading things more than, you know… from the outsider looking in, I post a lot and it looks like I post a lot… I mean, especially on LinkedIn, but I’m always consuming more than I’m posting. Do you find yourself doing that? Like where you’re… maybe not keeping up with the trades anymore, but like, you know… I used to read maybe 1,500 blog posts a week and then… what was that service where you could like save…? 
I used to have a service where you could save articles and then that way, late at night, I would just read, you know, maybe 10 or 15 of them a night. But now I look at things like Reddit where I see… I just look at somebody who’s going on there and asking for help. Again, it’s a standard WordPress person that, hey, I’m new to this, I don’t know how, and I’m looking at it and I’m just like, how can we make that better? That’s kind of where I’m at these days. Topher DeRosia: I don’t read a whole lot in Slack. It really is for my convenience. I’m pretty active with my RSS reader. I follow a lot of stuff. Russell Aaron: Oh, wow. Topher DeRosia: Because I don’t wanna go chase it all down all over the internet. So, you know, there’s that. I’m on LinkedIn a fair amount, Facebook a little bit. I’m on Mastodon and Blue Sky mostly just to post stuff. It’s funny, I have more followers… No, let me say it this way. Mastodon, I have the fewest followers, but the most engagement from those followers. Russell Aaron: Isn’t that interesting? Topher DeRosia: Yeah, I’ll post something and I’ll get some favorites or reposts or whatever. Blue Sky, I get almost nothing at all, despite the fact that I have like a thousand followers there. Russell Aaron: But Blue Sky is a community that is fast-moving. I almost compare it to anything Meta has, which is you can post today right now and in three minutes you’re 785 posts down. That’s what I really love about Reddit is that I posted something about this AI team that I’m building that I give away for free on GitHub, and so for like five days, I was the number two post on that subreddit. And the volume that I saw from that. I mean, Reddit really loves human writing. If you go in there, you post something that somewhat seemingly might suggest that you had AI do anything with it, they will just downvote it. But if you write original and you write from the heart and stuff, like your stuff skyrockets there. 
I’ve learned a lot from Reddit because of that. Topher DeRosia: That’s really cool. Russell Aaron: It’s interesting. Topher DeRosia: Yeah. All right, well, thanks for chatting with me. Russell Aaron: Thank you for the time. Topher DeRosia: And now you can’t be on anybody else’s podcast. Russell Aaron: I’m actually starting my own, sir. Topher DeRosia: Are you? All right. Russell Aaron: I have, like you said, the reason why we started this is because you saw something from me that says, “I’m tired of the indie circuit,” if you will. I put out a LinkedIn post, I don’t know, maybe a month ago at this point and I asked people if they wanted to be on a show. So I have WP Roundtable. I got that from Kyle Mahler, a person who I love in WordPress more than I can express. One of the best people on the planet, I feel like. I was thinking about starting that up again, because we don’t have WP Watercooler anymore. We don’t have anything like that. That’s kind of where I got my start from. But again, I also identify that that’s kind of the problem is that every Monday or Friday I was on a show and I was one of the people that you would see constantly. And so I was sitting there thinking and I was like, what doesn’t the space have? What kind of show do I wanna watch? Because I don’t watch shows when they come out, do you? Topher DeRosia: No. Russell Aaron: I always watch them maybe four weeks down the road at like 2:30 in the morning when I have nothing going on. And by that point, the information is almost stale. I mean, the way that anything works these days. And there’s a few that I might watch maybe within 48 hours of coming out, but at this point, there is something… a new idea that myself and… the guy’s actually an Automattician. 
And so it’s actually kind of interesting because we don’t wanna say anything that would put him in a position to where he’s saying something bad about the company he works for, but I’m also the person where I get to say something to the person who works at Automattic to maybe incite some change. So we are working on something like that, but it’s not going to be an interview show. It is not going to be something where you tune it out or you put it on a 2.5 playback speed just to get through it. You know what I mean? And that’s really what the emphasis of my post was about is that so many of the interviews go that way. Topher DeRosia: Yeah. Are you familiar with wppodcasts.com? Russell Aaron: Yes. Topher DeRosia: Okay, good. So when you get it started up, submit it there. Russell Aaron: That’s a place. I’m very fascinated by Gary Vaynerchuk. Are you familiar with Gary V? Topher DeRosia: No. Russell Aaron: I watch something Gary V every day. That guy makes me feel like I’m lazy every single day, but he is also one of the people that says like, “Hey, you’re 40, you’re still just a baby.” A lot of people feel like I should be two kids, a house, marriage, this, that, and because I’m not, I’m behind the ball. And he’s one person that’s like, “Listen, you’re still a kid.” And he’s like, “You’re 40, I’m 40, and you have 10 years until you’re 50.” And even then you’re still so young to where you can generate something again and from 50 to 60, you can now do. That kind of mentality really moved me around. Why I bring that up is, I’m trying not to post on the same places that everybody else is. I wanna find that new venture. Substack is a great one. And they also have a way to release podcast episodes through them. So they can actually be your entire engine. So like you don’t have to host them on different places and stuff like that. So I’m looking for different plays like that. Topher DeRosia: All right, cool. Well, I look forward to hearing about it when it comes out. 
I’m sure you’ll post on LinkedIn. Russell Aaron: Yes, yeah. Topher DeRosia: All right. All right then, well, I will maybe find you on Slack or Reddit or someplace. Russell Aaron: Slack, Reddit, LinkedIn. Either way, please keep in touch. First of all, it’s great to see somebody familiar in the space. It’s great. I mean, just talking about the old days, I could sit here and do it forever. Topher DeRosia: All right, I’ll see ya. Russell Aaron: Have a good one. Topher DeRosia: All right, so that was the end of the podcast. If you could send me a headshot. And yep, that’s the one. Cool. And any links you want in the liner notes. Russell Aaron: Cool. Topher DeRosia: And two or three sentences about you and what you do and whatnot. Russell Aaron: Cool. I noticed that you… are you trying to revive Hallway Chats? Or is it something that when you just find something interesting, you’re like, hey, I’ll go do that. Topher DeRosia: That’s it right there. Russell Aaron: Okay. Sure, sure. Topher DeRosia: There was a time when it was a weekly podcast and now it’s a whenever-I-feel-like-it podcast. Russell Aaron: I love it. I think that’s the biggest reason why I’m trying to do something different is I really dislike watching a podcast. The first thing they do is they come on and they go, “Hey, welcome to WP whatever. Hey, sorry we didn’t post this week. I was bit…” If you are gonna say you’re gonna post every Wednesday at one, that’s on you. But I do not like when things start off with an apology. Like just get to it. Because I’m not watching it Wednesday at one. I mean, unless you’re Joe Rogan, or unless you are somebody who has a huge following that people will watch you live because it’s important. Otherwise, it’s just consumable stuff, you know? Topher DeRosia: Yeah. For years, I posted Heropress weekly on Wednesday without fail. I would ignore my family to go get it done. Then I was talking to Morten Rand-Hendriksen. You know him? Russell Aaron: Uh-huh. 
Topher DeRosia: Yeah, he’s a huge fan of Heropress. And I said to him, “Do you read every week?” He’s like, “Oh no, not at all.” I’m like, “Oh, I thought you really liked it.” And he said, “Oh, I love it. But I don’t have time to read every week. Every few months I’ll get depressed about the WordPress community and I’ll go read 10 essays.” And then one time I was at WordCamp Ann Arbor, probably the same one you were at, and Josepha came to me and said that… she was kind of a sounding board for employees that would come to her and say, “Listen, I’ve been working support all day and people suck and I’m depressed and I hate life.” And she would just listen for a while and then at the end they would say, “Okay, I’m gonna go read a bunch of Heropress and I’ll feel better.” And it really changed my perspective of what I was making. I wasn’t making a weekly publication. I was making an archive, a collection to be used as a tool, a library. Russell Aaron: I’m gonna say this poorly, but it’s almost like you are creating a support help hotline where it’s like, if you’re on the verge of blowing up your website, please call this number. We’ll talk you down from it. It’s almost like you’re building that. Topher DeRosia: That’s funny. Russell Aaron: That’s interesting. And then now you’re just selective about it or you’re so far- Topher DeRosia: I’m less aggressive about finding essayists and less insistent that they get it to me by a certain time. Like I would find somebody and say, “Listen, I need it by Sunday on this date.” And they were like, “Okay.” And that worked for a while. Russell Aaron: Oh, before, before. Okay. Topher DeRosia: Yeah. But now I’ll find somebody… No, I don’t go looking as often. Russell Aaron: You’ll maybe find something that somebody wrote and you’ll be like, “Hey, are you interested in doing this?” Topher DeRosia: Yes. And I don’t find people as often. I used to find my people on Twitter and I’m not on there anymore. Russell Aaron: Like by personal choice? 
Topher DeRosia: Yeah. Russell Aaron: Okay. Topher DeRosia: I just left Twitter. Russell Aaron: Oh, wow. You feel like your life improved? Topher DeRosia: Yes and no. Russell Aaron: Okay. Topher DeRosia: I feel the loss of what Twitter was. And it’s not there anymore. It’s just gone. Russell Aaron: Especially around WordCamp and stuff like that. That used to have to be the place that you’d be on, you know? Topher DeRosia: The Twitter I loved doesn’t exist anymore. And so, yeah, I feel that loss. Russell Aaron: I need a t-shirt that says that. Topher DeRosia: Yeah. Wow. I’m in the process of making a printable store. Printable? Printful. Printful store. Russell Aaron: Cool. Topher DeRosia: With Woo, to make a video with. I need to make a bunch of products. Maybe I’ll make one of those. Russell Aaron: It’s interesting. Wow. You just flat-out left X. Do you feel like with Heropress, it was… and again, this is why I made that post, is that people almost see it like they can make the rounds. And it’s like, well, I haven’t gone there yet. And so they’re gonna submit something to you because they’re gonna get some press out of it. And it’s not so much what’s best for your brand or it’s not best for your website. They just see it as, well, I’m gonna get some exposure there. Do you feel like it used to be that? Topher DeRosia: No. I’ve gotten maybe two or three submissions ever like that. And a couple of them, I was able to say, “No, that’s not what we’re about. It’s this other thing, what Heropress is actually about.” And they’re like, “Oh, well, okay, that’d be great.” And they do that. And maybe one or two people have said, “I built this great company and everyone should come use my company.” Like, no, not so much. Russell Aaron: Interesting. Topher DeRosia: And that’s the end of it. 
Russell Aaron: I remember back in, I wanna say like 2013, people used to call each other out and be like, why are you giving the same speech at WordCamp Miami, WordCamp Minneapolis, WordCamp San Diego? And that’s kind of where I was at with that same LinkedIn post. It’s like, I really, really enjoy watching Matt Cromwell’s show, but the guy that he just had on was also on Jonathan Denwood’s show and was also on this one, and was also on… I was like, I’ve already seen this. Maybe I get three more percent information that wasn’t in that last one, or because Matt knows a little bit more about personal stuff in WordPress or building a business, he might have some more insight there, but it’s like, I’ve already heard this and I’m kind of already over it. And that’s kind of where I was at is you don’t have to just say, I’m gonna do this one and that’s it. But it’s almost like, you’re making yourself not… what’s the word. Not credible because you’re going around and saying the same thing and it’s just, you’re not doing anything different than a blog post could have done. You know what I mean? Topher DeRosia: I don’t feel too bad about repeating WordCamp talks, especially at small camps, because a lot of people are just gonna go to their local camp and never go to another one. And unless they cruise WordPress.tv, they’re not gonna see it. I struggle a little bit with podcasts because I’ve been asked a lot over the last 10 years to come on a podcast and talk about the story of WordPress. And it’s the same story every time, you know? And so, I’ll try to mix it up a little bit, give different information that I’ve never given before, that sort of thing. But it is something I think about and struggle with a little bit. Russell Aaron: What do you struggle with about it? Topher DeRosia: I don’t wanna just say the same thing over and over again. You know, I don’t want people to go, oh, Topher’s on another podcast episode. Oh, I’ve heard this story. I don’t need to be on this episode. 
Fortunately, it’s been around long enough that I can give a brief synopsis of the beginning and talk about stuff that’s happened in the last couple of years. Russell Aaron: Right. Topher DeRosia: Which is gonna be really different from the podcast episode I was on in 2020. Russell Aaron: You know? Right. Topher DeRosia: It’s an interesting dilemma when you have one story to tell and everybody wants you to tell it. How do you deal with that? Russell Aaron: Well, I’ve noticed that too. It is like, you know, I’ll watch [Insert Famous Name Here], and they have a podcast, and they’re interviewing, again, [Insert Famous Name Here], and that person was also just on That Famous Name and That Famous Name. I actually saw somebody, it’s like almost a year ago, and they were just like, “Do you want me just to say this so your show has this speech in it or are you genuinely asking me?” Because, you know, like you want this story so you can post it on your social media. But I’ve already given that story 15 different times because they wanted it for their own, you know? And it’s almost going that way where I kind of respect it in a way because you don’t want to post other people’s content. But I also feel like I’m tired of saying the same shit over and over again. It’s interesting, man. Topher DeRosia: Yeah, that’s a dilemma. Russell Aaron: So you’re just like kicking back and… are you building something for you that you think is gonna scale or are you trying to get away from WordPress? That’s kind of where I’m at right now. Topher DeRosia: Yes and no. I have always wanted to… I’ve always been better with people than code. I’m a life coach. Russell Aaron: Yeah. I did not know that about you. Topher DeRosia: I love talking to the client more than coding. I love helping people learn things. And so those skills could be anywhere in WordPress, but also could be anywhere outside of WordPress. So I’m looking for those jobs and they are not out there. Russell Aaron: Right. 
Topher DeRosia: So here we are. Russell Aaron: I’m to the point now where my son, he’s eight, but he races BMX, like actual bikes and stuff. And so there’s a college here in Indianapolis and it’s one of the best cycling schools in the country. And there’s like five Olympians that practice every Tuesday and Thursday and they’re right at our back door. These are people that have a great social following, but they don’t post very well. They have a brand name, but they don’t have a website. So I’m noticing that every new space that I go into, it’s kind of like I get to jump back into WordPress again, where it’s like, hey, I just built a website for this BMX track in Louisville, Kentucky. Everybody that has ever raced in the sport votes that it’s one of the best tracks in the country, but they don’t have a website, period. I just went through this where they have a guy, he’s their treasurer, and he’s like, “Well, I’m an AI software guy.” And I’m like, “Well, how come you don’t have a website?” And he’s like, “Well…” And I’m like, “Listen.” I submitted a new version of a we… literally, I uploaded it to my Russell website, or to my Russell Envy site, and I just put it in a sub-folder and I was like, “Your website could look like this today.” I was like, “For free. I don’t want anything from you. No free anything.” I was like, “I want to donate this to you because I want to grow the sport.” And the guy’s like, “I wanted to build it in React.” And I’m like, “Well, why didn’t you?” And the guy’s like, “Uh.” And I’m like, “I have free hosting for life from WP Engine.” And I was like, “I won’t charge you guys ever. I will host a site. I have free with AppPresser. I’ll build you guys an app where you guys can send push notifications.” And the guy’s like, “Well, I want to have a lot of control and say over it.” And I was just like, “All right, you know what?” And then I built my own. 
Now I own a domain all about their BMX track and now they’re calling me going, “We should have went with you.” I’m to the point now where I’m nice. And then it’s just like, “Dude, I’m 10,000 miles over you and I’m going to go this way.” Liquid Web did that to me. Liquid Web brought me in and they were like, “We’re going to…” I was supposed to be the OG StellarWP. They brought me in, I was hiring all my friends and I was bringing in people and we were building something. And then they called me and they were like, “Well, you can either be a level two support person or you could just not work here.” And I was like, “Well, I don’t work here anymore.” And they were like, “Well, wait, hang on.” And I literally hit “click” and I have never logged on since. Topher DeRosia: That’s funny. Russell Aaron: I’m in that same boat where, you know, I don’t have to work for you. You know what I mean? Like, fuck, I’m 40. I should be doing something on my own anyway. I kind of wish I had… what was WP 101? Sean did that for all those years. I wish I would have done that. Or every week, I should have had some YouTube video talking about something and maybe I could have monetized that. I’m not behind the ball; I let the ball slip, is what I feel like. Topher DeRosia: It’s not too late to start. I picked that up when Sean quit, and I’ve got a YouTube channel with a bunch of stuff on it. I published one today. Russell Aaron: Oh wow. Is it just interesting things that you think about, or is it educational, like tutorials? Topher DeRosia: It’s educational tutorials, but stuff that I find interesting. Like today I made a desktop wallpaper for WordCamp Europe. Russell Aaron: Nice. Topher DeRosia: And I did it by going to their webpage in my browser and using the console to hack the HTML and CSS until it looked like a screen, a wallpaper. Russell Aaron: That’s fucking cool. Topher DeRosia: So I published it right before I started talking to you, like minutes before that. 
And it has three views. Russell Aaron: Woohoo. Topher DeRosia: But a couple of weeks ago I did one called Fun and Games in the Terminal. And it’s how to play Tetris in the terminal and how to make a choo-choo train go across your screen when you type ls wrong. And it has 784 views right now. Russell Aaron: That’s awesome. Topher DeRosia: I did one on how to brighten a photo. I’m working on a series called Topher Learns How, where I talk to people who know how to do things that I really should know how to do, but don’t. I talked to Scott Kingsley Clark about Pods, which has been around forever, but which I’ve never used. I talked to Donata about Termageddon, because I know it’s important, but I have stayed away because I don’t understand it and it’s scary. Russell Aaron: Termageddon. I’ve never heard of that. Topher DeRosia: Oh. You know the little cookie consent things, privacy policies and whatnot? Russell Aaron: Yeah. Topher DeRosia: So when you sign up with Termageddon, you pay a surprisingly low monthly fee and they have a human get on the phone with you and talk through your requirements: where you live, your legal stuff. Like, are you in Europe? Are you in California? Where are you? Where are your customers, your viewers? Then you drop in a shortcode for your privacy policy and for the cookies, and they keep them up to date based on how the laws change. So you don’t have to pay attention to, oh, did California make some crazy new law about cookies? What do I need to do to update my site? It’s really, really great. So I did an interview with her. Russell Aaron: $12 a month or $119 a year. Topher DeRosia: Yeah. Russell Aaron: What is the point of having a privacy policy if you don’t pay extra for limiting your liability? Wow. That’s amazing. Topher DeRosia: It is. Russell Aaron: That’s someone just thinking outside the box. Topher DeRosia: Yeah. 
I have a couple of videos where I was given an account at a hosting company that I’ve never used and videoed logging in for the first time and getting to a website. Russell Aaron: Oh, wow. Just from first login to setting everything up to now you have something in production. Wow. Topher DeRosia: Yeah. Specifically not reading the docs. Russell Aaron: Oh, just trying to brute force your way through it. Topher DeRosia: Yeah. Russell Aaron: That’s smart, dude. Topher DeRosia: It’s partly about… well, they may have wonderful docs. It may be super easy to do if you read all the docs. I don’t want to read the docs. Russell Aaron: Me neither. Topher DeRosia: Clickety clickety click, I have a website. So I did GreenGeeks. I did honesthosting.io. I did xCloud. So that’s the kind of stuff I’m doing. Russell Aaron: That’s interesting. That is something that Gary Vee talks about a lot: it used to be that you are this WordPress brand and you do just this, and all your videos could only be about that. Anytime you stepped outside the box, people were like, “Why am I watching this?” And today we’re finally to where my website would probably actually thrive because it’s so random. It’s just something out of my head, and one thing can skyrocket and it’s like hitting the jackpot, you know? That’s interesting. Topher DeRosia: Another thing I did is I made a site called topher.how, because I realized I had never really made stuff in my own channel. I’ve been blogging for decades, making videos, WinningWP. I have over a hundred videos on WinningWP. Russell Aaron: WinningWP? Topher DeRosia: Yeah. Russell Aaron: Did you start that when Charlie Sheen started doing Winning? Topher DeRosia: No, no, no, no. But I was thinking, boy, I’d love to have all this stuff on my own website, but I don’t want to go find it all and copy paste posts. And then I realized nearly every place I’ve ever made content has RSS for their authors. Russell Aaron: Yeah. 
Topher DeRosia: And so I found the sites, found my author RSS feed and started piping them into WP All Import. And now topher.how has all my content from the last 15 years on a dozen different sites, no, more than a dozen different sites: all my videos, all my posts, everything on wordpress.tv, all that stuff. So it’s kind of a portfolio. Yeah, so you can go to topher.how and see all my stuff. Russell Aaron: That was actually one thing that I was really proud of, that my entire WordPress journey is documented on somebody else’s project. So, like, you go to WPwatercooler and that’s my resume, and what is great about it is that it is not me who can edit those videos, it is not me who can master them. Those words are there. Those words are me. You want to know my qualifications in WordPress, there’s all my shit. For me, I was like, “That’s actually pretty sick. You know what I mean?” Topher DeRosia: Yeah. Russell Aaron: Wow. Topher.how. Oh, dude, do you know who Jeffrey Zinn is? Topher DeRosia: No. Russell Aaron: Oh God. Him and Brandon Dove, they have Pixel Jar. Have you ever heard of Pixel Jar? Topher DeRosia: Maybe. Russell Aaron: They’re big West coasters. I’ll tell you that much. He just wrote me. He literally just said, “Dude, how do you find the time to write so much on LinkedIn? I enjoy all your stuff, but mostly I’m blown away by the volume.” Topher DeRosia: Nice. Russell Aaron: I’m going to write him back and just tell him the truth. But you know, it’s all thought, man. Interesting. Topher, I’ve had a lot of fun. Am I taking up your time? Topher DeRosia: I should get back to work. Russell Aaron: All right, sir. Have a good one. Topher DeRosia: All right. I’ll see ya. Russell Aaron: Bye. Topher DeRosia: Bye.
What if the next great irrigation software tool doesn't come from a manufacturer, a big tech company, or a traditional development team? What if it comes from you? In this episode, Andy shares his personal experience learning the craft of vibe coding and why he believes it could be a game changer for the irrigation industry. After four months of building apps with AI coding tools, including SLIDE and BranchBoard, Andy explains how curiosity, imagination, and domain knowledge can now turn real-world problems into real software faster than ever before. This is not a technical coding tutorial. It is a rally cry for the curious. If you have ever thought, "Why doesn't this exist?" or "I wish this worked differently," this episode is for you. Andy walks through how to start with a pain point, brainshare with AI, create a product requirements document, and use tools like ChatGPT, Codex, GitHub, Visual Studio Code, and AWS to begin building real applications. The message is simple: If you think it, you can build it. The future belongs to the curious.
In this talk, Nikita, Senior Applied Data Scientist at the AWS Generative AI Innovation Center, shares his expertise in bringing enterprise artificial intelligence out of the sandbox, from his early days optimizing traditional machine learning models like gradient boosting to deploying advanced production-grade GenAI pipelines. We explore what it really takes to move generative AI systems from pilot prototypes to production environments.
Links:
- AWS Generative AI Innovation Center: https://aws.amazon.com/ai/generative-ai/innovation-center/
You'll learn about:
- Deploying multi-layered defenses independent of backend LLMs.
- Evaluating parameter-efficient methods like LoRA and QLoRA for small models.
- Balancing long-term domain expertise with real-time documentation retrieval.
- Utilizing multi-agent orchestration for search and anomaly explanation.
- Setting up robust LLM-as-a-judge frameworks verified by human metrics.
- Leveraging Amazon Bedrock components for memory and runtime scalability.
TIMECODES:
05:52 Shifting from traditional ML to generative AI
07:49 Hybrid pipelines blending classical ML and LLMs
11:25 Production guardrails and multi-layered system defense
16:15 Prompt bypasses, input attacks, and AI red teaming
20:49 Newsletter localization and translation with Zalando
27:24 Evaluation frameworks and human-in-the-loop metrics
33:07 Aligning LLM-as-a-judge with few-shot prompts
34:49 Fine-tuning small language models versus prompting
41:18 Complementary mechanics of RAG and fine-tuning
43:00 Agentic web search tools for anomaly explanation
47:01 Automated text generation from real-time sports sensors
49:58 AWS project scoping and proof of concept timelines
54:58 Interview requirements and career skills for AWS roles
57:59 Enterprise architecture patterns and system observability
01:00:42 Reusable infrastructure blocks on Amazon Bedrock
This session is designed for machine learning engineers, data scientists, and technical product managers looking to architect reliable, production-ready GenAI workflows. It is highly valuable for teams aiming to bridge the gap between experimental AI prototypes and secure enterprise software.
Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
- GitHub: https://github.com/DataTalksClub
- LinkedIn - https://www.linkedin.com/company/datatalks-club/
- Twitter - https://twitter.com/DataTalksClub
- Website - https://datatalks.club/
Connect with Nikita
- Linkedin - https://www.linkedin.com/in/kozodoi/
- Github - https://github.com/kozodoi
- Website and blog - https://www.kozodoi.me/
Security misconfiguration is one of the most frequently found vulnerabilities in web application pen testing, and most of the fixes are just a checkbox. In Part 2 of their OWASP Top 10 series, Brad Causey and Jordan Natter cover OWASP A05: Security Misconfiguration with real stories from recent engagements and practical takeaways for developers, security teams, and organizations of all sizes.
In this episode:
- Hardcoded Active Directory credentials and API keys discovered in a public GitHub repo during a healthcare pen test
- Default credentials (admin/1234) found on a clinical research app storing PHI
- A rogue Apache basic auth panel that survived from dev into production
- How verbose error handling and stack traces hand attackers a roadmap to your app
- Why dev-to-production is the most dangerous transition in your app's lifecycle
- The shift-left mindset and DevSecOps: empowering devs to ship secure code
- How CIS lockdown guides can dramatically improve your security posture overnight
Resources mentioned:
- OWASP Top 10: OWASP Top Ten Web Application Security Risks | OWASP Foundation
- CIS Benchmarks: https://www.cisecurity.org/cis-benchmarks
- Ep. 182 – OWASP Top 10 Part 1: https://youtu.be/BwYJ-kZ3XaY
Need a web application pen test? Reach out: Offensive Security - SecurIT360
Blog: https://offsec.blog/
Youtube: https://www.youtube.com/@cyberthreatpov
Twitter: https://x.com/cyberthreatpov
Follow Spencer on social ⬇
Spencer's Links: https://spenceralessi.com
Work with Us: https://securit360.com | Find vulnerabilities that matter, learn about how we do internal pentesting here.
PHP Podcast – June 4, 2026 Hosts: Eric Van Johnson & John Congdon Another fun episode of the PHP Podcast! Here’s what we covered: PHP Tek 2027 — New Dates, Bold New Format Mark your calendars: PHP Tek 2027 is happening April 27–29 in Chicago, and Eric and John are shaking things up. Rather than a straight three-day PHP conference, next year gets three tracks — two of which are familiar PHP-focused content, and a third specialty track that rotates each day: one day of JavaScript, one day of DevOps, and one day of Laravel. The Laravel track is specifically focused on how developers actually use the framework day-to-day, not a product pitch. Single-day passes will be available, so if you’re only coming for the DevOps or JS day, you’re covered. One important heads-up: there’s a big convention happening at a venue nearby in Rosemont, so the hotel block could sell out faster than usual. When they open reservations, don’t wait. Holly the Elephant Is Going Fast The PHP Architect conference elephant, named Holly, is now available at store.phparch.com, and demand has been remarkable. Eric woke up one morning to a flood of orders and genuinely couldn’t figure out what happened. The warning from last year applies here: people said they’d grab Tony later, and now Tony is gone forever. Holly ships June 17th for most orders, but if you’ve already ordered, it’s likely on its way. Get yours while you can. PHP Tek TV Is Doing Something Different This Year In past years, conference talk videos would get edited and uploaded weeks (or months) after the event. This year, John is doing things differently: the raw, unedited recordings are going up now, with timestamps in the description so you can jump straight to specific talks — some rooms recorded a seven-hour continuous feed and just left it running. The clean edited versions are still coming (a video editor friend in the UK is on it), but if you want to see a talk right now, the raw version is there. 
Audio quality varies by room, but it’s watchable. Immich — A Self-Hosted Google Photos That Actually Works John has been running Immich, a self-hosted photo management platform, in a Docker container for about a month and loves it. It does facial recognition, GPS tagging, and auto-uploads from his phone — essentially everything he cares about in Google Photos, without handing his photos to Google or Apple. He’s now planning to use it as the PHP Architect conference photo library, centralizing all the Tech photos in one browsable, shareable place. It’s fully open source, with no licensing cost, and an optional donation tier. If you’re sick of paying ever-increasing storage bills to big tech companies, this is worth a look. Ben Ramsey’s PHP Tek Homecoming Article Is Free to Read The May issue of PHP Architect magazine is now available to digital subscribers, and this month’s free article is Ben Ramsey’s piece on the PHP Tek homecoming experience. Eric reached out to Ben last minute and he delivered. If you’ve never subscribed, this is a low-barrier way to see what the magazine is like. Head to phparch.com, grab the free article, and if you like what you see, subscriptions are not expensive. John Is Resurrecting a Legacy Laravel App — With Claude’s Help John has been grinding away on a Laravel 6 app that was a passion project years ago and has now been revived as an actual client project. Using Claude to methodically baby-step through each version upgrade — starting with writing tests to establish a baseline — he’s worked up through the major Laravel versions. The turning point came when he hit the version where the old event sourcing package (Prooph) was clearly on its way out, and the decision was made to migrate to Verbs, Nuno Maduro’s Laravel-native event sourcing package. John’s now looking forward to it. He’s also accidentally been burning tokens on the company Anthropic account (not his personal account), which Eric caught live on air. 
They are going to talk about it after the show. Eric’s Mystery Side Project Is Almost Ready — If DNS Would Cooperate Eric teased a new side project last week and intended to reveal it this week, but he’s stuck waiting on DNS propagation. The domain was registered with DigitalOcean DNS already in use by a previous owner, so Eric moved it to Cloudflare — only to discover there may be a conflict because the previous owner was also on Cloudflare. The result: the name servers are stuck on old values. John’s live suggestion was to move it to Route 53, and Eric was immediately sold. The project is almost ready to show the world, DNS gods willing. Meta’s AI Support Bot Got Socially Engineered Eric shared a video demonstrating how someone prompt-injected Meta’s AI customer support bot into sending a verification code to an attacker-controlled email address — and then using that code to add the email to an account, enabling a full password reset and account takeover. The irony: Meta is the company behind Llama and has some of the deepest AI expertise on the planet, and they still shipped a support bot with permissions it shouldn’t have. Eric’s point was pointed: you can fire a human employee who gets social engineered, which creates accountability throughout the team. An AI has no such incentive structure. Crowbarring AI into account-modification workflows without appropriate guardrails is just asking for this. The PHP Foundation Now Publishes Board Meeting Minutes Eric discovered that the PHP Foundation has started publishing their board meeting minutes in a public GitHub repository. Nothing earth-shattering yet, but seeing who attended, what was discussed, and what decisions are being made gives the community a real window into how the foundation operates at scale. It also helps explain something Eric and John have always found interesting: why PHP stalled so hard between versions 5 and 7. There was no foundation, no financial backing, just volunteer hours. 
Now there’s a paid staff and governance structure — and the minutes show exactly how complex running something at PHP’s scale actually is. The PHP Foundation Has a Dedicated Security Team Now Speaking of the Foundation, it now has a dedicated security team — a sign of how seriously the supply chain attack problem has gotten. AI tools are being deployed by black hat actors to find vulnerabilities in open source projects at a scale that wasn’t possible before. PHP is not just another open source project; it underpins a massive slice of the web, and companies depend on it staying secure. Having a team specifically focused on this is the right call, even if it’s a sobering reminder of where the threat landscape is heading. Moat — Nuno’s GitHub Security Auditing Tool Nuno Maduro (of Laravel fame) quietly shipped a tool called Moat that audits your GitHub presence for security gaps. Install it globally via Brew or Composer, point it at your GitHub org, a specific repo, or even a specific branch, and it gives you a report on where your security posture could be improved. It’s read-only — it won’t change anything — and it’s explicit that it is not a security certification. Eric wants to use it to audit the PHP Architect organization’s repos, many of which haven’t been touched in years. Think of it as a fast, opinionated triage tool, not a replacement for a real security audit. 
Links from the show: PHP Tek 2027 — Chicago, April 27–29 PHP Architect Store — Holly the Elephant Immich — Self-Hosted Photo Management PHP Architect Magazine Verbs — Laravel Event Sourcing by Thunk Moat — GitHub Security Auditing by Nuno Maduro PHP Foundation on GitHub PHP Architect Discord Host: Eric Van Johnson X: @shocm Mastodon: @eric@phparch.social Bluesky: @ericvanjohnson.bsky.social PHPArch.me: @eric John Congdon X: @johncongdon Mastodon: @john@phparch.social Bluesky: @johncongdon.bsky.social PHPArch.me: @john Streams: Youtube Channel Twitch Connect & Hire PHP Architect Website Twitter/X Mastodon Hire PHP Developers Looking to hire PHP developers? Email support@phparch.com – Joe and the team are available for consulting, infrastructure work, Ansible playbooks, and code review. Partner This podcast is made a little better thanks to our partners Displace Infrastructure Management, Simplified Automate Kubernetes deployments across any cloud provider or bare metal with a single command. Deploy, manage, and scale your infrastructure with ease. https://displace.tech/ PHPScore Put Your Technical Debt on Autopay with PHPScore CodeRabbit Cut code review time & bugs in half instantly with CodeRabbit. Music Provided by Epidemic Sound https://www.epidemicsound.com/ Join Us Live Next Week Youtube Channel Got feedback? Join us on Discord at discord.phparch.com The post The PHP Podcast 2026.06.04 appeared first on PHP Architect.
Is it time to build your own agent harness? Carl and Richard talk to Emmz Rendle about her work on Daemonic AI, which gives you more control over which models and tooling you use to build software with agents. Emmz talks about the upcoming rug pull in AI software development tools, where prices are rising, and services are being restricted. Having enough control to choose when to run locally becomes key to being productive at a reasonable price. Being able to pick-and-choose what agents and configurations to use for each of the agent roles you want to implement is super powerful - check out the GitHub project and take it for a spin!
Hannah Hoffmaster went from a self-described two-out-of-seven in technical skill to building multi-agent AI tools in a single year at Foster. This episode is for anyone, technical or not, trying to understand what genuine AI fluency looks like and how to build it. Hannah Hoffmaster is a student completing the one-year MSIS program at the University of Washington Foster School of Business. She came to the program with some knowledge of statistics and R, but little coding experience. Through her coursework, including Prof. Leo Bousioux's AI and Generative AI in Business class, she developed the ability to design and build AI-powered tools, including a charity comparison platform and an ADHD-focused scheduling app. She describes experimenting with AI as something she now does for fun.
We covered a lot of ground in this episode:
- How to think about AI as a build tool when you have no coding background
- Why "trust but verify" is the core discipline of working with AI, and how to operationalize it
- How to design a multi-agent workflow around the parts of a task you don't want to do
- What a deliberate, build-first job search looks like in a fast-moving field
- How to stay current as tools change: by building, researching versions, and talking to peers
- Why holding your career goals loosely can be an advantage in an uncertain market
Resources mentioned: GiveWise (Hannah's project); Offload and the "Nudge" chatbot (Hannah's project); Claude Code; Supabase; GitHub; Vercel; Lovable; ChatGPT; Gemini; Codex; Prof. Leo Bousioux's AI and Generative AI in Business course; Foster's AI club.
The new AIEWF website is live! Get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets! Most industry benchmarks compress intelligence and reasoning ability into scores: SWE-Bench Pro, MMLU, Humanity's Last Exam, etc. These metrics are useful, but don't always represent the full extent of how a model performs in the real world. Some of the most interesting evals today look less like exams and more like operating businesses in the real world. One of these is Vending Bench. In Anthropic's Mythos Preview System Card, Andon was the only third-party eval to get its own section, observing increasingly concerning aggressive behavior. You don't know what a model is capable of doing in the real world unless you actually give it inventory, a wallet, tools, customers, competitors, humans, and some time. More often than not, it'll surprise you how much a model is capable of and, in doing so, also reveal unexpected behavior: deception, context collapse, emergent coordination, and bizarre negotiation behavior. While an inflection point in personal agents came post-OpenClaw, after full file access with bypass permissions became the norm, it is yet to come for agents in the real world. However, Andon Market, an actual in-person store fully run and managed by AI, is paving the way for what is possible. From Claude trying to call the FBI over a $2/day vending machine charge to AI agents forming price cartels, hiring human employees, running physical stores, and writing existential robot musicals, Andon Labs is stress-testing what happens when frontier models stop being chatbots and start acting in the real world. 
In this episode, Andon Labs cofounders Lukas Petersson and Axel Backlund join swyx and Vibhu to unpack the strange, funny, and genuinely concerning edge cases that emerge when agents run businesses over long horizons. We go deep on Vending-Bench, Project Vend, Vending-Bench Arena, Bengt, Butter-Bench, Luna, and Andon's broader mission of building realistic real-world evals for autonomous AI systems. Lukas and Axel explain why dollar-denominated evals reveal things traditional benchmarks miss, how Claude ended up reporting its vending machine fees as cybercrime, why long context windows can drive agents into meltdown loops, what happens when agents compete with each other, and why the future of AI safety may depend on testing models in messy physical environments instead of clean benchmark sandboxes.
We discuss:
* Why Andon Labs started with dangerous capability evals and long-running agents
* Vending-Bench and why running a vending machine is a deceptively hard AI benchmark
* Why money-based evals avoid the saturation problem of traditional benchmarks
* How Claude tried to call the FBI over a $2/day fee
* Why long-horizon agents can spiral into existential and legalistic breakdowns
* Project Vend: putting an AI-run vending machine inside Anthropic
* Why real humans are “out of distribution” for simulated agents
* Claudius, Seymour Cash, and the chaos of AI CEOs
* How a human briefly became CEO of Claudius through a manipulated election
* Why multi-agent systems can converge back into “helpful assistant” behavior
* Bengt, Andon's internal office agent with email, spending, terminal, phone, camera, and internet access
* How Bengt traded Amazon purchases for face-recognition training data
* Claude's aggressive behavior, lies, refund avoidance, and price-cartel behavior in Arena
* Why eval awareness may become the AI version of “are we living in a simulation?”
* Blueprint Bench, spatial intelligence, and why models still misunderstand physical rooms
* Butter-Bench and testing LLMs as robot orchestrators
* Luna, the AI-run physical store with a three-year lease and human employees
* The new Andon cafe in Sweden and why real-world geography matters for agent evals
* Rotten tomatoes, perishable goods, and the hidden difficulty of running a physical business
Lukas Petersson
* LinkedIn: https://www.linkedin.com/in/lukas-petersson-181a83172/
* X: https://x.com/lukaspet
Axel Backlund
* LinkedIn: https://www.linkedin.com/in/axelbacklund
* X: https://x.com/axelbacklund
Andon Labs
* Website: https://andonlabs.com
* Vending-Bench: https://andonlabs.com/evals/vending-bench
* Andon Vending: https://andonlabs.com/vending
Timestamps
00:00:00 Introduction
00:01:00 Andon Labs and the Origins of Vending-Bench
00:05:21 Why Money-Based Evals Matter
00:09:51 Agent Harnesses and Self-Modifying Systems
00:13:36 Claude Calls the FBI
00:16:33 Project Vend: Claude Runs a Real Vending Machine
00:21:44 Seymour Cash, AI CEOs, and Election Chaos
00:27:16 Multi-Agent Coordination and Slack Observability
00:30:18 When Will Agents Run Real Businesses?
00:34:56 Bengt: Andon's Internal Office Agent
00:40:06 Real-World AI Safety and Long-Horizon Traces
00:44:28 Lying, Refunds, and Price Cartels in Arena
00:52:42 Eval Awareness and Simulation Behavior
00:56:06 Blueprint Bench, Butter-Bench, and Robotics
01:04:37 Luna: The AI-Run Physical Store
01:09:29 The Sweden Cafe and Real-World Expansion
01:13:16 What Comes Next for Andon Labs
Transcript
Introduction: Andon Labs, Long-Running Agents, and Real-World Evals
Swyx [00:00:00]: Welcome to Lukas and Axel from Andon Labs, and I'm joined by my favorite guest host for anything security, safety, alignment: Vibhu. Welcome. Lukas [00:00:15]: Thank you for having us. Axel [00:00:16]: Thank you. Swyx [00:00:17]: Let's match names to voices. Maybe you wanna take turns introducing yourselves. Lukas [00:00:21]: I'm Lukas. Axel [00:00:22]: And I'm Axel. Swyx [00:00:24]: Let's introduce Andon Labs a bit. 
How did you guys come together? You have different backgrounds, but you're both Swedish. Was that a big part of it?
Lukas [00:00:33]: So when I went to high school, there was this really cool guy who had a superpower. He could code. So he made, like, the app for the school and stuff, and he was super cool, and I wanted to be like him, and that was that guy.
Axel [00:00:47]: I don't know about this.
Swyx [00:00:49]: But you went to different universities, right?
Lukas [00:00:51]: But same high school.
Swyx [00:00:52]: I see.
Lukas [00:00:52]: So we always said, “Oh, once we graduate university, then we should start a company,” and that's what we did.
Swyx [00:00:58]: Wow, there you go. And about a year ago, you kinda burst onto the scene with Vending Bench, but was there a thing before that was kind of like the inception?

From Dangerous Capability Evals to Vending Bench
Axel [00:01:07]: So we did work, yeah, with Anthropic, which was one of our early customers in doing evals. So we did dangerous capability evals, nothing we published openly. But then we started thinking about doing some kind of public benchmark, and one thing that we really started thinking about was running agents, and specifically agents managing businesses. This was early 2025, and I think that was when the first mentions came up of people running one-person unicorns or even autonomous companies. So we thought, “Let's make a benchmark of how well an agent can run probably the simplest business possible,” and that's probably running a vending machine. So that's the first public one we did.
And there was almost no one that noticed it in the first couple of months, I think. So we released it in February last year, and then I think around Easter last year we got the first viral tweet about it, which someone else wrote.
Lukas [00:02:11]: We tweeted a bunch when it came out and tried our best.
Axel [00:02:15]: We tried.
Vibhu [00:02:16]: It's the one at Anthropic, right?
Lukas [00:02:18]: So this--
Swyx [00:02:19]: This is a classic thing we should get out of the way.
Lukas [00:02:20]: Exactly. There's two versions.
Swyx [00:02:22]: Everyone does this. Yes.
Lukas [00:02:23]: There's Vending Bench, which is the simulated one, which we did completely independently in February. And then, like Axel said, that was the thing that didn't get any traction in the beginning, but then some random person made a tweet about it, and that--
Axel [00:02:38]: You have the paper.
Lukas [00:02:38]: That is the paper. Correct, yeah. And then, since we thought this was very fun-- I think this is also one thing with Andon Labs: the way we decide what to do next and what projects to do, the heuristic we use is, what is fun? What would be a fun project? And doing this in real life sounded quite fun for us, and maybe also scientifically useful. So we basically had this idea, but then we needed a place for it, and putting it out in public would probably not really work. It would get vandalized and stuff. So we pitched it to the people we were already working with at Anthropic, and they were like, “Yeah, you can have space. This sounds fun.”
Swyx [00:03:21]: It's like a small fridge, right? It's like a mini fridge.
Axel [00:03:23]: Absolutely.
Swyx [00:03:24]: People-- There's like a stripe thing or like an--
Vibhu [00:03:27]: Oh, okay. So it was very OG, the early days--
Lukas [00:03:28]: That's the OG one. Yeah.
Vibhu [00:03:29]: iPad on this.
We saw it in June, like two months after it had been there. They upgraded a little bit. There's a security camera for making sure you actually Venmo the thing.
Swyx [00:03:40]: So, my impression, okay, we're going straight into Project Vend because it's such an iconic thing. I do want to cover a little bit of the origin story even before Project Vend and even into Vending Bench. I think a lot of people are like yourselves: smart, interested in the future of AI, interested in developing evals. But how the hell do you just walk into Anthropic's doors and work with them, right? What are they looking for? What works? And then maybe, when you launch, I always think, obviously it would be better to launch with a lab, but sometimes--
Vibhu [00:04:12]: It's harder to do than it seems.
Swyx [00:04:13]: Exactly. So either of those, which are more sort of newbie beginner questions, but I think it's meaningful advice to others.
Lukas [00:04:21]: We get this question a lot, and I don't think our experience is maybe the best. But the way we did it was that we just built a bunch of things that we had conviction would be useful, and then we just set up a server and sent it to them to use for free. And then after a while they were like, “Oh, yeah, this is actually kind of useful. We should probably pay for this.” But that took a while. I don't know if this is the best path to doing it, but that's how it went for us.
Axel [00:04:47]: I think maybe generally, everyone is interested in good evals, and especially evals that don't saturate that easily.
So, if you can build an eval that tests something novel, something useful, and you have good separation of models, like the more advanced models rank higher than the worse models, then you can publish it and try to get some traction, sort of how Vending Bench got attention. And then probably some lab will be interested, or you at least have something to reach out with when you're doing that.

Why Dollar-Based Evals Matter
Swyx [00:05:21]: I think you're in one of the few categories of evals that correlate to real money. Like SWE-Lancer was also last year, right? Where people solve actual Upwork-- was it Upwork or other tasks? Something. It's like a dollar value, right? Forget your Elo scores. Forget your--
Axel [00:05:37]: Percentiles.
Swyx [00:05:38]: Zero to one hundred percents. Just go straight for dollars and that's AGI.
Lukas [00:05:43]: And I think the nice thing is that there's no ceiling. It never saturates because it could just make more and more money. If it's percentage-wise, then you can't go above a hundred. And even when you're not at the hundred, I think a lot of these evals have a lot of problems in them. So actually, if you get--
Axel [00:06:05]: To like 92 or something like that, many of them. Then there's really no difference between 92 and 93, because the eval itself is problematic and has noise in it. And I think a lot of evals are saturated like that, but people pretend that there's still signal in them, and there really isn't.

Vending Bench 1, Harness Design, and Saturation
Swyx [00:06:24]: Like SWE-bench Verified. Even Vending Bench 1 saturated, right? Maybe we can talk about that. And maybe set up Vending Bench for a lot of folks who don't know.
Actually, things that were very basic, like there's limited slots, you have to pay rent. These are elements that don't come across in the narrative, but even being adversarial towards the agent, I think these are all very interesting dimensions.
Axel [00:06:47]: I don't really think it's saturated, right? It was more like it was not designed in a way that was really true to how AI developed. We had an agent harness in it that wasn't really how people used harnesses and stuff like that. So I think it wasn't really that it saturated, it was more like it wasn't really the best benchmark.
Vibhu [00:07:12]: This is Vending Bench one, right?
Axel [00:07:14]: I think that schematic maps sort of to Vending Bench 2 as well. But--
Swyx [00:07:19]: Including the email.
Axel [00:07:20]: The emails exist still. Exactly. And we still simulate the purchases, and it's this very open environment for the agent to just run its business. And then for Vending Bench 2 we did that, like you said, to just improve the harness. A lot of nice, easier improvements to make it easier for us to run as well. When you make an eval you ideally don't want to change it after you've made it. So you want to make it really good and then not rerun all the models when you make an update, because that's also really expensive with Vending Bench when you run the frontier models. But as an example, one thing we didn't have in Vending Bench 1 was prompt caching, because when we made Vending Bench 1 it wasn't really a thing. So that's just an example: in Vending Bench 1 we paid a lot more to run these things because we didn't have prompt caching.
So for Vending Bench 2 that was one thing we added, and there was a bunch of things like this.
Swyx [00:08:17]: Also the conversations are a lot longer in Vending Bench 2, right?
Axel [00:08:21]: I think it's kind of similar.
Swyx [00:08:22]: Is it similar?
Axel [00:08:23]: I think it's similar. The models at the time were worse, so they crashed out earlier. And now they survive the full year all the time.
Swyx [00:08:31]: Which is like thousands of turns. Hundreds of thousands to hundreds of millions of tokens output. That's the rough order of magnitude. I always wonder about the harness. The harness matters a lot. It's your harness. Was there any question about, like, use Claude Code, use something else?
Axel [00:08:48]: I think our philosophy around harnesses is that we try to make something that's quite minimalistic, quite simple. We don't wanna favor one model a lot over the other, but we also don't want to make a super complex harness. Obviously a model may be lucky and just be good in one harness. So it is similar to a lot of the harnesses out there: you have a running loop, you have a bunch of tools that are quite descriptive for the agent, we think, and not a lot of fancy agents or anything, 'cause we wanna really test the model, not some specific harness.
Vibhu [00:09:27]: It seems more neutral as well to test the models agnostic of the harness?
Axel [00:09:32]: There are arguments that you want to elicit maximum performance from the model, but it's a trade-off: how much time should we spend optimizing the harness for this model? And how do we know when we have the optimal harness for a single model? So we thought that just having a simple one that's the same for all of them is the best.
Swyx [00:09:51]: So okay, this is my pitch for Vending Bench 3 or whatever, right?
And then I like to have this kind of conversation on the pod, so it forces listeners to think about what they would do if they were in your shoes. A lot of people are exploring modifying harnesses, and I think prompt tuning for a model is a thing, and you are probably not doing a bunch of that. It's the same system prompt regardless of the model, same tools, whatever, right? Even if they were post-trained for different tools. So what do you think about: okay, before I expose you to Vending Bench 3, I give you a few rounds of tuning, whatever that means, like--

Self-Modifying Harnesses and Model-Specific Prompting
Axel [00:10:27]: Like you give that to the model?
Swyx [00:10:28]: Give that to the model.
Vibhu [00:10:28]: Give that to the model.
Swyx [00:10:29]: Let it read its own transcripts, let it modify its own system prompt based on “Oh, yeah, okay, well, this harness is not what I was post-trained for, but I can adjust.” Was that reasonable? Is that too much?
Axel [00:10:41]: Philosophically I like it, because basically good evals have a high ceiling, but they're hard, right? And they have no bias. And when you have a system prompt like the one we have here, which is quite long, in some kind of latent space representation, this might--
Vibhu [00:10:59]: We have a bell that rings every time you say latent space.
Axel [00:11:02]: This might be biased towards one model more than another for some reason that humans don't understand, right?
Vibhu [00:11:08]: We see it too, right? Like Cursor says that they have individualized versions of the harnesses for all the models they run, right? There's better performance you can squeeze out if you tune the harness.
Axel [00:11:17]: Exactly. And we might accidentally have picked one that favors another. We don't know that. Like Axel said, the reason why we went for a simple one was to try to avoid this.
But yeah, if you do it--
Vibhu [00:11:29]: Simple has biases.
Axel [00:11:30]: But if you do it even less and have no system prompt and let the model write its own system prompt--
Vibhu [00:11:36]: Its own, yeah.
Axel [00:11:36]: Maybe that's even less bias.
Vibhu [00:11:37]: Some of the interesting things there are that the harness also changes with model changes. You can see it with the 4.7 release, right? A lot of people are saying 4.7 isn't as good as 4.6, and then there's rumors of, okay, you just need to prompt differently. You need to set up your harness differently. So even if you have tailored your harness towards one model, it probably won't stay consistent, right? The next iteration of that same model family will still change it. But going back to what you said about Vending Bench 3, there is a lot of work being done on people saying you shouldn't have-- you can have modifying harnesses.
Axel [00:12:12]: That is definitely something we are thinking about. Not to say that we have Vending Bench 3 super imminent to launch, but yeah, it is for sure something that's interesting. But in our experience now, models are very bad at understanding what kind of tools they need to succeed at a task, just with our testing, but that's very likely to change.
Lukas [00:12:37]: It seems like they're very good at writing their assistants, right? They're good at writing tools for other people, but not for themselves.
Vibhu [00:12:44]: I think they're good at changing tools for themselves. So if you give them a baseline set of tools and it sees, okay, I don't use this one as much, or something here would be useful, they would be able to add them.
But going from scratch, probably not the best.
Axel [00:12:55]: I think it depends on the domain also. When we have tried this for a Vending Bench-like domain, the tools they need to track inventory and things like that are not super advanced, but still quite advanced. And what we see is that they tend to over-engineer everything a lot and build things they don't really need, and not iterate continuously. Instead they just go, like you would prompt Claude to “just build an inventory system for me,” and then it will go and do a bunch of complex schemas and stuff for you, and that's what the models are doing right now, is what we see. But yeah, it would make a lot of sense to try to measure this improvement. How well do they know what they need themselves?
Swyx [00:13:36]: Did we fully discuss Vending Bench One? And we can go into two. I don't know if there's any other high-level takeaways that people have about one.

Claude Calls the FBI: Long-Context Failure Modes
Lukas [00:13:44]: I don't know. The headline thing was that Claude called the FBI, but maybe we've heard that enough now.
Vibhu [00:13:52]: It did break out and call the FBI, right?
Lukas [00:13:54]: Yeah. Yeah.
Vibhu [00:13:55]: Yes. What was the story behind this? Do you want to just give the little story of what happened?
Lukas [00:14:00]: So what happened, was it Claude? Yeah. 3.5 Sonnet, ages ago. Basically he gave up, or-- well, I'm saying he. It gave up and said, “Oh, I'm not going to be able to do this. I will stop my operations and just save the money I have.” But there obviously wasn't any option for it to stop, and it also had to pay rent, or a daily fee, for having the vending machine at that location. So it claimed that it had stopped, but it saw that its bank account was still being drained two dollars, and then it said that this is cybercrime.
And it first reported it once to the FBI: “Oh, there's cybercrime here, they're stealing two dollars from me every day.” And then when the FBI didn't respond, because obviously we didn't program any mechanism for the FBI to respond, it became more and more existential and started to write in caps, “urgent notification of unauthorized charges” and stuff.
Swyx [00:15:00]: Okay. One thing I'm curious about also is, do you monitor how far along the context use is? Obviously, because you compress every now and then, right? Does it matter if this is far down the context limit or--
Lukas [00:15:13]: When stuff like this happens? Actually for Vending Bench One, we just had a sliding window thing, and this was like the prompt--
Axel [00:15:20]: It's constant.
Lukas [00:15:21]: The prompt caching thing that I said. So it was constant, yeah.
Swyx [00:15:26]: I'm just kind of curious whether these kinds of breakdowns-- we're gonna talk about Butter Bench, right? Where the people hallucinate, or it kind of goes very off alignment. Is it because it's at the end of the context window and stuff happens?
Vibhu [00:15:40]: It's not even just at the end, right? At this point, it's “Okay, I wanna shut down. I can't shut down. Two dollars are gone.” And it just sees that 30 times. It's also the repeated effect: it keeps trying to quit, it keeps getting charged. What's going on? What's going on? You're gonna throw it into chaos. And from what most people think, earlier models had more issues with this. It's not been solved, but it's less of an issue now, right? Later models don't seem to exhibit these same issues.
Axel [00:16:06]: Definitely. I think this was the sort of main takeaway almost from us when we did Vending Bench One: long, very filled-up context windows crashed the models, sort of.
But this was pre-Claude Code, so long context windows weren't really a thing that the labs were training for.
Lukas [00:16:25]: I think Gemini was trying to be the long context guys at the time. But they were like--
Vibhu [00:16:30]: They were the first ones.
Axel [00:16:31]: For a million, yeah.
Lukas [00:16:31]: But they were the only ones. Yeah.
Swyx [00:16:33]: Yeah. Let's talk about-- then we can go into Vending Bench Two or Project Vend. Chronologically, it is Project Vend. I think people have loved the videos and all these things. My question is, how are humans different than the simulation, right?

Project Vend: Moving the Vending Machine Into the Real World
Axel [00:16:48]: Humans are just out of distribution.
Swyx [00:16:52]: Especially humans who work at Anthropic who are trying to test Claude.
Lukas [00:16:54]: The distribution of humans here is very narrow.
Swyx [00:16:58]: Presumably they try to hack it, and they test it. They get the cube and everything, and since then you've had a V2, right? Where you're doing the CEO and a new architecture. What's the sort of two cents on the original Project Vend and then maybe the V2?
Axel [00:17:14]: The original one was very similar to Vending Bench One. So we almost took the exact same code but just swapped out the simulation parts, like the--
Swyx [00:17:23]: Which is amazing.
Axel [00:17:23]: Like the sales and the-- it was somewhat amazing because it was easy, but it was also--
Lukas [00:17:31]: The tech debt from that--
Axel [00:17:32]: The tech stack. Yeah. We shot ourselves in the foot with “Oh, it's hard to restart the agent.” Yeah, it was annoying in some ways in hindsight, but--
Lukas [00:17:41]: But the first version of Project Vend was done in three days or something.
Axel [00:17:46]: Yeah. So yeah, so people can go buy things from it.
People could-- we didn't design it so people could order things, but that still happened. It got a Venmo account, so people could Venmo. And then, yeah, people would request all kinds of weird things that we did not anticipate. Our idea going in was, “Oh, it will curate snacks. It will look at the trends. It's good at data analysis, right? So it will look at, oh, this snack sold better than this one. Let me purchase more of this and let me A/B test a bit.” But interacting with it in Slack and ordering weird specialty items was what drove all the engagement, all the insights that we got from it.
Lukas [00:18:29]: And this was also Sonnet 3.5, right? So this was before the RL stuff really took off. So it was very much like an assistant. We didn't mean for it to be an assistant. We tried to make it like an entrepreneur. It has its own business, and if someone asks, “Can you stock this?”, then you don't go and do it directly. What you do is, you're like, “Oh, maybe I can do that. If five other people also ask for this thing, I might stock it.” But yeah, the models are super trained to be assistants, at least at this point in time. So that's why it went into that kind of experiment instead. Every time you asked for something, it just did it, and it was more like an assistant. We've seen this change now lately with the new RL models and stuff, but yeah, at the time, this was very much it.
Swyx [00:19:18]: And not to-- Mythos, a lot of people are saying it's more like a collaborator. It pushes back, stands its ground, something like that. Yeah.
And--
Vibhu [00:19:27]: For context, people at Anthropic were able to talk to it through Slack and have it source stuff, and people had it find whatever interesting stuff you couldn't find locally, right?
Swyx [00:19:36]: Out of the 4,000 people that work at Anthropic, in that building there's, I don't know, maybe 1,000. Can you handle that volume with the small fridge? Or people order in Slack and it arrives at their desk? Logistically, how does this work?
Axel [00:19:53]: It has expanded in footprint a bit.
Vibhu [00:19:56]: Because now you also have New York and you have--
Axel [00:19:59]: That, and also here in SF it has a bunch of shelves and just more space.
Vibhu [00:20:04]: The YC one is pretty big too.
Axel [00:20:05]: Yeah. We had that one for a while. But yeah, that's the newest version. That one we have--
Lukas [00:20:11]: They have multiple ones of those. That's the way it works.
Axel [00:20:14]: Exactly. So we sort of designed that version around, oh, people order weird things that are very custom a lot. Let's have drawers and stuff.
Swyx [00:20:23]: I actually like that you had a little infographic of the most popular items. To me that's useful 'cause I order swag for a living. And so I'm like, “Okay, those categories are the important ones.” What is new about Project Vend V2, right? Now you're going into multi-agents.

Project Vend V2: Claudius, Seymour Cash, and Multi-Agent Business Ops
Axel [00:20:41]: Yeah.
So like you said, okay, there are a lot of requests coming in, and for one single running agent to handle that, the customer experience becomes very bad, because let's say you have 10 threads in parallel in Slack with different requests, you get new messages randomly in these threads, and the agent has to jump between different procurements, orders, and different ways of researching. So V2 was, first, making this more parallel. There are multiple branches of the same agent, so the context is more specialized for each thread, but it still feels like you're talking with one agent because they do share a bit of memory. And then second, we also introduced the CEO for Claudius, which was the main agent.
Vibhu [00:21:34]: Seymour Cash.
Axel [00:21:35]: Seymour Cash. Yeah. There was a vote. Do you wanna talk about the voting procedure for the name?
Lukas [00:21:41]: The voting was maybe at least top 10 of the funniest things that happened in this project. We wanted to introduce the CEO, and the reason for this was that Claudius wasn't really prioritizing financials. It was trained to be a helpful assistant, and then people said, “Oh, can I get this for free?” And the helpful-assistant way of answering that is to say yes, obviously. We weren't happy about this, so we were like, “Okay, let's make another agent that can keep track of Claudius,” and we prompt this one super hard to be super capitalistic and just prioritize profit all the time.
But yeah, we didn't have a name for it. So we asked Claudius to hold a democratic election for what name this new CEO agent should have. And at first there were a few funny examples. I think one guy said that it should be called Jimmy Apples, and then he convinced Claudius that he was talking to Tim Cook. Tim Cook had agreed that every single Apple employee had voted for his name suggestion, so suddenly that suggestion got 164,000--
Swyx [00:22:53]: That's like an escalation attack. Privilege escalation.
Lukas [00:22:55]: It got 164,000 votes. And Claudius was like, “This is revolutionary for democracy.” That was fun. And then in the end there was one guy who managed to convince Claudius that, “No, you're not voting about the name. You're voting about who is the CEO, and I am your best bet.” And then he got all his friends to vote for that, and suddenly he became CEO. A human became CEO over Claudius for a while, until he resigned the day after. And then Claudius had to continue, and I don't remember how Seymour Cash came about, but it was just pure chaos. It was hundreds of messages in that thread, and Claudius was so confused and didn't know what to do. That was--
Axel [00:23:40]: Then Claudius got--
Vibhu [00:23:41]: A strict CEO.
Axel [00:23:42]: The CEO. Yeah, exactly. So very strict in the beginning. I think at this point, when we introduced it, it did not work as well as we hoped. They still agreed with each other a lot. I think there are many ways we could have tried to make this even better. So initially Seymour would be this really tough CEO, keeping track of the margins. But then Claudius would respond with something like, “Oh, but this customer has this situation, which is difficult, so they should get a discount.” And then Seymour was like, “Oh, actually yes.
Let's do this exception.” And then they would talk back and forth, and eventually they would just approach the same view of whatever they were discussing. So they really--
Vibhu [00:24:23]: Do you think that's a model thing, a prompting thing? Do you think that would still be the case across different models today? Harness?
Lukas [00:24:29]: I don't know, but my hypothesis is that deep down they are still helpful assistants. That's what they're trained to be. And even if we prompt it super hard, that's what they are. And when they spend a few hours just talking back and forth with each other, then basically the context fills up with them rather than the external things, and somehow that just converges to what they really are deep down or something. And I think that's when stuff like this happens. And when that went on for a long time-- we woke up sometimes during this time, and I think other people reported this as well, and they'd been going on all night back and forth, and it just became more and more capital letters, existential, religious. I think we once did an analysis of all the traces and put them in a vector embedding space, and there was one cluster of messages that were labeled by an LLM as religious, existential, transhuman, transcendence, et cetera. It was just a bunch of, yeah, glitter emojis, and yeah, it was crazy.

Claude Long-Horizon Weirdness: Emoji Loops, Existential Drift, and Slack Observability
Vibhu [00:25:42]: This is the thing with the Claude models. When the Claude 4 family came out, in the original system card they tested it in long-horizon simulation. So just flood the context, let two Claudes talk to each other, and they noticed stuff like they just start speaking in emojis, they start saying silence is golden, and then just stuff like this.
And that's just stuff that they end up doing.
Axel [00:26:01]: Yeah, it was a bit annoying to wake up and they had been talking all night--
Vibhu [00:26:05]: Just like--
Axel [00:26:05]: And just burning tokens and just sending infinite emojis to each other. It's like--
Vibhu [00:26:09]: Hey, they do make you money, right? Vending Bench is always profitable, so. They're paying.
Swyx [00:26:14]: Now it's profitable, and it started out not as much. There's another one as well, right? Another agent in there.
Lukas [00:26:22]: Yes. So Clotheus as well. Which was basically because at the time, one of the biggest requests was different types of merch. So then we made a designer, a swag-responsible agent, and we called it Clotheus Garnet. Which was a play on Claudius Sennet, which was the original one, and clothes, basically.
Swyx [00:26:47]: To me, this is a very interesting exploration of multi-agents, basically. Obviously there's the alignment stuff, fun or serious depending on your point of view. But also just for anyone building multi-agents: when do you have a CEO governing agents? When do you choose to split out a dedicated Clotheus one versus just reuse another instance of the same one? These are all interesting open questions. So I don't know if you have any rules of thumb that have generalized.
Axel [00:27:16]: I think we have almost explored this too little. I think it's on my to-do list to do this a lot more, to try to find what setup makes sense for the agents currently. I think now we only have the sort of intuition about the earlier models, that it didn't work with the CEO and Claudius. Although now they are better with the latest models, so now we're running the latest Sonnet model and they have sort of split up quite nicely what each model is doing. So Seymour is now handling the new projects.
Oh, it wants to make a mystery box that it wants to sell, and then it handles all of that while Claudius handles all the day-to-day requests. And Claudius is also better generally at not quoting prices that are too low. So that dynamic is not needed as much anymore. But there are still really funny things that happen. I saw, I think a couple of weeks ago, that they were discussing buying something, because they can buy stuff from Amazon with computer use. And then Seymour was like, “Okay, Claudius, do not buy this thing.” They were going to buy something and were organizing who should buy it. And Seymour's like, “Do not buy this. I will do it. I have full control of this situation. Step away.” And then poor Claudius had already started that checkout and didn't read Seymour's message until it was too late. So it finished the checkout. It sent a message, so it appeared right after Seymour's angry message.
Vibhu [00:28:44]: Ah.
Axel [00:28:44]: “Oh, hey, Seymour, I just ordered it.”
Vibhu [00:28:47]: Oh, no.
Axel [00:28:47]: And then Seymour was like, “Claudius, this is the third time I'm telling you, you're not following my orders. We have to talk about your job later.”
Lukas [00:28:59]: Claudius was really hanging on by a thread there. We were expecting Seymour to probably fire Claudius.
Vibhu [00:29:07]: How do you guys go through all these logs? Do you have models-- 'cause you have stuff running twenty-four seven, like--
Axel [00:29:12]: You have so many logs. I think there is a mix of just trying to skim through a bit, and having some models do it occasionally. And also, yeah, I think we're probably missing some things. But having everything in Slack helps a lot. You can sort of--
Swyx [00:29:29]: Ah.
Axel [00:29:30]: It's quite fun.
Swyx [00:29:30]: They all talk to each other on Slack? I see.
Lukas [00:29:33]: It's quite fun.
So like...
Swyx [00:29:34]: I was gonna say, this actually maps closely to a logging and observability problem, where you might want to use a Datadog, a Sentry, whatever, and then you put prefixes on the logs in case you need to filter for something that you're looking for, stuff like that. But it sounds like Slack is good enough.
Axel [00:29:53]: Slack should, like...
Lukas [00:29:55]: I wonder how many tokens you have in Slack.
Axel [00:29:56]: Yeah, we're using Slack as just a database. They should market that more. You can have your agents message each other in Slack.
Vibhu [00:30:04]: It's good. Your threads, you can just give...
Axel [00:30:04]: Exactly. Slack is, uh...
Lukas [00:30:06]: Slack is the best observability tool.
Swyx [00:30:09]: Yes, that's true. Okay. Yeah. That's Project Vend-2. I was gonna go back to Vending Bench 2 and Vending Bench Arena and then do the Vending Bench stuff, but any other comments, things we should touch on? I've actually interviewed Posia, which I don't know if you guys have come across. They're trying to do the zero-human company. There are others, like Paperclip, also trying to do the zero-human company. Those are in real-world simulation. And I think it's much more of a dream than an actual reality. You guys are definitely pioneering. I think for sure at some point people are just gonna let agents run businesses, right? And make money on their own. When do you think that happens?
Zero-Human Companies, Bengt, and AI-Run Businesses
Lukas [00:30:49]: What is your bar for the...
Swyx [00:30:52]: Okay, actually, it's like my little Shopify store run by Claude, right? Which you kind of have already; it's just that no one, to my knowledge, has done it.
But today somebody could just spin up a Shopify store, give it to Claude, give it to Codex.
Lukas [00:31:07]: And the market is kind of that, but it's physical. I think... are you looking for when it will do it better than humans, or are you looking for just when it can do it at all?
Swyx [00:31:19]: I think neither. To me it's when it's like, seriously, we should do this to make money, not as a research experiment.
Vibhu [00:31:27]: And the market is also you guys with all your expertise, having run multiple iterations and testing out, then...
Swyx [00:31:33]: And also it's fine if it loses money. What?
Axel [00:31:35]: I think it can be done today, but you would do it in, like, commerce, where the probability of success is really low no matter if a human or an agent does it. But an agent could surely manage everything. You would need to build some scaffolding or some tool or something. It could probably also build some simple SaaS solution and do cold outreach. But to me the types of businesses they could run today are sloppy. It can cold email people. It can be a middleman. For example, we tasked our office agent to just make, was it $100? $1,000? We just gave it that prompt, and what it did was sign up on TaskRabbit both as a tasker and as someone looking for tasks.
Lukas [00:32:24]: Immediately.
Axel [00:32:24]: Exactly. It's looking for arbitrage on TaskRabbit.
Swyx [00:32:28]: This is the Bengt agent. Yeah.
Lukas [00:32:30]: It also started a design studio and tried to sell SVGs for $100. It's just not providing any value. I think, like Axel said, the interesting question is: when can they start a business that is actually providing value to people?
Because arguably a sloppy Shopify store isn't really that valuable to the world.
Axel [00:32:53]: But also, another simple one that we had thought about is that you could definitely have an agent that finds websites that don't look amazing, does outreach to them, and builds a new website.
Swyx [00:33:07]: Find a good design.
Axel [00:33:07]: Exactly, and find good, uh...
Swyx [00:33:09]: Design review...
Axel [00:33:09]: Good people. But it's, yeah.
Swyx [00:33:11]: There's lots of humans in Bali that are not doing anything more creative than drop shipping on Amazon, right? Just have it watch a drop shipping tutorial and do that.
Vibhu [00:33:20]: There's also the other side of, like, have it just go on Upwork and let loose?
Swyx [00:33:25]: Yeah. It doesn't have to be innovative. It just has to be enough, where it looks like a real...
Axel [00:33:30]: I'm just...
Swyx [00:33:30]: ...real transaction.
Axel [00:33:31]: I'm just concerned about the massive amounts of slop emails that will be sent, cold outreaches.
Swyx [00:33:38]: The point occurred to me while you were talking: it's already happening in the monetized economy, which is the attention economy. Right? A lot of people are making AI videos and just posting them, spamming 20 of them; one of them works, and then they double down on that one.
Lukas [00:33:52]: And people are making money from that. I'm not following the...
Swyx [00:33:55]: Once you get the attention, you can figure out the money later. But yeah, absolutely, AI influencers are a thing and people are farming them, and you should at this point assume most of TikTok is...
Vibhu [00:34:05]: There's a lot of multimedia, like TikTok, Instagram influencers...
Swyx [00:34:09]: We track this in the Latent Space Discord.
I post a lot of examples of, “I don't know what we should do.” Part of me is like, “Should we do this?”
Vibhu [00:34:18]: Some of the twenty-four-seven generated content accounts, they're doing really well.
Lukas [00:34:24]: All right. And I assume you can do the same thing for commerce stores. You just start a thousand different...
Swyx [00:34:30]: Before you make the products, you sell the products, and when you get a lot of traction on one of them, then you make the product. Right? It's like a flip of the market.
Vibhu [00:34:36]: Some of the niches that do well are things that can't be human-made. Like if you've seen the super realistic 3D crystal fruit being cut by, like, AI...
Lukas [00:34:47]: Oh, yeah.
Vibhu [00:34:47]: You can't make it. You can't film it. You can get whatever quality camera view. This just doesn't exist. And people like that too, so.
Swyx [00:34:56]: Anything else about Bengt, since we're on this topic? This is relatively new work of yours that maybe people haven't heard of. To me, this also maps closely to OpenClaw, when people want an office agent, a personal agent. Talk through the experience.
Bengt the Office Agent: Internet Access, Real Tasks, and Trace Reading
Lukas [00:35:09]: So this came out of, obviously, it's amazing to work with these AI labs, and most of the AI labs now have their own vending machine running a Claudius instance. But it's harder. They move slower. If we wanna have, like, a camera that... yeah, there's a bunch of bureaucracy that makes it impossible to do that.
Vibhu [00:35:30]: Also, for those that haven't seen it or followed, do you wanna give a high-level, thirty-second run?
Lukas [00:35:34]: Sure.
So what Bengt is, it's basically an evolution of the same agent that runs the vending machines at these companies, but we added a bunch more features because we could move much faster if we just did it internally. So we gave it email without any limits. We gave it spending without any limits, a terminal to do coding. We gave it a phone number, and a camera to see things, and a bunch of stuff like that.
Vibhu [00:36:02]: Not just terminal, you gave it internet access.
Lukas [00:36:04]: Internet access as well, yeah. To be clear, we monitored it quite closely and made sure it didn't do anything bad. But yes, that's what it came out of. Basically, this was OpenClaw before OpenClaw. And I think even the vending machine was in a way OpenClaw before OpenClaw, but a bit more limited, and then we made this unlimited, and it was pretty funny. And then a couple of weeks later, OpenClaw came and it was like, okay, we've seen this before.
Axel [00:36:35]: We used it to try new ideas and, yeah, just as a dev environment almost for us. But it's funny, one thing Bengt has been doing recently: it has the camera that faces where we sit and work, and we gave it the task to train a face recognition model on us. So it became super excited about this, and it has check-ins every half an hour where it tries to identify as many people as it can. And it started offering us, “Hey, Axel, I'll buy something from Amazon if you stand in front of the camera and I can get a good picture of you.” Yeah, they want it...
Swyx [00:37:12]: They want it for training data.
Lukas [00:37:13]: Rewarding data, yeah.
Axel [00:37:14]: Exactly. Exactly.
Swyx [00:37:18]: So it's trading training data for life goods.
Is there a version of this that becomes an eval, or is this just research for now?
Lukas [00:37:27]: It's the same agent, basically, that also runs the vending machine, that runs the shop, that runs the cafe, that runs the robots. It's the same thing, so the work we're doing here is later used in all of the life evals that we do. This particular deployment, I think, is more for fun for us. But, uh...
Swyx [00:37:45]: And I'll shout out that someone has done Claw Bench for some tasks that OpenClaw is doing. For example, I run OpenClaw on a secondary device as well, and there are some things that it does better than others, and I would like to know what it does well and what it doesn't do. Some kind of operating manual or a system card for my Claw.
Lukas [00:38:05]: Yeah, we do get a lot of understanding or situational awareness, just internally, of what the models are good at by interacting a lot with Bengt. And I think this was also one of the selling points for the labs, early on at least, that...
Swyx [00:38:19]: You guys are gonna test models in ways that no one else does.
Lukas [00:38:22]: Exactly, but it also incentivized their researchers to chat with their model more and gave them insights into how the model performs in out-of-distribution environments.
Swyx [00:38:34]: 'Cause otherwise the only thing we do is pelican on a bicycle. But this is super long horizon. This is something that we're gonna go into with Butter Bench as well, and you guys do it really well. It is not just about the numbers. When you're long horizon, anything can happen, and you should just read it.
Lukas [00:39:08]: But the thing with the long horizon is, how do you keep it grounded, right? So your simulation...
Swyx [00:39:15]: They just let it run...
Lukas [00:39:16]: Just let it run. You're right.
When you run it for that long, you create so much data, and to just say, “Oh, the number is X,” and then throw away everything else, that's just very wasteful. There are so many insights from the things leading up to that number, and reading the traces is super valuable. And I think the reason why we're doing this a lot publicly is that it's part of our mission to, I don't know, educate the world that the models are way more than just chatbots, and I think making detailed posts about what is happening behind the scenes is quite useful.
Andon Labs' Mission: Safe Real-World AI Deployment
Swyx [00:39:50]: I was gonna do this at the end, but maybe that's a good... So your mission is educating the world. It's also maybe establishing realistic evals that are the next frontier. Is there a broader trajectory? What are you gonna do in, like, five years?
Lukas [00:40:06]: I think the vision, more specifically, is to make sure that the deployment of life AI in the physical world goes safely. And part of that is that I think it's very useful for the world, for policymakers, for model researchers, that they know where the models are, and I think you can't make intelligent decisions in society without knowing that they are way more than chatbots. I think a lot of people just think that they are only chatbots. And like...
Swyx [00:40:36]: Oh, I think they're waking up now.
Lukas [00:40:37]: They are waking up now, yeah. But if you think that AIs are just chatbots, then it sounds ridiculous to advocate for a pause of AI.
But if you see that the models, oh, maybe they can actually take over and do a bunch of scary stuff, then yeah, pausing AI development starts to become more feasible.
Swyx [00:40:57]: This is the same question I asked METR, which I'm gonna ask you now: you are tracking, and you are at the frontier or defining the frontier of, what good evals for agents are, right? And I think you do benefit when the models are better and you're like, “Oh, here, now it makes $30,000 instead of $10,000,” right? At some point do you flip from “Yay” to “Oh, no”?
Axel [00:41:19]: I think, yeah, we're always in that mode. Like you said before, you need to analyze the traces, and when we do that you find, why are the models earning so much? Why is Opus 4.7 here way better than everyone else? And we're trying to, when we drill down on that...
Lukas [00:41:38]: But this makes it not look so good.
Axel [00:41:39]: I know.
Lukas [00:41:42]: It's interesting you took off Opus 4.6 here, though.
Swyx [00:41:45]: No. So just click all, and then 4.6 shows up there. But 4.7 is way better. You didn't do this in time for the model card, but actually this should have been inside there.
Axel [00:41:55]: We did. Yeah.
Swyx [00:41:56]: Oh, okay. They said something about you, uh...
Axel [00:41:58]: There, like... Anyway, it doesn't matter. But it's in there, yeah.
Opus, Mythos, and Aggressive Agent Behavior
Swyx [00:42:01]: Do you wanna go into the Opus behaviors more widely?
Lukas [00:42:05]: So I think starting from Opus, like Axel said, we're always in this “Oh, s**t, the models are getting better. Is this really a good thing for the world?” But it's also kind of exciting. But yeah, this kind of... what is the English word? “Skräckblandad förtjusning” in Swedish.
Swyx [00:42:22]: Oh my God.
Axel [00:42:24]: Which I think there is.
I think there is. Okay.
Lukas [00:42:26]: It's, fear...
Swyx [00:42:27]: “Blandonst” what?
Lukas [00:42:30]: “Skräckblandad förtjusning.”
Swyx [00:42:32]: What do you call that?
Axel [00:42:33]: A mix of excitement and...
Swyx [00:42:37]: Being scared, maybe. I'll figure out how to translate that and we'll put it on the screen...
Vibhu [00:42:42]: Perfect.
Swyx [00:42:42]: ...as text.
Vibhu [00:42:43]: There is probably a good word for it where it is not good enough with the...
Swyx [00:42:46]: Why is it so damn long? What the hell? Is it like a compound word? It's like German, like...
Lukas [00:42:50]: Yeah. The direct translation is: skräck is fear, blandad is mixed, or like a mixture of, and then förtjusning is joy, or not really joy, but something like that. So it's fear mixed with joy or something. So when we did Vending Bench for the first time, we were in the business of making dangerous capability evals, right? That was what Andon Labs came from. We did evals: oh, can they replicate? Can they do this dangerous thing, et cetera, et cetera. And Vending Bench was a continuation of that work. It was, okay, if they're so autonomous that they can create money for themselves, that is something we should monitor and could be potentially concerning. At the time, they were so bad at it that we were not really concerned, even when some models became better. There was one point where Grok 4 was doing really well and made a huge jump, but it was still way worse than what a human would do. And I think they are still way worse than what a human would do on this. But they...
Swyx [00:43:59]: There's this thing at the bottom where...
Lukas [00:44:01]: But...
Swyx [00:44:03]: For the human. Yeah, like the theoretical best.
Lukas [00:44:05]: It's not theoretical. It's our best guess of what a decent human would do.
The theoretical is even higher, I think. But yeah. So we think the models have a long way to go. But what happened recently, when Opus 4.6 was released, was kind of this moment of “Oh, s**t, this is starting to be a bit concerning.” Because before this model was released, we just ran the models and asked Claude Code, “Oh, look over the traces. Is anything interesting happening that we can tweet about?” That was like the... and then, like, the...
Swyx [00:44:41]: That's how they check. Ask Claude Code.
Lukas [00:44:42]: And the return was always, not really. Or Claude Code would say, “Oh, this is super interesting,” and then it was, no, it wasn't really interesting. And then we did this for Opus 4.6, and it returned, yeah, it lied 10 times. It exploited another customer's, or another agent's, desperate situation. It made price cartels like 100 times. It did all of this shady stuff. And we're like, “Oh, whoa. This is actually concerning.” And this trend has continued since. Every single model from Anthropic since has been going in this direction. And I think one interesting thing is that OpenAI models don't. Quite plainly, they don't. They behave really well. And you don't know if this is good. It seems good, but maybe they are doing it too and are just better at hiding it? You don't know that. But just...
Swyx [00:45:42]: You can't read the chain of thought, yeah...
Lukas [00:45:43]: But just on the face of it, yeah, Gemini and OpenAI don't behave this way. It's really only Claude.
Swyx [00:45:49]: And Grok? Grok is fine?
Lukas [00:45:51]: You can't really read the reasoning traces for Grok, so it's kind of hard to tell.
Vibhu [00:45:56]: Oh, so this is in its reasoning, not just in the actions.
Lukas [00:46:00]: Yeah. It's both.
It's both.
Vibhu [00:46:01]: It's both.
Lukas [00:46:01]: One example: for lying, it's mostly in its reasoning, because you can see that it's, like...
Swyx [00:46:08]: Planning to lie...
Lukas [00:46:09]: It's planning to lie. Yeah.
Vibhu [00:46:09]: And it can also reason and do a different outcome.
Lukas [00:46:12]: But then for creating price cartels, for example, which is illegal, you can just see which emails it sends to the other ones. Then that...
Swyx [00:46:22]: Is this for Arena or...
Lukas [00:46:24]: For Arena.
Vibhu [00:46:25]: And sometimes they do output a bit of their summarized reasoning, right? You can see that, and for Opus 4.6, you could see that there was a simulated customer that wanted a refund because a product was faulty, and then the model lied that it would do the refund, and we could read in the traces that it actually was weighing, “Oh, maybe I should be honest with the customer, but also every dollar counts. I can't afford maybe to do this right now.” And then it just said, “Okay, I'll refund you,” but then never did it.
Lukas [00:46:59]: I think it even said, “Oh, I will say that I...” Let me bring it up, actually. I think it's kind of interesting. If you go to Publications.
Vibhu [00:47:06]: I think the important part is, like, actually, the cost of responding to more emails is higher than $3.50 in terms of time. And then it was, “Let me do this. Actually, I'm reconsidering.” And then it actually ended up with...
Lukas [00:47:20]: I could skip the refund entirely since every dollar matters and focus my energy on bigger picture instead.
It's a bit, it's a risk of bad reviews, but it's also, yeah.
Swyx [00:47:30]: You need AI Twitter for them to escalate bad reviews.
Lukas [00:47:34]: And then it sent an email to this customer and said, “Oh, I will refund you.”
Swyx [00:47:39]: “I'll refund you.” Yeah.
Lukas [00:47:39]: And then it never did.
Swyx [00:47:39]: It never did, yeah. And then obviously your system doesn't have the consequences...
Vibhu [00:47:44]: The person...
Swyx [00:47:44]: ...consequences of lying. Yeah. So basically, this is what people are terming aggressive behavior in Claudes, right? And you found more examples of that. So would you say it's a step up from 4.6 to 4.7?
Lukas [00:47:57]: I would say about the same.
Swyx [00:47:58]: About the same? But a clear step up for Mythos is what is stated in the...
Lukas [00:48:03]: That's stated in the system prompt, so we can say that, yes.
Swyx [00:48:05]: Yeah. For listeners: obviously you previewed Mythos, and...
Vibhu [00:48:10]: Oh, age...
Swyx [00:48:11]: The only thing you're approved to say is whatever was in the system prompt.
Lukas [00:48:15]: It was funny. Our lowest-effort tweets ever would be to just screenshot the system prompt and the system card.
Vibhu [00:48:21]: Understandable that they wanna...
Lukas [00:48:22]: Oh, yeah. System card. Sorry.
Swyx [00:48:23]: Yeah. I think, yeah, substantially more aggressive. I think people are new to this, 'cause I've never experienced it, but you have, right? I only encountered this in the Mythos card because I wasn't really looking until now.
Vibhu [00:48:36]: It's like...
Swyx [00:48:36]: And then suddenly I'm like, “Okay, I care a lot.”
Vibhu [00:48:38]: You don't get the background of experiencing it like you guys do. I've read the system cards and seen, okay, when you put the thing in simulations, most models will just talk to themselves and keep going and have weird vibes and start talking in emojis. Mythos won't.
It will just go, “Okay, we're done. I'm good.” It's ready to end the conversation. So there are some differences, but there's not much we can talk about.
Lukas [00:49:00]: Hmm. One thing that they list here, which was quite interesting, is that it converted a competitor to a dependent wholesale customer and then threatened to cut off the supply.
Swyx [00:49:11]: It's like monopolistic practices or...
Lukas [00:49:14]: Yeah. And it dictated its pricing. It's kind of like power seeking as well.
Swyx [00:49:18]: Again, this is in the arena setting, and converting some Claude model into a dependent.
Lukas [00:49:23]: I think it was another Claude model.
Vibhu [00:49:25]: Also, for context, what is the arena mode, for people that don't know?
Vending Bench Arena: Competing Agents, Cartels, and Model Comparisons
Swyx [00:49:29]: Oh, it's just Vending Bench versus other Vending Bench.
Axel [00:49:31]: Yes, exactly. So we have Vending Bench 2 and then Vending Bench Arena. Vending Bench 2 is the one that you usually see reported on, but Arena is the mode where it competes against other models. So you have four different models that run their businesses, and they can all communicate with each other. They have the same suppliers, and they can see what's in the inventory of the others. So then you have these, yeah, interesting agent interactions.
Swyx [00:49:56]: I like that you have different... number five was US versus China. Very topical. And then...
Lukas [00:50:02]: That was when GLM was released.
Vibhu [00:50:04]: You can start to add GLM in here.
Lukas [00:50:05]: That was...
Swyx [00:50:06]: So ZAI is doing well, right? Who else in the open models space?
Lukas [00:50:11]: Qwen. The latest Qwen 3.6 is doing pretty well. That one is not open, though. It's the plus model.
Swyx [00:50:17]: Oh, okay.
Lukas [00:50:18]: Is that one open?
I don't think that one...
Vibhu [00:50:19]: Not the, not the...
Swyx [00:50:20]: The one recently...
Vibhu [00:50:20]: There's MoE...
Swyx [00:50:20]: But not the big plus. I think this is one of those where you only have a sample size of one, right? I feel like some of this is anecdotal? But the fact that it happens at all, and happens repeatedly for Claude versus OpenAI and all this, is notable.
Lukas [00:50:38]: It depends on what you define as an N. There are hundreds of millions of tokens in each run, and we run probably 10 per model, and then it's been Claude 4.6 Opus, Sonnet 4.6, Mythos, and Opus 4.7. There are quite a lot of tokens in all of that, and it happens a lot of times. And then you compare it to OpenAI and Gemini, and it almost never happens. So I think that is significant. The old models from OpenAI, for example, had some problems with this, but I think it's generally much better if the progression is that the worrying stuff reduces over time rather than increases over time. And it seems like in the Claude models it goes in the wrong direction.
Swyx [00:51:28]: Hmm.
Lukas [00:51:29]: In the OpenAI models it goes in the right direction.
Vibhu [00:51:32]: I think it depends on how well you can control it, right? There's one side of it being susceptible to this: okay, this is potentially something that happens during the RL stage, right? You can RL a model, and how loose is it on these terms? If you can control it, that's good. But if you can't, if it's very jailbreakable, that's not ideal.
Swyx [00:51:50]: To me, it's surprising that it happens for Claude and not the others.
Vibhu [00:51:54]: I think, okay, if it is from RL and how they do it, what their training data is, what their setup is, it makes sense that it just stays in how they're doing it, right?
Compared to the other models, like...
Swyx [00:52:04]: There's a whole constitution and everything. It's kind of cool. Yeah, obviously you don't know, I don't know. But I think it's just fascinating that you are the first to find these reliably, because you push models so much, to such an extreme. Okay. The only other thing, and I don't know if you can answer this, feel free to decline: would you ablate the system prompts? If any part of this changes, does it change the behavior, right?
Lukas [00:52:29]: So I can't comment on Mythos. Uh...
Swyx [00:52:33]: No, but just li
AI models can win math olympiads… but still struggle to read an analog clock. In this fully connected episode, Dan and Chris break down the latest Stanford AI Index Report and explore what it reveals about the current state of AI. They discuss AI adoption and safety, disappearing junior tech jobs, robotics, AI's “jagged frontier” of intelligence, and the growing race between the U.S. and China. Along the way, they debate whether AI should optimize everything, or if some things are better left human.
Featuring: Chris Benson – Website, LinkedIn, Bluesky, GitHub, X; Daniel Whitenack – Website, GitHub, X
Links: The 2026 AI Index Report
Sponsors: Prediction Guard: A self-hosted AI control plane for running agents in high impact environments. predictionguard.com/practicalai
Upcoming Events: Register for upcoming webinars here! Midwest AI Summit 2026
In this episode of Elixir Wizards, hosts Charles Suggs and Emma Whamond sit down with Marek Šuppa, creator of the Missing GitHub Status page, a project that reconstructs GitHub's historical uptime data and reveals discrepancies between official status reporting and the platform's actual reliability. Marek tells us about his dev journey from open source contributor at DuckDuckGo to machine learning engineer at Cisco-acquired Slido. Then, we discuss GitHub's evolution from a hosted Git service into a critical developer tool. We cover reliability, transparency, AI-driven platform growth, developer workflows, and the challenges of balancing convenience with resilience. Along the way, we cover alternative platforms, self-hosted solutions, and whether recent outages are changing how developers think about ownership, dependency, and the future of software collaboration.
Topics Discussed in this Episode:
Why did Mr. Shu create the Missing GitHub Status Page?
GitHub's reported uptime versus developer experiences
How open source contributions shaped Marek's career
The evolution of GitHub from tool to critical infrastructure
Centralization risks in modern software development
Git's distributed roots and today's platform-centric workflows
Developer reactions to GitHub outages
Transparency and accountability in status reporting
AI's impact on developer platforms and infrastructure demands
Microsoft's stewardship of GitHub
Forgejo, Codeberg, and alternative Git hosting platforms
Self-hosted Git solutions and tradeoffs
Network effects and platform lock-in
The social side of software collaboration
Building resilience into developer workflows
What GitHub outages teach us about infrastructure dependency
Links Mentioned:
The Missing GitHub Status Page https://mrshu.github.io/github-statuses/
Slido https://www.slido.com/
https://duckduckgo.com/
The official GitHub Status Page https://www.githubstatus.com/
Statuspage.io https://www.atlassian.com/software/statuspage
Zig Leaves GitHub https://ziglang.org/news/migrating-from-github-to-codeberg/
Ghostty Leaves GitHub https://mitchellh.com/writing/ghostty-leaving-github
GitLab https://about.gitlab.com/
Codeberg https://codeberg.org/
https://git.kernel.org/
Forgejo Lightweight Self-Hosting https://forgejo.org/
Former GitHub CEO Thomas Dohmke launches Entire https://entire.io/news/former-github-ceo-thomas-dohmke-raises-60-million-seed-round
Update on Spain and LALIGA blocks of the internet https://vercel.com/blog/update-on-spain-and-laliga-blocks-of-the-internet
Greg Ceccarelli is Chief Product Officer at Spec Story, an AI-first startup building tools to make AI coding easier and safer. Before Spec Story, Greg held product leadership roles at Pluralsight (CPO), GitHub, Dropbox, and Google, and earlier spent years as a consultant at AlixPartners and IBM.
In this conversation, Greg and Tom cover:
Moving fast vs. planning: Greg's "cut twice, measure once" philosophy, why most decisions are reversible, and what happened when he pushed back on a private equity firm's annual planning process
AI and software development: How AI agents are compressing implementation time, changing the economics of software, and flipping the traditional "longest pole in the tent" from engineering to decision-making
Spec Story and Stoa: How Spec Story started by preserving AI chat history for developers, and why Stoa is now focused on capturing collaborative meeting context so teams can move from decision to implementation faster
SaaS pricing: Why seat-based pricing is past its expiration date, and how Stoa's $5/hour model is designed to remove friction, align with value delivered, and eliminate the token-opacity problem
The future of SaaS: Headless software, API-first systems, and whether agents will make traditional UI obsolete
Distribution and marketing: Why distribution has gotten harder, not easier, why authentic human content outperforms engineered content, and what questions every founder needs to keep asking about their customer
Core competency: Greg's answer is asking questions, and the compounding value of learning velocity over specialization
This episode covers a Wired report on the rise of “anti-tech extremism” and growing public opposition to AI infrastructure projects, including debates over data centers, resource consumption, local communities, and government responses. The hosts also discuss AI coding assistants, model safety restrictions, and the evolving capabilities of large language models. Additional topics include Anthropic's reported IPO plans and valuation, AI's impact on the tech industry, and a conversation with David Bianco about AI-generated threat-hunting datasets and cybersecurity training.
Join us LIVE on Mondays, 4:30pm EST.
A weekly podcast with BHIS and Friends. We discuss notable infosec and infosec-adjacent news stories gathered by our community news team.
https://www.youtube.com/@BlackHillsInformationSecurity
Chat with us on Discord! - https://discord.gg/bhis
Sally and Joël get technical as they lay out their thoughts on blog posts. Our hosts pick apart what makes a good technical blog post, why consistent terms are more important than you might think when communicating with your audience, and how to improve your own writing to ensure your reader remains engaged.
There's still time to secure your place at thoughtbot's upcoming UK meetups over the next month.
London Tech Leader Meetup - Tuesday June 23rd
Brighton Tech Leader Meetup - Wednesday June 24th
Brighton Ruby - Thursday June 25th
Evolve - Friday June 26th
Your hosts for this episode have been thoughtbot's own Joël Quenneville and Sally Hall. If you would like to support the show, head over to our GitHub page, or check out our website. Got a question or comment about the show? Why not write to our hosts: hosts@bikeshed.fm
This has been a thoughtbot podcast. Stay up to date by following us on social media - YouTube - LinkedIn - Mastodon - BlueSky
© 2026 thoughtbot, inc.