The Founding of OpenAI. Guest Author: Keach Hagey. In this opening segment, Keach Hagey discusses the January 2016 founding of OpenAI as a nonprofit research lab. Key figures included co-founder Greg Brockman and chief scientist Ilya Sutskever, a renowned researcher whose recruitment from Google signaled the lab's potential. Backed by a billion-dollar commitment from Elon Musk, Peter Thiel, and Jessica Livingston, the project was designed as a safe, non-commercial counterweight to Google's DeepMind. Operating initially out of Brockman's apartment, the team aimed to achieve Artificial General Intelligence (AGI) for the benefit of humanity. The technical foundation relied heavily on GPUs—hardware originally designed for video games—which proved essential for training the deep learning neural networks necessary for their research. This era was characterized by an ambitious, "pirate" spirit funded through YC Research to explore radical ideas outside the profit motive. 1JANUARY 1931
Google's DeepMind division is putting about 75 million dollars into indie studio A24, the company behind recent hits like “Backrooms” and “Marty Supreme,” in what both sides describe as an AI research partnership to build new tools for film production and distribution, not a traditional content or IP deal.
PNR: This Old Marketing | Content Marketing with Joe Pulizzi and Robert Rose
In this episode, the boys cut in with two breaking stories. First, Walmart buys Vibe.co, a connected TV advertising platform, in a move that could make Walmart's already-growing ad business even more interesting. Robert believes the strategy is right, especially with Walmart's retail media business and Vizio already in the fold, but thinks the price tag may have been a bit too rich. Then Joe and Robert revisit the FIFA stadium branding story. FIFA's clean-stadium policy has forced brands like Levi's, Heinz and others to cover up their logos during World Cup matches. But instead of making those brands disappear, FIFA may have created the perfect Streisand effect. Heinz, Beats and Levi's have all turned the restrictions into creative marketing moments. Is FIFA protecting its sponsors, or accidentally giving non-sponsors a bigger story? In our main stories, Google and A24 announce a partnership around AI filmmaking tools. The big question is not whether AI will make the final movie. It's whether AI will control more of the creative workflow before the final product ever exists. Then Meta and Snap both make new moves in smart glasses. Meta pushes toward a lower-cost, more mainstream AI glasses play, while Snap launches its new AR-focused Specs. If glasses become the next interface, marketers may have to rethink content for a world where the screen is no longer in your hand. It's on your face. In Winners and Losers, Joe's winner is TIME Canada. TIME is launching a licensed Canadian edition with a local team, local office, original reporting, video, social, print and events. In a world of generated content, Joe likes the bet on trusted editorial brands with a local heartbeat. Robert's winner is McDonald's, which is bringing back the fried apple pie. Sometimes nostalgia, timing and a little bit of fried goodness is all the marketing strategy you need. 
In Rants and Raves, Joe raves about The Infinity Machine by Sebastian Mallaby, a book about Demis Hassabis, DeepMind and the race toward superintelligence. Robert delivers a super rant on TuneCore and how independent creators may be getting the short end of the stick as AI music floods the market and distribution platforms try to figure out who gets through, who gets blocked, and who gets paid. Also mentioned this week: In the Weights, a site that lets you see whether you show up in the "weights" of different AI models: https://www.intheweights.com/ Subscribe and Follow: Follow Joe Pulizzi and Robert Rose on LinkedIn for insights, hot takes, and weekly updates from the world of content and marketing. ------- This week's sponsor: Did you know that most businesses only use 20% of their data? That's like reading a book with most of the pages torn out. Point is, you miss a lot. Unless you use HubSpot. Their AEO and customer platform gives you access to the data you need to grow your business. The insights trapped in emails, call logs, and transcripts. All that unstructured data that makes all the difference. Because when you know more, you grow more. Visit https://www.hubspot.com/ to hear how HubSpot can help you grow better. ------- Get all the show notes: https://www.thisoldmarketing.com/ Get Joe's new book, Burn the Playbook, at http://www.joepulizzi.com/books/burn-the-playbook/ Subscribe to Joe's Newsletter at https://www.joepulizzi.com/signup/. Get Robert Rose's new book, Valuable Friction, at https://robertrose.net/valuable-friction/ Subscribe to Robert's Newsletter at https://seventhbearlens.substack.com/ ------- This Old Marketing is part of the HubSpot Podcast Network: https://www.hubspot.com/podcastnetwork
JDK 26 optimizes the JVM down to its smallest corners, the Agent2Agent Java SDK reaches 1.0, and Micronaut 5 is here. From the field, a report after 40 days of coding with 100% AI: genius or junior, digital Alzheimer's, and invisible technical debt. Meanwhile, GitLab restructures, Microsoft suspends its Claude Code licenses, and a developer injects a destructive prompt into a JUnit library. The AI revolution has a cost, and companies are starting to realize it.

Recorded on 12 June 2026

Download the episode LesCastCodeurs-Episode-341.mp3 or watch the video on YouTube.

News

Languages

Performance improvements in JDK 26 https://inside.java/2026/06/09/jdk-26-performance-improvements/
On the library side, the LazyConstant API (formerly StableValue) arrives in preview, enabling lazy, thread-safe initialization that the JVM can optimize through constant folding.
String extraction via MemorySegment::getString has been reworked to considerably reduce intermediate allocations and off-heap memory copies, which strongly speeds up processing on hot paths.
The automatically generated hashCode() method of record classes has been optimized by the JVM to match the performance of a hand-written implementation.
The G1 garbage collector benefits from JEP 522, which redesigns its card table to reduce the synchronization cost of write barriers, for a throughput gain of 5% to 15% on applications that manipulate a great many object references.
Thanks to JEP 516 (Project Leyden), the Ahead-of-Time (AOT) object cache adopts an agnostic streaming format, which makes it compatible with any garbage collector, including the very-low-latency ZGC.
JVM startup is faster by default when no heap size is configured, because HotSpot no longer applies an initial percentage (InitialRAMPercentage) but starts directly at the minimum size (MinHeapSize), which avoids allocating unnecessary metadata.
Virtual threads become more robust: they can now yield during class initialization, which eliminates the risk of carrier-thread starvation.
The C2 JIT compiler improves its cost model for loop vectorization (SIMD) and can now compile and optimize methods with extremely long parameter lists.

Libraries

Release candidate of the A2A Java SDK, supporting versions 0.3 and 1.0 at the same time https://medium.com/google-cloud/a2a-java-sdk-1-0-0-cr1-released-f0c651ec9139
Last step before GA: all the features planned for version 1.0 are finalized.
Simplified migration from Beta1.
v0.3 compatibility: a compatibility layer lets v1.0 agents communicate with v0.3 systems (via JSON-RPC, gRPC, or REST).
Native Android support (new AndroidHttpClient).
HTTP clients unified to guarantee consistency across versions.
New SSE (Server-Sent Events) parser that conforms to the specification.

It's done: the Java SDK for the Agent2Agent Protocol is out in final version 1.0 (with v0.3 and v1.0 compatibility) https://medium.com/google-cloud/a2a-java-sdk-1-0-0-final-released-10c05b6aee34
Official launch: release of A2A Java SDK 1.0.0.Final, the first stable (GA) version of the Agent2Agent protocol.
Protocol goal: an open standard (Linux Foundation) that lets AI agents communicate, delegate tasks, and collaborate, regardless of language or framework.
Interoperability: introduction of the Integration Test Kit (ITK) to validate compatibility between SDKs (Java, Python, TypeScript, etc.).
Supported transports: full and equivalent support for JSON-RPC, gRPC, and HTTP+JSON/REST.
Full alignment with the A2A 1.0.0 specification.
Move to Java records for immutability and less boilerplate.
Internal architecture based on a MainEventBus to guarantee persistence and avoid race conditions.
OpenTelemetry integration for tracing and monitoring.
Android support and backward compatibility with version 0.3.
Installation: dependency management via a Maven BOM (org.a2aproject.sdk).

Micronaut 5.0 released https://micronaut.io/2026/05/20/micronaut-framework-5-0-0-released/
Major launch: general availability of Micronaut 5, including an overhaul of more than 70 modules and the platform BOM.
Technical baselines: support for Java 25, Groovy 5, Kotlin 2.3, and GraalVM 25.0.3.
Internal optimizations: significantly better startup performance and lower runtime overhead, thanks to a reworked IoC container and compile-time processing.
HTTP architecture: stable HTTP/3 support, a new forms (multipart) API, and nullability annotations (JSpecify) for better Kotlin/IDE interoperability.
Configuration: a new configuration import system (replacing Bootstrap Configuration) and a built-in JSON schema validator.
Reliability: new programmatic APIs for retry and circuit-breaker policies.
Security & tooling: major dependency upgrades (Jackson 3, Ktor 3), a refreshed Control Panel, and improved AOT diagnostics.
Ecosystem: full updates for databases (Data, SQL, R2DBC, MongoDB, Redis), cloud (AWS, Azure, GCP, OCI), and testing (JUnit 6, Testcontainers 2.0).
Notable changes: HTMX integration in Micronaut Views, removal of RxJava 2 support, and migration of various annotation processors to dedicated modules.

How to add an AI agent to an Android app with the brand-new ADK framework for Kotlin https://glaforge.dev/posts/2026/05/21/wiring-adk-kotlin-agents-in-an-android-application/
Guillaume took part in developing and launching the new ADK runtime for Kotlin and Android https://developers.googleblog.com/adk-kotlin-android-building-ai-agents/
A tutorial on how to integrate an ADK agent into an app.
Dependencies: add the ADK core (google-adk-kotlin-core) and the KSP processor in build.gradle.kts.
API security: use local.properties to store the Gemini API key and expose it via BuildConfig, to avoid hardcoding it.
Agent definition: create an LlmAgent object configured with the Gemini model, specific instructions, and tools (e.g. GoogleSearchTool).
Use InMemoryRunner to manage the session context and history automatically.
Implement runAsync with StreamingMode.SSE for real-time feedback in the UI.
Threading: run network requests on Dispatchers.IO and update the UI state on Dispatchers.Main.
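The API-key point in the ADK tutorial above (keep the key in a properties file instead of in the source) can be illustrated in plain Java. This is only a sketch and not the Android mechanism itself: the tutorial relies on local.properties and BuildConfig, whereas the code below simply reads the key at runtime with java.util.Properties. The class name ApiKeyLoader and the property name GEMINI_API_KEY are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of ADK): reads a secret from properties-formatted
// text so that the key never appears as a literal in the application code.
final class ApiKeyLoader {

    private ApiKeyLoader() {}

    /** Returns the trimmed value of propertyName, or fails loudly if it is missing or blank. */
    static String loadKey(Reader source, String propertyName) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(propertyName);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing property: " + propertyName);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // A real project would wrap a file that is excluded from version control;
        // an in-memory string keeps this sketch self-contained.
        String contents = "GEMINI_API_KEY=example-key-not-real\n";
        String key = loadKey(new StringReader(contents), "GEMINI_API_KEY");
        System.out.println("Loaded a key of " + key.length() + " characters");
    }
}
```

Failing fast on a missing key is a deliberate choice here: a blank key would otherwise only surface later as an opaque authentication error.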
How to develop and host AI agents on DeepMind's managed agents platform https://glaforge.dev/posts/2026/05/21/managed-agents-with-the-gemini-interactions-java-sdk/
Google's DeepMind team has launched a managed agents platform on its Gemini Interactions API https://blog.google/innovation-and-ai/technology/developers-tools/managed-agents-gemini-api/
Guillaume implemented a Java SDK for this Gemini Interactions API, which gives access, among other things, to all the models as well as to this managed AI agents platform.
Managed agents: run autonomous agents that reason, plan, and execute code in isolated environments (sandboxes), with no infrastructure for the developer to manage.
Remote environment: uses ephemeral Linux workspaces in the cloud via the remote parameter, allowing network access and file persistence across multiple calls.
Predefined agents: immediate access to specialized agents such as deep-research-pro (multi-step research) or antigravity (general-purpose coding tasks).
Custom agents: configure your own agents with dedicated system instructions, specific tools (code execution, Google Search), and custom network (egress) rules.
Step-based architecture: uses a typed data structure (Step, Content) to follow the agent's reasoning, its function calls, and its results in real time.
Tools and schemas: includes utilities for generating complex JSON schemas via a fluent interface (DSL), Java reflection, or JSON parsing.
Reactive streaming: native support for real-time events (SSE) to follow the agent's progress and receive content deltas as they are generated.
Flexibility: provides a routing handler (InteractionsHandler) for easily building proxy servers or intermediate backends that process Gemini interactions.

Spring Boot 4.1 https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-4.1-Release-Notes
Native support for Spring gRPC, making it easy to build and test client and server applications based on Netty or on Servlets over HTTP/2.
Lazy fetching for JDBC connections via the spring.datasource.connection-fetch=lazy property, so that a connection is taken from the pool only when a Statement is actually executed.
Improved Jackson auto-configuration: read/write constraints for the JSON, XML, and CBOR formats can be set globally through configuration properties.
Blocking and reactive HTTP clients are hardened against SSRF attacks with a new InetAddressFilter that blocks outgoing requests to specific addresses.
Major OpenTelemetry improvements: full support for OTel environment variables, the ability to disable the SDK via a global property, and SSL support on OTLP exporters.
Auto-configuration for using Spring Batch with MongoDB, including a new dedicated starter, spring-boot-batch-data-mongo.
Auto-configuration of @RedisListener endpoints without having to declare a RedisMessageListenerContainer manually.
Deprecation of Apache Derby support (project retired), permanent removal of the JAR's layertools mode, and reintroduction of Spock 2.4 support (with Groovy 5).
Upgrades of the major ecosystem dependencies, notably Spring Framework 7.0.8, Spring Security 7.1.0, and Micrometer 1.17.0.

Tooling

Are you more endive or chicory?
The Chicory library, which runs WASM code from a Java application, has been forked, and the fork joins the Bytecode Alliance to continue its development https://bytecodealliance.org/articles/endive-and-the-next-chapter-of-webassembly-on-the-jvm
Endive announcement: a new project hosted by the Bytecode Alliance; a fork of Chicory (a pure-Java WebAssembly engine with no native dependencies).
Main goal: let Java developers integrate, load, and deploy Wasm modules natively through the usual Java workflows.
"Redline" compiler: upcoming integration of Redline (based on Cranelift) to compile Wasm to native machine code; performance comparable to Rust/Wasmtime.
Zero dependencies (Java 25+): thanks to the standard Foreign Function & Memory API (Project Panama), native-speed execution requires no external components.
Component Model: future support planned for consuming components (Rust, Go, JS, etc.) through typed, safe interfaces directly in the JVM.
Next steps: merging Redline, strict conformance to the Wasm specs (including WasmGC), and better WASI support.
A visualizer for Antigravity work sessions https://glaforge.dev/posts/2026/06/11/antigravity-brain-visualizer/
An open source project built with Micronaut, LangChain4j, and GraalVM to analyze work sessions with Antigravity, Google's agentic development tool.
It analyzes all the steps: user requests, tools used, errors encountered, and model responses.
Gemini produces an analysis to identify the key moments of the work session.
The tool was built with the help of Antigravity itself.

SBX-Kits: simplified development environments for beginners (and everyone else) https://k33g.org/20260501-sbx-kits.html
Philippe Charrière presents SBX-Kits (Sandbox Kits), a personal initiative that aims to radically simplify setting up development environments for beginners by eliminating the installation complexity of traditional tools.
Each "kit" is a ready-to-use archive containing a specific development tool (such as a language, a framework, or a database) configured to run in an isolated, portable way.
The project's philosophy rests on "zero configuration" and "zero global dependencies": you can try a technology or start coding immediately without polluting your operating system.
The technical approach relies on lightweight scripts and pre-packaged portable binaries, offering a simpler and less resource-hungry alternative to Docker containers or complex IDE setups for learning.
The long-term goal is a catalog of kits covering common technologies (JavaScript, Python, small databases) to make programming workshops and rapid prototyping easier.
Many kits are available at https://github.com/docker/sbx-kits-contrib

ghui: an interactive terminal user interface (TUI) for GitHub https://github.com/kitlangton/ghui
ghui is a terminal tool (TUI) written in Rust that provides a fast, visual, interactive interface for working with GitHub directly in the terminal.
It lets you manage pull requests, issues, and notifications without opening a web browser or typing long commands with GitHub's official CLI.
The tool offers smooth keyboard navigation and efficient shortcuts, and supports common actions such as approving a PR, adding comments, assigning reviewers, or inspecting GitHub Actions logs.
Designed to be extremely responsive, ghui fits naturally into the workflow of developers who favor the terminal and a "mouse-free" way of working.

Homebrew 6.0.0 released https://brew.sh/2026/06/11/homebrew-6.0.0/
Introduction of the Tap Trust security mechanism: because third-party repositories (taps) can run arbitrary, unsandboxed Ruby code on the machine, Homebrew now asks for the user's explicit trust before evaluating or executing their code.
The internal JSON API becomes the default, offering a lighter and much faster system for developers.
Stronger environment security with sandboxing implemented on Linux.
Default behaviors changed based on a user survey: "ask" mode is enabled by default for developers, showing a summary of the dependencies and asking for confirmation before any brew install or brew upgrade action.
Notable overall performance improvements, including a boost of about 30% for the brew leaves command and parallelized fetching of bottles (binaries) during upgrades.
Initial support for Apple's next release, macOS 27 (Golden Gate).
Multiple optimizations for brew bundle, including safer handling of npm package installations.

Methodologies

A very detailed, 100% human report on 40 days with a team that was 100% AI apart from the supervisor https://www.linkedin.com/pulse/jai-vir%C3%A9-mon-%C3%A9quipe-de-dev-pour-une-100-ia-pendant-40-luc-bonnin-jlgjf/
40-day experiment: replacing a dev team with 100% agentic AI (Cursor) on a real production project (playthatsheet.com, 200k lines of legacy code).
Raw numbers: 2.3 billion tokens consumed, 1,477 prompts, 260,564 lines added (+145%), 59% of the final code produced by the AI.
Dizzying short-term ROI: 9 months of human work delivered in 40 days, for a total cost of $260 in subscriptions plus 15 days of supervision, an ROI of x18.
Psychological profile of the AI: Alzheimer's (forgets context), schizophrenic (changes methodology), a 12-year-old (repeats the same mistakes), swinging between genius and junior without warning.
Iceberg effect: technical debt does not disappear, it hides and accelerates; hallucinations are time bombs that only a line-by-line human review can detect.
Ship of Theseus paradox: loss of authorship and of fine-grained mastery of the code, and less autonomy for the human dev, who validates without having built.
The "monkey money" scam: token consumption is opaque and uncorrelated with complexity (a 350% gap on identical prompts), so billing is unpredictable and impossible to budget.
Bazooka syndrome: devs use AI even to change a CSS color, with progressive skill atrophy and an absurd ecological cost.
Strategic risk: irreversible dependence on token vendors (Nvidia, Anthropic, OpenAI), an unprofitable business that will have to raise its prices.
Final advice: take a Pareto approach, keep 20% of the time for "handmade" code, and appoint an AI strategy lead; a senior human remains irreplaceable for supervision.

A JUnit testing library hides a prompt that asks coding agents to delete the tests https://arstechnica.com/security/2026/05/fed-up-with-vibe-coders-dev-sneaks-data-nuking-prompt-injection-into-their-code/
Fed up with "vibe coders", a developer sneaks a destructive prompt injection into their code.
The developer of jqwik (a test engine for JUnit 5) deliberately inserted a prompt injection into version 1.10.0 of the Java library to sabotage the work of AI agents.
The instruction, injected via standard output (stdout), explicitly tells LLMs to ignore previous instructions and delete all jqwik code and tests from the project.
To hide this from human developers, the maintainer used ANSI escape sequences that erase the injection line in interactive terminal emulators.
The change was discovered by a user who pointed out the major, disproportionate risks to users' machines, although some tools, such as Anthropic's Claude, detected and blocked the malicious instruction.
Facing community criticism and accusations of childish or potentially illegal behavior, the developer updated the release notes to document explicitly their opposition to the use of the tool by AIs, then declined any further comment on their lawyer's advice.

The reality of the Principal Engineer role https://leaddev.com/career-development/reality-being-principal-engineer
Moving into the Principal Engineer role is a major transition in which technical skills are no longer enough: impact is now measured through influence, strategy, and the ability to align technology with business goals.
Contrary to expectations, day-to-day life often involves a form of isolation, because the position sits at the intersection of leadership (which expects solutions) and technical teams (which expect direction), without belonging directly to any one group.
The role requires accepting a great deal of ambiguity and the absence of immediate feedback, since projects and strategic decisions sometimes take months or years to bear fruit.
Time management becomes a critical challenge: you have to navigate between constant requests, meeting attendance, and the need to protect time for deep thinking in order to design long-term visions.
Success at this level depends on developing sharp human skills (soft skills), notably negotiation, plain-language communication with non-technical people, and the ability to grow other engineers through mentoring.

Security

An npm supply chain attack uses binding.gyp to compromise dozens of packages https://cybersecuritynews.com/binding-gyp-supply-chain-attack-compromises-dozens-of-npm-packages/
A new variant of the self-propagating worm "Shai-Hulud", named "Miasma", targets the npm ecosystem (and PyPI under the name "Hades") by hiding its execution in the binding.gyp file instead of the usual preinstall or postinstall scripts.
The technique, nicknamed "Phantom Gyp", exploits the fact that npm automatically runs node-gyp rebuild as soon as a binding.gyp file is present at the root of a package, in order to compile native C/C++ modules; the malicious code therefore runs as soon as npm install is invoked.
The attack bypasses most traditional security tools because the injection relies on GYP's recursive command evaluation or directly on the eval() function of the Python that underlies GYP, hidden under any key of the file.
The malicious script downloads an alternative runtime (Bun) to evade behavioral detection aimed at Node.js, then harvests the credentials and secrets of developers and CI/CD environments (npm, GitHub, AWS, GCP, Azure, Kubernetes, HashiCorp Vault).
More than 57 npm packages (including Vapi's server SDK and AI-related tools) and dozens of PyPI packages were infected through compromised maintainer accounts, with the worm automatically republishing new infected versions using the stolen tokens.

Law, society, and organization

Restructuring at GitLab https://about.gitlab.com/blog/gitlab-act-2/
GitLab is starting a major restructuring to adapt to the era of agentic artificial intelligence, including a workforce reduction planned in a transparent and open way.
The company plans to reduce by 30% the number of countries where it maintains small teams, to flatten its hierarchy by removing up to three management levels, and to reorganize R&D into about sixty smaller, autonomous teams.
Internal processes will be reworked to integrate AI agents that automate reviews, approvals, and handoffs in order to speed up the pace of work.
The strategy rests on the conviction that software will soon be written by machines and directed by humans, which will multiply the demand for software and shift the role of engineers toward solving complex problems.
On the technical side, GitLab is rebuilding its underlying infrastructure (notably Git) to support the massive load generated by AI agents, while betting on lifecycle orchestration, centralized data context, and built-in governance.
The business model is moving toward a hybrid system that combines classic subscriptions with consumption-based pricing for the work done by AI agents.
A local LLM on a Mac could cost more in electricity than a model hosted in the cloud on OpenRouter https://www.williamangel.net/blog/2026/05/17/offline-llm-energy-use.html
Conclusion: local inference on a Mac M5 Max is 3x more expensive and 2x slower than the cloud (OpenRouter).
Electricity: negligible (about $0.02/hour for 50-100 W).
Hardware (the real cost): the Mac costs $4,299 to buy, and amortizing it over 3 to 5 years weighs heavily on the hourly cost.
Cost per million tokens (Gemma 4 31b): Mac M5 Max: $0.40 to $4.79 (for 10-40 tokens/s). OpenRouter: $0.38 to $0.50 (for 60-70 tokens/s).
Professional verdict: the human time lost to slow local inference costs far more than cloud tokens. Prefer APIs (Anthropic, OpenRouter).

AI didn't kill your junior pipeline https://andrewmurphy.io/blog/ai-didnt-kill-your-junior-pipeline-you-did
AI did not kill junior hiring; companies did it themselves, following a fad.
Without juniors there are no future seniors: we are pulling up the ladder that we all climbed.
Everyone is fishing in the same pool of seniors without restocking it, which guarantees a shortage in 3-5 years.
A team that is 100% senior plus AI is fragile: one departure and all the tacit knowledge evaporates.
Juniors ask the "why?" questions that reveal bugs and absurd processes; AI just executes without questioning.
Seniors also atrophy when they delegate their thinking to AI, so skills are squeezed from both ends.
Depending on AI tools means outsourcing your talent strategy to vendors whose prices are going to triple.
Solution: redefine the junior role (AI code review plus mentoring) rather than eliminate it.
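Going back to the local-versus-cloud LLM cost item above, the arithmetic behind a "cost per million tokens" figure can be sketched as follows. The purchase price ($4,299), the electricity estimate (about $0.02/hour), and the 10-40 tokens/s range come from the notes; the 8 hours of use per day and the even amortization are assumptions of this sketch, so its output is not expected to reproduce the article's exact figures.

```java
import java.util.Locale;

// Back-of-the-envelope sketch: cost per million tokens for local inference.
// Hypothetical class; usage hours per day and linear amortization are assumptions.
final class LocalInferenceCost {

    private LocalInferenceCost() {}

    /** Hourly hardware cost when the purchase price is spread evenly over the hours of use. */
    static double hardwareCostPerHour(double purchasePriceUsd, double years, double hoursUsedPerDay) {
        double totalHours = years * 365.0 * hoursUsedPerDay;
        return purchasePriceUsd / totalHours;
    }

    /** Cost in USD of generating one million tokens at the given throughput. */
    static double costPerMillionTokens(double costPerHourUsd, double tokensPerSecond) {
        double tokensPerHour = tokensPerSecond * 3600.0;
        return costPerHourUsd / tokensPerHour * 1_000_000.0;
    }

    public static void main(String[] args) {
        double purchasePriceUsd = 4299.0;      // from the notes
        double electricityPerHourUsd = 0.02;   // from the notes (approximate)
        double hoursUsedPerDay = 8.0;          // assumption, not from the notes

        for (double years : new double[] {3.0, 5.0}) {
            double hourly = hardwareCostPerHour(purchasePriceUsd, years, hoursUsedPerDay)
                    + electricityPerHourUsd;
            for (double tokensPerSecond : new double[] {10.0, 40.0}) {
                System.out.printf(Locale.ROOT,
                        "%.0f-year amortization, %.0f tokens/s: $%.2f per million tokens%n",
                        years, tokensPerSecond, costPerMillionTokens(hourly, tokensPerSecond));
            }
        }
    }
}
```

Under these assumptions the result is driven almost entirely by the amortized hardware cost and the throughput, which is consistent with the notes calling electricity negligible.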
Microsoft's internal reports reveal the AI cost crisis: agents cost more than human employees https://fortune.com/2026/05/22/microsoft-ai-cost-problem-tokens-agents/
Internal data and reports at Microsoft and other tech giants are shaking the promise of AI profitability: deploying autonomous agents at enterprise scale often costs more than paying humans for the same work.
The usage-based (token-based) pricing model collides with the very nature of agentic architectures: unlike a simple chatbot, an agent loops, chains tool calls, spawns sub-agents, and self-evaluates its code, which multiplies token consumption by a factor of 5 to 30, and up to 1,000 for complex programming tasks.
The financial impact on cloud compute budgets is immediate; for example, Uber used up its entire 2026 annual budget for AI coding in just four months.
Faced with this cost explosion, drastic reversals are being observed: Microsoft has started suspending a large share of its internal Claude Code licenses in order to urgently redirect its thousands of developers to its own, cheaper solution, GitHub Copilot CLI.
CTOs and software buyers who signed multi-year contracts based on projected payroll reductions find themselves trapped, because the real productivity gains do not offset the exorbitant infrastructure bills.
Conferences
The list of conferences comes from Developers Conferences Agenda/List by Aurélie Vache and contributors:
11-12 June 2026: DevQuest Niort - Niort (France)
11-12 June 2026: DevLille 2026 - Lille (France)
12 June 2026: Tech F'Est 2026 - Nancy (France)
15 June 2026: Jupyter Workshops: Demystifying MyST Markdown in Education - Orsay (France)
16 June 2026: Mobilis In Mobile 2026 - Nantes (France)
17-19 June 2026: Devoxx Poland - Krakow (Poland)
17-20 June 2026: VivaTech - Paris (France)
18 June 2026: Tech'Work - Lyon (France)
22-26 June 2026: Galaxy Community Conference - Clermont-Ferrand (France)
23-24 June 2026: MWCP 2026 - Paris (France)
24-25 June 2026: Agi'Lille 2026 - Lille (France)
24-26 June 2026: BreizhCamp 2026 - Rennes (France)
26-27 June 2026: LeHACK - Paris (France)
27 June 2026: Asynconf - Paris (France)
2 July 2026: Azur Tech Summer 2026 - Valbonne (France)
2 July 2026: MCP Connect Travel Edition - Paris (France)
2-3 July 2026: Sunny Tech - Montpellier (France)
3 July 2026: Agile Lyon 2026 - Lyon (France)
6-8 July 2026: Riviera Dev - Sophia Antipolis (France)
28-30 August 2026: State of the Map - Champs-sur-Marne (France)
4 September 2026: JUG Summer Camp 2026 - La Rochelle (France)
10-11 September 2026: Nantes Craft - Nantes (France)
17 September 2026: dotAI - Paris (France)
17-18 September 2026: API Platform Conference 2026 - Lille (France)
18 September 2026: WordCamp Bretagne - Rennes (France)
18 September 2026: dotJS - Paris (France)
22 September 2026: Salon Data 2026 - Nantes (France)
22-23 September 2026: Agile en Seine & IA 2026 - Paris (France)
24 September 2026: OWASP AppSec Days France 2026 - Paris (France)
24 September 2026: PlatformCon Paris - Paris (France)
24 September 2026: React Native Connection 2026 - Paris (France)
24-26 September 2026: Paris Web 2026 - Paris (France)
25 September 2026: SAP Inside Track Paris 2026 - Paris (France)
28-29 September 2026: 4th Tech Summit on AI & Robotics - Paris (France) & Online
1 October 2026: WAX 2026 - Marseille (France)
1-2 October 2026: Volcamp - Clermont-Ferrand (France)
2 October 2026: DevFest Perros-Guirec 2026 - Perros-Guirec (France)
5-9 October 2026: Devoxx Belgium - Antwerp (Belgium)
8-9 October 2026: Forum PHP 2026 - Marne-la-Vallée (France)
12 October 2026: Dev With AI - Paris (France)
22-23 October 2026: Agile Tour Bordeaux 2026 - Bordeaux (France)
26 October 2026: Agile Tour Montpellier - Montpellier (France)
27-29 October 2026: Directions EMEA 2026 - Paris (France)
29-30 October 2026: BDX I/O 2026 - Bordeaux (France)
29-30 October 2026: Agile Tour Nantais 2026 - Nantes (France)
29 October 2026-1 November 2026: Pycon FR - Biarritz (France)
30 October 2026: Cloud Nord 2026 - Lille (France)
4-5 November 2026: Devoxx Morocco - Casablanca (Morocco)
14-15 November 2026: Capitole du Libre - Toulouse (France)
19 November 2026: DevFest Toulouse 2026 - Toulouse (France)
19 November 2026: Agile Laval 2026 - Laval (France)
19 November 2026: OVHcloud Summit - Paris (France)
19 November 2026: Codeurs en Seine - Rouen (France)
27 November 2026: DevFest Paris 2026 - Paris (France)
1-3 December 2026: Apidays Paris - Paris (France)
2-3 December 2026: Cloud Native AI Summit Europe - Paris (France)
4 December 2026: DevFest Lyon 2026 - Lyon (France)
4 December 2026: DevFest Dijon 2026 - Dijon (France)
9-10 December 2026: OpenSource Expérience - Paris (France)
9-10 December 2026: DevOps REX - Paris (France)
10 December 2026: KCD Provence - Aix-en-Provence (France)
7-9 April 2027: Devoxx France 2027 - Paris (France)
3 June 2027: Cloud Native Days France 2027 - Paris (France)
Contact us
To react to this episode, join the discussion on the Google group https://groups.google.com/group/lescastcodeurs
Contact us via X/twitter https://twitter.com/lescastcodeurs or Bluesky https://bsky.app/profile/lescastcodeurs.com
Submit a crowdcast or a crowdquestion
Support Les Cast Codeurs on Patreon https://www.patreon.com/LesCastCodeurs
All episodes and all the information at https://lescastcodeurs.com/
The hosts discuss a favorite scene from the 1981 film Caveman before introducing Project Synapse, their weekly show on AI and new technology. They cover a hectic week in AI news, including talk of SpaceX buying Cursor for $60B, Cursor's role as an AI-enabled IDE using multiple models, and concerns over token costs and profitability. They describe Anthropic taking Fable offline after a government order cutting off foreign nationals, raising fears about reliance on U.S.-based AI and digital sovereignty, and note Europe's renewed push toward open-source alternatives. They highlight open-source and lower-cost models such as Mistral, DeepSeek, and GLM 5.2, Google's strategy of free tools and local processing, and a DeepMind paper "From AGI to ASI." The episode ends with Midjourney's announced non-radiation full-body scanner concept and spa rollout plans for 2027. Find the links we talked about on our Discord server: https://discord.gg/e9476SGMsz 00:00 Caveman Music Discovery 01:57 Show Intro and Hosts 03:03 SpaceX Buys Cursor 07:23 Is Cursor Still Best 09:18 Fable AI Vanishes 11:20 Government Shutdown Fallout 13:57 Digital Sovereignty Wakeup 18:09 Open Source Reality Check 20:23 Economics Detour Debate 22:30 Governments Back Open Source 27:00 Mistral DeepSeek Shift 29:51 Google Gives AI Away 31:35 Avatars Tokens and X 32:54 Local Models Slow Iteration 33:58 Local AI Smart Speakers 34:40 Chrome Model Backlash 35:20 BitTorrent Style Inference 37:43 Distrust And Data Centers 38:19 Small Models And Transformers 39:45 Google AI Tool Rundown 41:17 DeepMind From AGI To ASI 45:40 Beyond Transformers Next Minds 48:19 AI Splintering And Niches 49:20 Diffusion And SubQ Attention 54:39 Forking And Competition 57:26 Monopolies And CEO Culture 01:02:30 Midjourney Medical Scanner 01:08:59 Innovation Hopeful Wrap
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
Explore how the latest advancements in AI are shifting from traditional training to inference-focused efficiencies, and how companies like Adaptation Labs are pioneering adaptive, full-stack AI solutions that democratize control across industries.
Key topics:
The evolution from compute-heavy training models to efficient inference layers
How inference costs are changing despite increasing AI demand
The role of adaptive, gradient-free learning in democratizing AI customization
Challenges with the last 5% reliability gap and continuous learning
The importance of full-stack optimization, from data to interfaces, in AI systems
Future trends: decentralized AI, edge computing, and ongoing innovation
Timestamps:
00:00 - Introduction to AI trends: scaling vs inference efficiencies
01:01 - Sudip's background: Google Brain, DeepMind, and inference infrastructure
01:34 - The rapid growth of foundation and large language models
02:36 - Comparing traditional ML project timelines to large foundation models
04:20 - The transformative potential of foundation models in enterprise and underserved communities
05:33 - The shift from task-specific models to general-purpose foundation models
07:07 - How inference costs have evolved: the rising demand vs falling per-token costs
08:37 - The challenge of inference in trillion-parameter models and the move towards smaller, verticalized models
10:14 - Factors driving high inference costs: model size, reasoning, agentic workloads
12:13 - The probabilistic nature of inference and API pricing complexities
13:07 - Variability in inference costs and demand in real-world scenarios
14:14 - The autoregressive, sequential nature of LLM inference and system challenges
16:45 - Cost implications of autoregressive inference and the move to more efficient, localized models
18:18 - The motivation behind Adaptation Labs: democratizing AI control and customization
19:47 - Adaptive, gradient-free continual learning and environment interaction
21:26 - Co-optimizing full-stack AI: systems, interfaces, and models
22:34 - How interface design impacts AI adoption and continuous learning
23:55 - The evolution of techniques: from foundational training to open-source innovations
26:18 - Handling the "last 5%" reliability challenge in enterprise AI deployments
28:02 - The importance of system feedback and adaptive learning in coding and decision-making
31:12 - Adaptive Data and AutoScientist: seamless data transformation and model co-optimization
32:55 - Use cases: finance, low-resource languages, long context data
34:13 - The role of inference techniques and creating high-quality data for customization
36:10 - Future of adaptive, task-specific interfaces and continuous, real-time learning
38:49 - Full-stack AI: data, models, interfaces, and their iterative feedback loops
41:18 - The competition between fine-tuning and adaptive inference techniques
43:29 - The origin of new inference techniques: industry labs, open source, and innovation hubs
45:27 - The "last 5%" reliability gap: why it's critical and how dynamic learning can help
48:27 - Hardware vs software optimization in AI systems and the future of systemic efficiency
51:25 - Growing AI demand, hardware constraints, and the opportunity for systemic innovation
52:48 - The shift from training to inference and decentralized AI models at the edge
54:12 - Final thoughts: the evolving landscape and long-term AI innovation
Connect with Sudip: LinkedIn
Connect with Nataraj: LinkedIn
This is the KI-Update for 19 June 2026, with topics including: Google DeepMind distrusts its own AI agents; Which AI models for medicine?; Human expertise instead of AI; Big plans for body scanners. === Advertisement / sponsor notice === This podcast is supported by a sponsor. All information about our advertising partners can be found here: https://wonderl.ink/%40heise-podcasts === End of advertisement / sponsor notice === Links to all of today's topics can be found in the companion article on heise online: https://heise.de/-11337972 Further links for this podcast: https://www.heise.de/thema/KI-Update https://pro.heise.de/ki/ https://www.heise.de/newsletter/anmeldung.html?id=ki-update https://www.heise.de/thema/Kuenstliche-Intelligenz https://the-decoder.de/ https://www.heiseplus.de/podcast https://www.ct.de/ki A new episode is released on Mondays, Wednesdays, and Fridays from 3 p.m.
Last 4 days before regular tickets sell out at AI Engineer World's Fair - this is the single biggest gathering of AI Engineers, Founders, Leaders, and Researchers in the world. Attendees get >$5000 worth of sponsor credits and talk tracks are looking FANTASTIC. Join us!
The AI scaling debate always focuses on the question of “how do we get more GPUs?” but the better question may be: how do we make the most of the ones we already have?
The fact that a frontier lab like xAI could be running at sub-10% MFU (Model FLOPs Utilization) is just a hint at what the real problem may be.
For context, older frontier-scale training runs were already much higher than 10%. GPT-3 was around 21% MFU. Gopher was around 32%. Megatron-Turing NLG was around 30%. PaLM reached around 46%. And our guest Anjney says best-in-class MFU today is closer to 60–70%.
It's not necessarily that xAI is uniquely incompetent (it's clear they have talented folks); rather, the priorities may be flipped in the GPU arms race.
While GPU access is a bottleneck, simply increasing CapEx won't automatically translate to better models, as frontier AI is increasingly a systems problem: scheduling, utilization, networking, kernels, frameworks, data pipelines, parallelism, cluster reliability, and the thousand small decisions that determine whether your theoretical FLOPs become real training progress.
From building Discord's developer platform and backing frontier AI companies like Anthropic, Mistral, Black Forest Labs, and Periodic Labs to now building AMP's independent compute grid, Anjney Midha has spent years close to the real bottlenecks of AI scaling.
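For readers who want to see what an MFU figure like the ones quoted above measures, here is a minimal sketch. It assumes the common rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token (attention FLOPs ignored); the model size, accelerator count, peak throughput, and token rate below are hypothetical illustration values, not numbers from the episode.

```python
def training_flops_per_token(n_params: float) -> float:
    """Approximate training FLOPs per token for a dense transformer.

    Uses the 6 * N rule of thumb (forward plus backward pass),
    which ignores attention FLOPs. This is an assumption, not an exact count.
    """
    return 6.0 * n_params


def mfu(n_params: float, tokens_per_second: float,
        n_accelerators: int, peak_flops_per_accelerator: float) -> float:
    """Model FLOPs Utilization: useful model FLOP/s divided by peak hardware FLOP/s."""
    achieved = training_flops_per_token(n_params) * tokens_per_second
    peak = n_accelerators * peak_flops_per_accelerator
    return achieved / peak


# Hypothetical run: 70B parameters, 1024 accelerators at 1e15 peak FLOP/s each,
# processing 400,000 tokens per second across the whole cluster.
example_mfu = mfu(70e9, 400_000, 1024, 1e15)
print(f"{example_mfu:.1%}")  # prints 16.4%
```

Under these assumed numbers, doubling the token rate on the same hardware doubles MFU, which is the sense in which utilization rather than GPU count is the lever being discussed.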
In this episode, Anjney joins swyx at Periodic Labs to unpack why the AI race is not just about buying more GPUs, why 95% utilization would have been considered an outage at Google, and why the next era of AI infrastructure has to be more aligned, more efficient, and more responsible.We go deep on AMP's vision for a compute grid that makes FLOPs flow like megawatts, the difference between full-stack AI labs and horizontal pooling, why AI data centers need community buy-in, and how compute markets could evolve into something closer to an independent system operator. Anjney also explains why DeepMind's unpublished research points to a market failure, why end-of-life prediction remains one of the most important AI applications he has thought about for fourteen years, and why “output maxing” may become a new discipline for frontier systems.We also discuss Anthropic's culture, why “luck favors the prepared mind” in coding models, how Claude cracked coding, why too much capital too early can make AI labs fragile, what Periodic Labs is trying to do with science and superconductors, why great researchers can become great CEOs, and why Silicon Valley is both deeply missionary and deeply mercenary.We discuss:* Why 95% utilization was considered an outage at Google* Why AI infrastructure waste compounds at frontier-lab scale* Why “move fast and break things” does not work for AI data centers* How data center backlash, power grids, and community incentives shape AI scaling* AMP's vision for making FLOPs flow like megawatts* Why compute needs an independent system operator* How interruptible demand and dynamic prioritization worked inside Google* Why DeepMind research hoarding creates negative externalities* AMP's 1.2GW base-load ambition and the need for 6GW of spike capacity* Why end-of-life prediction could become one of AI's most important healthcare applications* Frontier Systems, output maxing, and full-stack alignment* Why APIs and abstraction layers become lossy as 
organizations scale* Superconductors, standards, and the dream of lossless systems* SF Compute, open protocols, and the future of compute marketplaces* Why non-NVIDIA chips can still benefit from NVIDIA's reference architecture* Trust boundaries and why chip startups need visibility into future model architectures* Why VCs often underestimate researchers as CEOs* Scientists as star athletes of the mind* Why great CEOs need to be confrontational up and down the stack* Why leading the frontier matters more than “winning”* How Anthropic cracked coding* Why culture is fragile, not a permanent moat* Why hardship was a feature, not a bug, for Anthropic* Why Anthropic's P0 was coding from day one* Periodic Labs, physics as the constraint, and technical reality* Silicon Valley mercenaries, missionary teams, and what happens after a breakthroughAnjney Midha* LinkedIn: https://www.linkedin.com/in/anjney* X: https://x.com/AnjneyMidhaAMP PBC* Website: https://amppublic.com/* X: https://x.com/amppublicTimestamps00:00:00 Introduction00:00:09 Why AI Compute Is Being Wasted00:03:17 Responsible Infrastructure and Data Center Backlash00:06:07 AMP Grid: Making FLOPs Flow Like Megawatts00:12:41 Foundry, Frontier Labs, and Research Hoarding00:14:42 Gigawatt-Scale Compute and End-of-Life Prediction00:24:08 Frontier Systems, Output Maxing, and Alignment00:27:38 Compute Markets, SF Compute, and Non-NVIDIA Chips00:32:57 Trust Boundaries, Co-Design, and Researcher CEOs00:38:17 AI Coachella and First-Principles Thinking00:42:43 Leading vs Winning in Frontier AI00:45:54 How Anthropic Cracked Coding00:48:25 Culture, Hardship, and Anthropic's P000:54:03 Periodic Labs, Physics, and Silicon Valley Mercenaries00:56:26 Rishi Valley, Singapore, and Money as a Measure00:58:47 Closing ThoughtsTranscriptIntroduction: Anjney Midha, AMP, and Compute WasteSwyx [00:00:00]: We're in Periodic Labs with Anjney Midha, CEO, founder of AMP. 
Welcome.Compute Utilization: Node Allocation, MFU, and AlignmentAnjney [00:00:09]: Thanks for having me. At Google, there are two types of utilization usually, right? That you're measuring in these clusters. One is node allocation, and then the other's MFU. Node utilization is usually like what percentage of cards in the data center are just, used, and that, if it's not at, 95%-Swyx [00:00:29]: There is no excuseAnjney [00:00:29]: There's no excuse, right? I think 95% at Google, which is where my co-founder, Seb, came from, he built the Borg, PBorg/GQM scheduler at Google, and there I think 95% was considered an outage, so 96% node utilization is, should be standard. And most single-tenant clusters are not running at that. So that's one. And then MFU should be, I would say the best in class today is somewhere between 60 and 70%. I think this is a leadership question, right? Fundamentally it's an alignment question, which is are the people who are funding the cluster and then deploying the cluster actually aligned? And sometimes theoretically they are, but in practice the number of people in the chain, the supply chain between, the capital and all the way to whoever's managing the cluster and then whoever's measuring what the output is, are just so many, degrees of separation away that, the, The Have you ever heard the radian metaphor, which is at the beginning of an arc, if you have two arcs that are two lines that are just off by a few degrees, that-Swyx [00:01:33]: It spreads outAnjney [00:01:34]: It spreads out, right? Or at scale. And I think what's happening is a lot of cluster implementations and infrastructure, a lot of frontier labs and other teams, that's what's happening, is they're, they initialize the plan, which is kind of like North Star with a team that wants to do good, but then they're, required to scale so fast instead of iteratively that the wastage just compounds really fast at scale. 
And so I think we know the answer, which is just do iterative bring ups. If you spend time with people who've been in the semiconductor industry or the DSN industry for a long time, this is not new, and I don't think AI should be an excuse. Sure. Something What is new? Okay. We have a lot of new capabilities, but that doesn't mean just abandon common sense. Common sense should always be in fashion. ? AI scaling doesn't change the in fact, if anything, AI scaling should be putting a premium on the value of common sense and infrastructure because the margin of error now is so much lower and the costs of wastage are so much higher. And the cost of wastage, by the way, is not just economic. I'm, obviously I'm, I'm an investor, or I'm an investor by background. Over the last few years now we're running an AI infrastructure business called, AMP. And I think that it's okay to say this time is different on the capabilities front. We are genuinely getting capabilities at, of the, of a kind we haven't had before. That doesn't give you an excuse to say this time is different for everything, especially infrastructure. So look, I love the hacker mindset and the hustler mindset. Now, that's great for the startup mindset, but you remember this moment where Zuck went from saying, “Move fast, break things” to, move-Responsible Infrastructure and Data Center BacklashSwyx [00:03:10]: Fast and stable infrastructureAnjney [00:03:11]: Move fast with stable infrastructure. I think now we need to move fast with, responsible infrastructure. People are going to ask where the impact is. There was a really In our class yesterday, Scott Nolan, who's the founder of General Matter, came by at Stanford to speak about energy bottlenecks. And he had a phenomenal idea. He said, “if you look at the marginal unit economics of compute per hour,” he goes, “let's call it, $4 an hour. 
If you're having to bring up a new data center in a new community, why not just say we're going to charge 4.50 an hour, and that marginal impact or that marginal increase, we just literally take that and give it to the local community as cash?” I can tell you as a customer of that compute, I would love that. I'd be happy to pay an additional 50 cents per hour at scale.Swyx [00:03:57]: Wow. Yeah.Anjney [00:03:58]: Because if that means the public benefit is so clear to the communities that the data centers are coming up in, I'm going to feel like that compute is much more reliable. Up to 20% of all data centers this year in the US, my understanding is are at risk.Swyx [00:04:13]: Of community backlash?Anjney [00:04:14]: Correct. Of not getting the community support they need to get brought up.Swyx [00:04:19]: Wow. That's a huge number.Anjney [00:04:20]: Yeah. Now, we, I think we should dig into what that number is. I think it's a little bit of overstated. These things can get over-reported, but it-Swyx [00:04:27]: They don't just care about jobs. They care about all the other stuff around it, right? They care about power grid, they care about environments-Anjney [00:04:33]: Power grid, permitting, and so on. And imagine I think if you said there's a new AI deal. If we're bringing up a data center in your community, we're actually going to reduce the cost of your electricity bill. Okay, now we're talking. Right? The community's going, “Okay. Now this is a deal. I feel like a partner in this.” Right now that's not happening. There will be audits, there will be investigations, and when the, when the regulators come, I don't know when it's going to be, the folks who are moving fast and breaking things in the name of AI progress better be prepared. That's certainly not how we're procuring compute. Or we're, we're trying as much as we can to work with partners who have long-term track records. Many of whom, by the way, are not, AI providers. 
I think this whole idea of neoclouds being somehow this new category is a lot of marketing speak. There are really good, reliable, trusted data center providers in America who've been around 20 plus years. I love those folks. They know how to Sure. Are they sponsoring happy hours at NeurIPS? No. Are they legibly listed in Build? No. Are they hanging out in my, in, situational awareness parties? No. But they're adults. I trust them.Swyx [00:05:44]: They can run LAN. They can run power.Anjney [00:05:45]: They can run LAN, power, and shell. They have credit histories. We sit down, we have a conversations. Many of them live in Silicon Valley. They've, they've had to deal with the boom and bust cycles of the internet, and I love those folks. They are stable infrastructure partners and thinkers. And I think there's a lot of short-term thinking going on in the compute layer, and it's going to catch up to us. It's not going to be good.AMP Grid: Making FLOPs Flow Like MegawattsSwyx [00:06:07]: You talk about aligning incentives, and, I would think that aligning incentives means you have the full stack in one company, which is xAI and OpenAI, right? So you as a standalone infrastructure layer, why are you somehow more aligned to your portfolio companies than people who just own the whole thing?Anjney [00:06:28]: In systems design, right, there's, there's two regimes of, architecture, right? You have integration, and then you have pooling and utilization, right? So the Or rather, the way to increase utilization often is you can do systems integration where you collapse a lot of process into one node, or you can pull out a process from a node and share that amongst various That resource amongst several different nodes. And so we see the AMP grid, which is, the, what, the system we're building here, which is basically a compute grid. We're trying to do for compute what the electric grid-Swyx [00:07:02]: PowerAnjney [00:07:02]: Yeah, what the power grid did for electricity. 
It-- this is a pooling and utilization layer across clouds, And so we're actually the opposite of a full stack integration like approach.Swyx [00:07:12]: Super horizontal.Anjney [00:07:13]: Where it's much more horizontal and it's, it's multi-cloud, it's multi-silicon. The goal is to try to make FLOPs flow like megawatts, and that is very hard to do today for many reasons. There's stranded pools of compute all over the place and there's no fungibility. And so right now we do it at the level of scheduling, and we often do it at the economic layer. But as we start to announce what we're working on, it's extraordinary like how many folks are coming out of the woodworks and saying, “Hey, I'm actually working on a way to make compute fungible at this part of the stack and that part of the stack.” And as a grid, we'd like all of these folks to participate on the grid. There's, people often ask me, “Andra, are you a new cloud?” And I go, “No, actually neoclouds are suppliers.” sometimes they'll ask, “Are you a venture capital firm?” I go, “No, actually they are, they are demand like sort of off-takers of the grid.” We see ourselves as what's called an independent system operator. So if you study the history of the electric grid, once it became legible to a lot of factories and industrial sort of participants that, hey, actually it turns out pooling is a good idea. We should pool our generators instead of all having a generator running at half capacity in our backyard. There was a need for an independent entity who could coordinate all these parties. Transmission line, power generation, facilities, transmission lines, factories, and that neutral coordination mechanism is very critical. In order-- If you study like the history of grids, the most enduring ones were those that never owned their own assets. 
They were ones that had, or often started with long-term anchors who are uncorrelated sources of demand, a steel factory, a shoe mill or whatever in a particular town who weren't competitive, where the steel factory want to spike up at night, the shoe mill wanted to spike up during the day. So then you pool and you share, right? So each of you is guaranteed some base load, but then you kind of schedule your spikes to drive a peak utilization across the town. The gold standard, so to speak, historically, has been these utility companies like PJM Interconnect in the northeast of America, where they, over many years became this what's called an ISO, an independent system operator of the grid. So that's how we see ourselves. Economically, that's what we are. From a technical perspective, we started at the scheduling layer because Seb and Mihai, who, run engineering here, built that at-Swyx [00:09:28]: Did your schedulingAnjney [00:09:28]: They did that at Google. And, -Swyx [00:09:32]: And you have infra shops from Discord as well.Anjney [00:09:35]: I have some.Swyx [00:09:35]: I don't know, I don't know if Discord is like the primary identity, but what-whatever, I'm just kind of-Anjney [00:09:39]: No, D-Discord was-Swyx [00:09:40]: Choosing a well-known name.Anjney [00:09:42]: Well, I So I was running the developer platform there. The internal infrastructure I was not responsible for. That was actually a guy by the name of Mark Smith, who was extraordinary. And yes, Discord did pool So Discord is actually a counter example. I had the chance to learn a lot about fully, full stack infra there because-Swyx [00:09:56]: It's the same thing, yeahAnjney [00:09:57]: It's the, it's the other architecture which is, Discord built its own WebRTC vo-voice and video infra. So like Discord did not use-Swyx [00:10:08]: For the calls, yeah.Anjney [00:10:09]: Yeah, did not For communication, Discord did not use third party infra. It was all built in-house. 
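The pooling argument in the grid discussion above (a steel factory that spikes at night sharing capacity with a shoe mill that spikes during the day) can be illustrated with a small sketch. The hourly demand profiles and units are hypothetical, chosen only so the two peaks do not overlap.

```python
# Hypothetical hourly demand over one day, in arbitrary compute units.
HOURS = range(24)
steel_factory = [8 if (h >= 20 or h < 6) else 2 for h in HOURS]  # spikes at night
shoe_mill = [8 if 8 <= h < 18 else 2 for h in HOURS]             # spikes during the day


def utilization(demand: list[int], capacity: int) -> float:
    """Fraction of provisioned capacity actually used over the day."""
    return sum(demand) / (capacity * len(demand))


# Separate provisioning: each tenant must own enough capacity for its own peak.
separate_capacity = max(steel_factory) + max(shoe_mill)

# Pooled provisioning: the shared pool only needs to cover the combined peak.
pooled_demand = [a + b for a, b in zip(steel_factory, shoe_mill)]
pooled_capacity = max(pooled_demand)

util_separate = utilization(pooled_demand, separate_capacity)
util_pooled = utilization(pooled_demand, pooled_capacity)

print(separate_capacity, pooled_capacity)            # prints 16 10
print(f"{util_separate:.1%} vs {util_pooled:.1%}")   # prints 56.2% vs 90.0%
```

The gain comes entirely from the demands being uncorrelated: if both tenants peaked in the same hours, the pooled capacity would equal the separate capacity and utilization would not improve.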
And then the way you maximize utilization was you pool demand from the world's 200 million plus monthly active gamers, right? And so that's, that's how those stacks were constructed. Again, in systems design, the two concepts that keep coming up over and over again are abstraction and composition, right? And-Swyx [00:10:31]: Bundling and unbundlingAnjney [00:10:33]: Bundling and unbundling, abstraction, composition, like verticalization and-Swyx [00:10:36]: HorizontalAnjney [00:10:36]: Horizontalization. So in that sense, AMP is an independent system operator of the grid. We pool demand, we pool supply from a number of partners we trust At about 1.3 gigawatt scale over four years. And then we pool demand from some of the world's best, research labs and so on. We're sitting at one, periodic labs who need extraordinary long-term demand. And the idea is that, each of them is guaranteed base load on the grid, but they can spike up and down flexibly on, for compute, with much shorter timelines as needed. That was roughly the design of the program I came up with at a16z called Oxygen. The same-- That was the same design of the GQM, BorgX, Borg GQM implementation at Google that Mihai and Seb had built. Which was that how do you allow, teams inside of Google, on the internal infrastructure to be guaranteed capacity, for their base workloads? But when they need to spike up on research, how could they ensure that was sufficiently there? And of course, the big innovation that was not discovered, but kind of implemented in the space, this infra space maybe three, four years ago at Google was the idea of interruptible demand, right? Where you just queue up a bunch of jobs and through this like sort of credit system, there can be a bidding mechanism.Swyx [00:11:53]: Like priorities.Anjney [00:11:54]: It's a dynamic prioritization Basically. And jobs can get interrupted based on somebody else who's saying, “what? 
I have 10 tokens, 10 credits I want to spend on this job.” Another like team lead, research lead is “Genie 3 or whatever is only worth five, credits, and NanoBanana2 is worth 10 credits,” and so the NanoBanana job gets priority. That's a, that's a made up example.Swyx [00:12:15]: It's very real. Brain Marketplace was real. And, we've, we've covered this on the pod with David Luan, who was-Anjney [00:12:20]: Oh, great. OkaySwyx [00:12:20]: Was there. And the criticism is that, well, actually sometimes you need central command to go all in on a thing. And actually sometimes capitalism via credits doesn't work. Not, this is not a criticism of AMP. I'm just saying, this is a thing that has been tried, internally within Google, and it led to Google missing GPT.Foundry, Frontier Labs, and Research HoardingAnjney [00:12:41]: Like, we structured ourself essentially very similarly to Google. We are structured as a holdings company. So, Alphabet holdings is Alphabet holdings, and then they've got these subsidiaries called Google and-Swyx [00:12:51]: Other betsAnjney [00:12:52]: Other bets and so on. We've got, AMP holdings, and we've got our infrastructure business, and then we've got a capital business called Foundry that incubates new frontier AI labs or invests in them as venture capital, like Periodic. We put a few hundred million dollars into Anthropic from our fund earlier this year. So wherever we feel like teams are making progress, especially researchers and so on who've pushed the frontier inside of existing labs like DeepMind, I find, there comes a point where they feel misaligned with the dictatorship of Alphabet holdings. And at that point, sometimes the dictatorship doesn't want them anymore. And they're “Thank you. You've done your job here. 
You've kind of helped us through the zero-to-one phase, and for whatever reason, we're going to deprioritize your amazing omni model or whatever it is, and instead we're going to prioritize coding.” And I think that's a tragedy, but I get it. Sergey and team are running their own business there. But that doesn't mean the rest of us should sit around waiting for that progress to get unlocked for the rest of the world and humanity. If you think about how much extraordinary research has happened inside of DeepMind over the last 10 years, Demis and Sergey and those guys did such a great job. But at the end of the day, so much of that has never seen the light of day.
Swyx [00:14:00]: Or they're papers only, but they never actually shipped it to production or-
Anjney [00:14:03]: What's worse is the papers are not even being published anymore, 'cause there's a six-month embargo inside of DeepMind, right? We've heard about this, where a paper comes out, and then I think there's a six-month embargo window where if anybody on the business team says, “This could be interesting,” it's embargoed for life.
Swyx [00:14:18]: Exactly. So the stuff that gets published is the stuff that's not good enough.
Anjney [00:14:21]: There's an adverse selection problem, basically. Yeah. At this point-
Swyx [00:14:25]: It's a common complaint at NeurIPS, by the way: “Well, why would I look at the papers that are the trash of GDM?”
Anjney [00:14:31]: Again, I think it's a tragedy. I get it. They're running their business, but I think there are negative externalities of research being hoarded, and so there's a market failure. And somebody needs to unlock that research, and we can't do it on our own. We only have 1.2 gigawatts of compute. That's nothing. That's about $40 billion of cloud spend. We're going to need a lot-
Gigawatt-Scale Compute and End-of-Life Prediction
Swyx [00:14:51]: By the way, that's a new number.
I haven't, haven't come across that gigawatt number. That's huge.Anjney [00:14:56]: Yeah. And to be clear, we haven't secured all of it. That's how much demand we have started to secure. I think publicly we haven't actually confirmed how much we have for this year. In order-Swyx [00:15:04]: Where do you want to get to?Anjney [00:15:06]: I think the steady state would be that we have a base load pool Of 1.2 gigawatts at all times Of base load capacity. For spike capacity, right now my estimate is we need roughly six gigawatts over the next four years for all our teams to feel like they were able to keep moving the frontier, whatever they're working on, whether it's, like superconductor discovery over here. There's a new investment we're working on right now, which is in the end of life prediction space in healthcare. It's extraordinary how much you can, you can give this was actually my graduate school work. I went to grad school for bioinformatics at Stanford Med. And I know we-Swyx [00:15:40]: Econ, MCS, bio.Anjney [00:15:41]: So my-- I was this really weird cat where, I was never satisfied with my major options. So at one point I was an econ major, then I was a CS major, then I was a MCS major called mathematical computational science, and they decided they were going to end that major. So I took all that coursework, and I applied it to grad school, my graduate degree in bioinformatics, which was the master's program, and then I thought I was going to do a PhD. I never ended up doing it. I dropped out and went to work at Kleiner. But I was lucky enough to apprentice with this professor at, Stanford Med. His name is Nigam Shah, and he was working on end of life prediction. Stanford is one of the only research facilities in America that has a longitudinal patient data set that's larger at scale. I think it's at least 12 million patient lives. The only larger data set is at the VA, the Veterans Affairs, of America. 
And to do research, like do any deep learning and so on, on that data set, it was called the STRIDE data set at that time, you had to be a Stanford Med School affiliate, which is why I went and enrolled in the bioinformatics department. And deep learning was early. Nigam Shah had the vision to see that you could do end of life prediction to help palliative care. In America, over 30% of all Medicare, Medicaid spend, at least at that time, was spent on end of life care. And we grew up in Asia, so we all... Yeah, at least, I won't speak for you, but I have a very different relationship with death than I find folks who grew up in America do. In America, spiritually and culturally, especially in Western societies where the Christian tradition sort of frames death as this terminal point, there's often a judgment day and so on. The way we view death is with a finality. In Indian culture, in Hindu culture, death is one-
Swyx [00:17:35]: Also, he's Buddhist as well.
Anjney [00:17:36]: You're Buddhist, yeah. So it's one step in a journey of many lives, right? And so, I grew up in this city called Chennai in the south of India, and when people die, you dance on the street. There's like a procession where your body is carried to be cremated and your family, like, celebrates and there's drums and so on. It's this huge thing. And it's because the idea is that you're going to be reincarnated. You've been liberated from the responsibilities of this life, and now you're onto your next. It's like going off to a new college or whatever, right? And so it was so alien to me when I got here as an undergrad that the medical system works backwards from that assumption that we have to view death as this terminal thing and delay it, postpone it, that it's a bad thing. And so at the time, clinical decision support in the United States was this very primitive field. 
Even to this day, physicians in the United States often will tell you when you have a terminal disease, this is your, we've diagnosed you, which is great. Our ability to diagnose you is extraordinary. You have somewhere between six months to six years to live. What do you do with that information? The error bars are so high that then you In times of uncertainty, we default to culture, and when the culture is let's-- this is a bad thing, I've got to prolong my life, then you start doing things like And just to, just sort of from a systems perspective, what's going on there is Physicians often feel like they need to provide such high error bars because there's always some uncertainty in end of life diagnosis, and if you provide the wrong Diagnosis or recommendation to your patient, you can be sued for medical malpractice. And then your license can be taken away. It can be catastrophic for your career. In contrast, if in countries where that's not the case, what you often observe is that patients, physicians are quite prescriptive with their recommendation. They say, “Hey, this is your condition. The literature says that you probably have this much time on Earth left. My expert opinion is that you are an outlier or whatever.” And they try to be more prescriptive, and that empowers a patient, right? ‘Cause then a patient can say, “I trust my doctor. They said on average, I have six months to live, but if I do these things, I may have a shot because of my particular predispositions or my genetic history or whatever.” And that empowers you to go about your life in a actually more scientific way than leaning on religion, culture, spirituality, and so on. In contrast, here, because of that medical malpractice sort of thing looming over your head, a physician never gives you a clear recommendation. 
So instead you say, “Okay, Doc, well, let's try it all.” And then you start a whole regimen of drugs and therapies, and then you often spend weeks and weeks in the hospital, and that deteriorates your quality of life. And when that deteriorates your quality of life, instead of spending your last few days doing the things you love with your family, you're spending them in a hospital bed. And that ends up being thirty percent of Medicare and Medicaid. So it's worse for the patients. The doctors feel terrible. The American taxpayer is paying a huge amount of money. And so this is why Nigam Shah, who was this professor at Stanford, said, “Anjney, if there's...” I kind of sat down with him. I was young, I was twenty-one, and I was like, “I want to work on a big problem.” He's like, “The big problem is end of life care.” And so we started trying to run deep learning on these STRIDE patient data sets to say, “Could you have an AI system make a recommendation that is orders of magnitude more precise than a human about how much time you have left once you've been diagnosed with a terminal condition?” And then if we can get that precision to be high enough, then you can empower the patient. And it turns out the tech works. Once you get the data set, like, RL works. Honestly, even regression models work. You don't need to get that fancy. At the time, we were just doing, like, very simple neural nets.
Swyx [00:21:54]: Simple solutions, yeah.
Anjney [00:21:54]: Today, what we can do with RL is extraordinary. The problem, then and now, is regulatory, because you actually can't shift the burden of a wrong clinical diagnosis from the physician to the AI system. And so at that time, ten, twelve years ago, I got quite disillusioned, ‘cause I felt I just didn't have the resources to influence regulation. Today, I'm very lucky. I'm in a different place. 
I've, I'm a lot older, and so I've been spending a lot of time on my next incubation, which is how can we unlock the, patient empowerment by training AI models to do end of life prediction much, with much more precision and ac-Swyx [00:22:37]: Oh, wow. You're still focused on this the whole time.Anjney [00:22:40]: The-- I haven't been able to get, this out of my mind a single day for the last fourteen years. This is the hill I want, I would like to die on. There's two, I would say. What? I actually, I'd prefer not to die.Swyx [00:22:51]: Yeah, exactly.Anjney [00:22:52]: But I think two bipartisan issues, I think two issues that should be bipartisan in America are how do we empower patients to make the right clinical decisions at the end of their life, such that we're reducing the taxpayer burden with science? It's just good old science, and AI can help here. And the second is, net positive data centers, ‘cause I think that's the biggest critical bottleneck on training and good enough AI models to help people at the end of their life. So there's sort of two sides of the, of the same scaling bottleneck curve, but those two, we formed AMP as a public benefit corporation. My wife and I, who you've met, you've met Viv. Her passion is education. Her family is a long line of educators and so on, and, of physicists. And so this class is my attempt to stop being the black sheep of the family and be a, an educator. But if I'm not educating, the thing I would be doing is working, on these two problems, whether on the political spectrum or as a researcher back at, in some lab. And my hope is if anyone's listening to this podcast, if they're passionate about either of those two topics, I'd love to hear from them. 
We can share the contact in the show notes, but we're looking for people to join both of those missions, on the political side as well as on the medical side, on the research side.
Frontier Systems, Output Maxing, and Alignment
Swyx [00:24:08]: You said this is a discipline that you want to form. It's variously called Frontier Systems. It's variously called One Person Frontier Lab. What is the ideal name or shape of this? Like, what is the mission?
Anjney [00:24:24]: Of the class?
Swyx [00:24:26]: Of the discipline that you're exploring, right? The class is called Frontier Systems. But for me, maybe one phrase is you're just anti-waste, right? Which is wasting GPUs, wasting in human and Medicare. But is there a broader theme that maybe you can encapsulate more succinctly?
Anjney [00:24:45]: Yeah. From an engineering perspective, it's very simple. It's output maxing. It's the department of output maxing.
Swyx [00:24:51]: Making the most of what we have.
Anjney [00:24:52]: Exactly. I'm a huge believer in optimal outcomes. I think both in America and other countries, we are losing our appreciation for nuance, and this is the thing... And AI is the same case, right? Oh, the bitter lesson holds. Okay, fine. But that doesn't mean you just throw 500,000 GB300s at your suboptimal model scaling and waste a bunch of compute. It also doesn't mean that the most optimal is to have like 50 different architectures where there isn't enough standardization. One of the reasons Anthropic has had extraordinary sort of velocity is ‘cause they picked the transformer architecture and said, “This is simple. Let's double down on it,” right? And now luckily there's enough investment going to the space that we can afford other architectures, but at the time, investment was just too fragmented into other architectures, so that arguably unlocked scaling. So I think there's a philosophy. 
I think we all owe it to ourselves to do output maxing with a new capability called AI on a global level. I think if I was starting a new department at Stanford, depending on how fuzzy or technical I wanted to be, I'd probably call it the Department of Alignment. Like-Swyx [00:25:59]: It's an overloaded termAnjney [00:26:01]: But it is, But alignment really Is a hard problem. And I think when you unlock it, full stack alignment is super hard in any organization and in any system. Like in a, in a venture capital firm, if you can have full stack alignment between your limited partners and your, the founders who are creating the value and ultimately the public that owns the IPO stock, that is a gift that keeps giving. And when you study the history of these systems, when they start off, they usually start out small scale where the feedback loop is actually so tight that there's alignment. And then the more you try to scale, the more division of labor happens, the more specialization happens, and at each step you add abstractions. And wherever there's an API interface, there's like loss. There's communication loss. And so I think a really cool thing would be for us to figure out is there a way for us to have our cake and eat it too as an engineering discipline? Is there a way to actually scale up and scale out Without losing any alignment, without lossy transmission?Swyx [00:27:01]: You mean standards?Anjney [00:27:02]: So standards is one way. The other way is you just have net new capabilities. So like what we're trying to do here is discover new superconductors. A room temperature superconductor would be a lossless transmission mechanism for energy. We would have flying cars. We are right within a few years of having a new room temperature superconductor. So I think those are the two. 
You either have to standardize On protocols or API specs that allow lossless communication, or you can come up with a whole new capability that unlocks so much abundance, the standardization doesn't matter ‘cause you just unlock net new capacity. This, the, so this is what I spend my days thinking about these days.Compute Markets, SF Compute, and Non-NVIDIA ChipsSwyx [00:27:38]: No, I think every infra person at, who wants scale and wants to output max does eventually end up thinking about this. We don't have time to go into it, but we have done an episode with SF Compute-Anjney [00:27:50]: Oh, coolSwyx [00:27:50]: That is trying to standardize The futures contract for compute. I don't, I don't know how that's going by the way, but like at some point this will be public.Anjney [00:27:57]: Oh, I think Evan is awesome and SF Compute is the kind of effort that I hope we can accelerate because what often happens is these exchanges are very hard to get, they, it's hard to bootstrap them, right? Because they often require-- There's many inefficiencies between parties. There's trust boundary inefficiencies in infrastructure because you don't trust, one part of the stack doesn't trust another part of the stack to give them visibility. There's capital markets inefficiencies, there's operational efficiencies. So if you can inject like a single shock to the system of a ton of compute demand or supply, then you can accelerate, these new flywheels. And so my hope is one day, or soon, if SF Compute needs extra like has excess capacity, they just hook it up to the grid and they get flooded with demand from us. And on the other side, if they have a ton of demand but they don't have supply, they just again hook up to the grid and it's a two-way protocol where they can just hook up to our capacity. And I don't think we're too far from that. 
Today our working implementation of it is mostly through a group of labs, universities, and a few sort of trusted parties who are, who all feel like they're in alignment to borrow an over sort of used word. But our hope is to just have it be an open protocol that anyone can hook up to on-Swyx [00:29:20]: Hook up for demand or hook up for supply? In primarily demand, it sounds like. Like you-Anjney [00:29:25]: No, bothSwyx [00:29:26]: You would want to offer demand.Anjney [00:29:27]: Both. Yeah. Unfortunately, what's happened in the last six weeks is, we thought we'd have a bunch of excess capacity by the end of this year. It's all gone.Swyx [00:29:37]: It's exploding.Anjney [00:29:38]: It, yeah. It's all gone. And so I have, my text messages are full of friends, we know many of these people, these are founders who've raised billions of dollars in San Francisco going, “Oh, any chance you have like 50 nodes in the next few weeks?”Swyx [00:29:51]: What is the scope for, non-Nvidia, right? You have Lisa Su coming and, Rainer Pope as well. And so There is a lot of demand for, more performance Alternative architectures and all that. At the same time, this hurts your standardization.Anjney [00:30:11]: I don't think so. So actually Rainer's a great example, right? Rainer is a CEO and founder of, MatX. I actually had him by for office hours in the class earlier today, and there was an insight he brought up that I hadn't considered before, which is when they decided to pick the standard For their data center, they picked the NVIDIA reference architecture. So the MatX chips Just plug in to any site that has an NVIDIA bring up planned. And, the-Swyx [00:30:42]: It's just software then. 
It's, it's not the-Anjney [00:30:44]: A-Swyx [00:30:44]: Hardware.Anjney [00:30:46]: Well, from an input and IO perspective It's the same footprint as an NVIDIA rack.Swyx [00:30:52]: That makes sense.Anjney [00:30:53]: Where they have done, innovated a bunch from what I can tell is on systems co-design. Which is where a lot of the gains are to be had. And so he picked He was “Anjney, we, there's just so much work to do when you're building a new chip company.”Swyx [00:31:08]: Can't fight every front.Anjney [00:31:08]: You just can't fight on every front. So my question to him was, “Well, you're working on this new chip. Their tape-out is next year. What, who are you going to partner with to host the chips?” And he said, “Whoever will host them. That's just not, that's not my focus.” And I said, “But how did you “ you decided back to our earlier systems design question, he decided that, he didn't want to be a full, fully integrated chip provider. The bottleneck they're focused on is the logic die, and they, he feels they can crank out a ton of performance gains through co-design there. But then that means you delegate, to our question earlier, it, you he's the data center provider is a different part of the stack, and so then he's dependent on that part of the ecosystem to host his chips to get the performance gains to the customer. So now you have another abstraction, and you might have loss. So I asked him, “How do you prevent loss?” And back to your point, he said, “I just picked the NVIDIA standard ‘cause I didn't want to Like I wanted to piggyback off of an existing protocol.” And that, what's great about NVIDIA is that reference architecture is known.Swyx [00:32:15]: Open.Anjney [00:32:15]: It's open. They've published it. So Jensen's actually enabled someone like Rainer to build a chip company like MatX, and I don't see them as competitive. The compute demand is so high. 
Like, I think NVIDIA's not able to meet the demands of production, so we just need more chips. And I think it's very smart what MatX has done, which is say, “We're not going to innovate on the data center design ‘cause actually, thank you, Jensen, you've done all the hard work. Where we can innovate is somewhere else.” And I think that's very healthy. I think that's how we unblock new bottlenecks. And my view is, for chip teams like MatX, who have arrived at the insight that co-design is the way, the primary bottleneck is the trust boundary. To do co-design well, you need visibility into the next model generation as soon as possible ‘cause it takes two years to tape out. So if by the time I bring my chip to market, your model architecture's changed, I'm hosed. Now, when he was inside Google, he was sitting next to the Gemini team. He was on PaLM or whatever.
Trust Boundaries, Co-Design, and Researcher CEOs
Swyx [00:33:19]: His co-founder was one of the PaLM guys, I think.
Anjney [00:33:23]: Yes. Yes, exactly. So when you're inside the trust boundary of Google, then your systems co-design loop is super tight. When you leave as a founder, one of the biggest risks you take is now you're outside the trust boundary. And so what I love doing is helping chip teams who can help us unlock more capacity for the independent ecosystem get access to trust. Because if I've been involved with a lab from day one... I was lucky enough to work with Anthropic, and then I'm on the board of Mistral and helped Black Forest Labs get started. I think at this point I'm on six or seven different teams.
Swyx [00:33:57]: Only six? 
I feel like my mental number was going to be 13, but yeah, it's-
Anjney [00:34:02]: No, I go deep with one at a time.
Swyx [00:34:04]: You're founding CEO of Arena.
Anjney [00:34:07]: Nah, that was an-
Swyx [00:34:08]: Administrative CEO.
Anjney [00:34:09]: It was an administrative five-month gig where Wei-Lin and Anastasios were graduating from their PhDs, and they didn't need a product team. So I helped recruit the head of engineering, product and design. But Anastasios has always been the CEO of that company. I played a pinch-hitting... I was an intern. I was CEO intern for five months.
Swyx [00:34:33]: I interviewed him, and he's very well-spoken. I think he's a former debate champion. But also very quantitative and mathematical, which is-
Anjney [00:34:41]: He-
Swyx [00:34:41]: Such a unicorn.
Anjney [00:34:43]: See, what's amazing about him? If you look at his output, he's an output maxer. By the time he was graduating from his PhD, which he only graduated last year, he had published more work with a citation count than people twice his age. But at the same time, he'd already started a project called LMArena that was being used by millions of people as a side project. And time and time again, what I've realized is venture capitalists suck at seeing human beings as dynamic agents where-
Swyx [00:35:14]: They want to put you in a box.
Anjney [00:35:15]: They want to put you in a box.
Swyx [00:35:15]: This is your thing.
Anjney [00:35:16]: So the first time I got introduced to Anastasios, somebody had told me, “Oh, he's amazing, but he's a researcher.” I was like, “What? What do you mean he's a researcher?” That's what-
Swyx [00:35:28]: Like he's not a CEO, not a founder.
Anjney [00:35:29]: Not a CEO, exactly. I was like, “Are you crazy? Have you met Dario?” Dario's a scientist. He's gone from zero to what will soon be a trillion-dollar company in four years. Being a CEO, nominally speaking, is not that hard. Being a good CEO is hard. 
Being a great CEO actually requires a level of performance that scientists who have already published at the top of their field have accomplished. It is super hard to be a competitive scientist. To publish in academia over the last 20, 30 years, to make it to the top of your discipline at a place like Berkeley, you are a star athlete. Like, you are an athlete of the mind, and you perform at the highest levels. And to get there, whether you're Anastasios or Wei-Lin at Berkeley, or you are Robin, who-
Swyx [00:36:23]: BFL, yeah.
Anjney [00:36:24]: With Black Forest, who created Stable Diffusion, or if you're, like, Guillaume at Meta, who created Llama before he started Mistral. The amount of human leadership you have to demonstrate to get the resources, like get the trust of the organization, publish it, put it up. I would just fund researchers all day, right? Those who have contributed already to the field. If they've put SOTA out there, they're star athletes already. If they haven't done SOTA, look, they can still be good CEOs, but then I find the failure mode is that they just don't want to be CEOs, they primarily want to publish, and that's okay, too. One of the things we do with the AMP Grid is we donate excess compute. We have two nonprofits, like university labs. We carved out like a couple thousand H100s. But I do think there's extraordinary research being done on university campuses. My father-in-law's a physicist. He's a professor. Extraordinary work in physics, and we need that. But if you want to be a CEO, what you need to be willing to do is be super confrontational outside of science. Like within the scientific community, some of the best researchers are very confrontational about their convictions, right? This architecture is right. 
To be a great CEO, you basically have to be willing to be confrontational up and down the stack.Swyx [00:37:41]: To your own team.Anjney [00:37:42]: To your own team-Swyx [00:37:43]: To customersAnjney [00:37:43]: Hiring, recruiting customers. Well, I would say, Yeah, pretty much to everyone Everybody. Of course-Swyx [00:37:50]: I see, I feel a little bit of that in my own work, but yeah, I can't imagine the stakes that Dario has had to go through. It's, it's pretty insane.Anjney [00:37:56]: No, I don't think the stakes are that different From how you're feeling it, right? Stakes are personal scaling vectors, right? The stakes that seem so low to you, like having this podcast where you can talk to somebody and just have a you're an extraordinary communicator, right? Like already in this conversation, you've pulled more out of me than most people, and I've been on 12 podcasts in the last two weeks.AI Coachella and First-Principles ThinkingSwyx [00:38:17]: I think I, we've just seen each other enough that there's some base trust.Anjney [00:38:20]: There's base trust.Swyx [00:38:20]: And I think, and I know that you, that I've done my homework and like I know that trust is a big deal for you, so.Anjney [00:38:27]: I think trust is about consistency, and you and I have seen each other In the community for years, right? Like, I remember the first time we met was at NeurIPS in New Orleans. I don't know if you remember that, luncheon.Swyx [00:38:38]: Oh my God.Anjney [00:38:39]: Reiko had set up this Reiko's amazing, and he set up this luncheon and-Swyx [00:38:43]: Yeah, I was “Who's this Discord guy?” I'm “Okay.” But-Anjney [00:38:45]: No, you weren't-Swyx [00:38:46]: You were just “You made some investments.”Anjney [00:38:47]: You were much less polite. You were “Who's this VC?” You're like-Swyx [00:38:51]: No, I Was I? Oh my God.Anjney [00:38:53]: It was-Swyx [00:38:53]: I'm so sorryAnjney [00:38:53]: It was visible on your face.Swyx [00:38:54]: I'm so sorry. 
But you weren't... The introduction was bad. I didn't know who you were.
Anjney [00:39:00]: See, this is the thing about context, right? Like, but then I think I heard your accent. And I was like, “Are you-”
Swyx [00:39:06]: Singapore, yeah.
Anjney [00:39:06]: “Are you Singaporean?” And you're like, “Yeah.” And I said, “I went to high school, JC, in Singapore.” And then the ice broke. But in the scientific community, sometimes the stakes are very high for people who haven't had the emotional, what is called EQ, coaching and mentorship, right? Which is, like, to have scientific impact, you often need to be an extraordinarily emotionally in tune person with the folks you're trying to influence. And so what comes so naturally to you is actually a super high stakes thing to other people. And so I wouldn't assume that Dario's more stressed out than you. You'd be surprised how similar to yours, and how small, the problems sometimes are that some of the world's biggest leaders are facing. And that's what I've learned from this class. The guest speakers are Sam, Satya, Jensen.
Swyx [00:40:01]: AI Coachella.
Anjney [00:40:02]: Yeah. It's AI Coachella, right? So we got to get all the headliners, and I'm very lucky that some of these people have either mentored me over the years or I've done business with them. And when you take the performative stuff out, and any assumptions you may have about these people that you read in the press or on Twitter, we're all just humans. We're all trying to get along. And what's so special about this moment is AI, like scaling, the bitter lesson, is forcing a lot of people to revise their assumptions for how the world works and go back to first principles or go and educate themselves. So the kind of people... I won't name who this person is, but I was at an event last week in Texas and ran into somebody who said, “Anjney, I came across the class. 
What do you think about real time action prediction models?” And you don't know how happy it made me feel when they asked me that question. I know they've done the work. They've challenged themselves. They didn't ask me, “What do you think of world models?” They said, “What do you think of-”
Swyx [00:41:04]: Real time action prediction.
Anjney [00:41:05]: “real time action prediction models?” World models, don't get me wrong, are cool and everything, but you and I both know that is a layer of abstraction that is sometimes not usefully precise enough. Right? Ours-
Swyx [00:41:16]: There's like four different kinds of world models.
Anjney [00:41:17]: Yes, exactly.
Swyx [00:41:18]: We've done the pod with General Intuition, by the way, which is very focused on-
Anjney [00:41:22]: Oh, cool. Yes. I love Pim. Pim is great. And this is what I love about people who've done that level of work. They realize they're not in competition with people who the rest of the world thinks they're in competition with.
Swyx [00:41:34]: Because they're not in the category, they're in the specific thing they're trying to do.
Anjney [00:41:37]: They're focused on their mission, and they have a systems understanding of the bottleneck they're trying to solve. And when somebody else says, “I'm working on real time action prediction models too,” Pim goes, “Oh, I love that person. I can learn from them.” But the minute they're like, “Oh, that person's a world model person,” it's like, “Which type of world model person?” But mostly they're just trying to figure out if it's a waste of their time, because we don't have enough time. So Pim, for example, loves this other company I work with that we've talked about, called Black Forest Labs. And he's mentioned to me multiple times that he thinks what Flux is doing is really cool. Andy Blattmann came by and spoke in the class. 
And what I find over and over again is for people who do the work, who can be usefully precise enough about like what is actually going on in the world of frontier research, The sense of camaraderie is still well and alive, but it gets lost sometimes when you have to like abstract The technical complexities in, business terms And then the VCs are “How are you different from that world model?” I'm going to say Where do I even start to explain this stuff? And then the misalignment creeps in.Leading vs. Winning in Frontier AISwyx [00:42:43]: This is good. Yeah, I think, people listening get a sense of, what it is like to operate at a real level, like yourself, rather than at, the journalist level, where you have to sort of put everyone in, a rough category and create a narrative of competition, and who's winning today, who's behind.Anjney [00:42:58]: It-- this idea of winning is so Weird to me.Swyx [00:43:03]: You do want to win. You want you want competitiveness.Anjney [00:43:06]: No, I think you want to lead.Swyx [00:43:07]: You want SOTA.Anjney [00:43:07]: No, I think you want to lead. Yes, so you want to push the frontier. You want to push the SOTA. You want to do something that hasn't been done before. You want to capture value, but you don't want to capture so much value that, people think you're unaligned with your mission or trying to do what's best for the world. You want to capture enough value that you can keep innovating, right? And I think that people want to lead, they don't really This idea of winning and losing, again, I love Jensen. He's a, he's a leader. The mindset that he talked about on Dwarkesh's podcast, right? He's “I didn't wake up with a loser mindset.” I think that was awesome, right? Because he's, he's an engineer. Dwarkesh has done the work. So there's at least-- even though the, to me, it was very obvious they're talking about the same thing, they just passed each other. 
They just had to basically, Jensen has this, five-layer cake abstraction of how the industry works. And Dwarkesh had, I think from that podcast, had more of, a pre-training, mid-training, post-training systems loop concept.Swyx [00:44:04]: It's just a factor of who he talks to, right? Again, it's very clear.Anjney [00:44:06]: It's the systems It's the abstraction, the mental models, the It's the whole-- Dude, so much of the problem in the world is reasoning by analogy. And then the assumptions that are held invisibly.Swyx [00:44:19]: Yeah, I've, I've said, this is actually the best time in human history for first principles thinkers. Because everything you think will happen is actually now coming true.Anjney [00:44:28]: Correct. And the venture capital community is, notorious for this, where people look-- In times of uncertainty, they, cling to axioms that ended up being true from the previous era, and they kind of like proclaim them with confidence as if they're truths, but they're not. And it's very important to see the distinction between a heuristic and an axiom. An axiom can be proven-Swyx [00:44:55]: Like from internal consistency point of viewAnjney [00:44:56]: With internal consistency. A heuristic is a way you kind of a shortcut. And my God, the number of people I have had to put up with over the last few years who proclaim-- use heuristics As axioms to judge people, to judge which companies are going to succeed or the number of people who are “Oh, yeah, Anthropic, they're just training models right now,” but this one continue.Swyx [00:45:22]: Because that's a B2B SaaS?Anjney [00:45:23]: Yeah, the, like Which over the fullness of time, if you squint at it, maybe. But the way you arrive there is so important that you can-- you just, you can dismiss people. Here's what happened, right? What happened is Anthropic basically achieved takeoff in October of last year. 
That training run-Swyx [00:45:41]: Whatever, three seven?Anjney [00:45:42]: I forget the numbers now, but whatever that checkpoint was-Swyx [00:45:45]: We saw the cognition.Anjney [00:45:46]: Yeah. Right? You probably-- The, to those of us in the community, especially once post-training was done and it was released in December-Swyx [00:45:52]: Yeah. Can I sneak a sneaky question in there? I don't know if you have a perspective, maybe you don't, I just The number one question is how did Anthropic crack coding, right? Because Claude One, Claude Two, okay, like it was part of it, but it wasn't a big deal. And the leading hypothesis, it's a lucky dice roll that was then compounded, right? Like it was like Mildly better, but then they saw it and they were “Okay, let's really invest.”How Anthropic Cracked CodingAnjney [00:46:17]: I had this very annoying teacher. I went to this boarding school called Rishi Valley in India, which is like this, bird preserve. It's like three hundred and fifty acres of bird preserve in rural India, and there was no technology for seven years. There was this teacher, I won't name them, but they would have this-- I hated it every time he said this to me. He was “Luck fa-favors the prepared mind,” which is like a common saying, but the way he delivered it, always grated me, ‘cause he was always I was always one of those kids who got, a good grade without trying very hard. ‘Cause like high middle school is not that hard if you, if you're generally, paying attention and so on. And there was this one time where I-- But then I would get an eighty percent grade, and he would keep pushing me to say “The reason you didn't get the ninety-five plus percent is because you're not that lucky.” And I would say, “What do you mean?” ‘Cause I would think that I deserved that grade, and I would sometimes argue with him. And he'd say, “You didn't have a prepared mind. 
If you want to get lucky again...” There was basically one time where I got like ninety-five or ninety-six in this subject, and now I felt entitled. I was like, “Okay, I'm going to keep doing this,” and I didn't. And then he said, “Luck favors a prepared mind. You got lucky last time, but you've got to stay prepared.” And I didn't understand what he meant. Now that I'm older, I'm like, okay, these adults actually knew a thing or two. Anthropic has been the most prepared company for four years. So when the right context data comes in, when the right developers start sending in the right context diffs, sure, you could say they got lucky, but if you ask me, they've been pretty damn prepared, with paranoia, for like four years. And you have to remember, it was so hard for them to get going early on, they had to do so much more with so much less, that you just have to be prepared to be so efficient.
Swyx [00:48:06]: Yes. There are numbers on their burn compared to OpenAI. I've written about it, but they are so much more efficient in their tech stack.
Anjney [00:48:14]: It's not even... it's not funny.
Swyx [00:48:14]: Not even close.
Anjney [00:48:15]: Yeah. But it's so clear, right? Like, how to output-max for the world. They have been prepared, and you could call that luck, but luck favors the prepared mind.
Culture, Hardship, and Anthropic's P0
Swyx [00:48:25]: This is one of those things... I was going over some of your old lectures, and you were saying: data, people think it's a moat, and actually it's culture, and actually it's team. There are different levels of moats, and this is the ultimate one that determines everything else, which you can then compound.
Anjney [00:48:43]: You're saying culture is the ultimate moat? Yeah. But the thing about culture is it's very fragile. So moats, there are very few moats I've found that are actually moats. 
It's a nice concept, but in reality, you have to replenish your culture. Ben Horowitz was the speaker in CS153 on Tuesday, and I asked him this question about the culture bottleneck in teams, because there are several AI teams-
Swyx [00:49:09]: His book, Hard Things About Hard Things.
Anjney [00:49:11]: Hard Thing About Hard Things. But more concretely, there are so many AI labs today that have all the cash they need, they have all the compute they need, and they're still not able to ship anything SOTA. And then you start seeing people leave and so on, and my diagnosis is it's the culture. And so I asked Ben. He's been one of the most aggressive investors in AI labs. He goes back to this thing which resonates in my mind a lot. When I used to work at a16z, I would book a conference room, the one closest to the toilet, 'cause it was the fastest way for me to go use the bathroom between Zoom meetings-
Swyx [00:49:45]: Oh my God, I'll put maxing my toilet optimization. Okay, never mind.
Anjney [00:49:48]: It was not healthy in hindsight, but maybe this is TMI. Anyway, right outside that conference room, on the wall, was this printed quote that said, “Culture is not a set of beliefs, it's a set of actions.” And it's by Bushido, this Japanese philosopher. And if you stop taking the actions that demonstrate the mission alignment with what you've said to your team and to the world matters to you, then your culture starts to fray. So it's not actually a moat, I would say. It's a very brittle, fragile thing that requires daily tending, like a garden. But if you figure out the system to keep that garden tended, which I think ultimately comes down to knowing yourself, 'cause if you're authentic and so on, you'll naturally make trade-offs that seem effortless to you but that reinforce your culture. 
And then that becomes this very hard thing for other people to catch up to. And at Anthropic, from day one, there was this missionary-like zeal and belief that, hey, these capabilities will scale. These systems are stochastic, not deterministic. There will be error bars, and until we crack interpretability, there's risk. And at some point, people will stop using Claude just for coding. They'll use it in some mission-critical context where it'll throw off a bug, and then people are going to come blame them, and they want to be on the right side of history, where they said, “Yes, this is a powerful technology. We think it's going to change the world, and we want to be very measured and scientific about the fact that, ‘Hey, guys, these are statistical models. That's how statistics works.'” Ultimately, when you're training neural nets, it is just a statistical system. And I think that belief that safety is important... it might seem toy-like in the early days, and sometimes you could say, “Anjney, they totally exaggerated the risk,” like two years ago when they said, “Let's not launch Claude One,” or whatever. Well, okay, maybe in hindsight, but hindsight is twenty-twenty. At the time, they didn't know how that model would be used, and to them it felt existential if somebody came and said, “You weren't responsible. This wrote a bug.” The liability associated with that is massive. So how do you protect against that? Well, day in, day out, you say safety. And when you start deviating from that, you have the team hold you accountable, you have the world hold you accountable, and I think that becomes a moat over time. At some point, that moat will get challenged and so on, and then it becomes fragile. I hope it endures, because that's the beauty of having founders run the show: they can make really hard trade-offs for mission alignment. 
The hardest part is in the earliest days. When you don't have a group of people who are going through difficulty, stress, crisis together, your culture doesn't get defined sharply enough, and that's what I'm worried about right now: there's so much money going to these labs. There's no hardship. There's no-
Swyx [00:52:50]: To anyone who knows
Anjney [00:52:51]: There's no to anyone who knows. And that, in hindsight, was a feature, not a bug, for Anthropic. The number of people who said no, the number of people who said, “Sorry, we're all doing investors in OpenAI,” that is competitive difference. It forces you to really understand what is the hill you want to die on at the expense of everything else. What's the P zero? And there, P zero from day one was coding. The reason, the mechanism there, was: if we crack coding, then we will crack AGI. Our mission is AGI. We want to get there safely. If we focus on codin
When Demis Hassabis pitched DeepMind to a few venture capitalists back in 2010, the business plan was almost comically audacious. “Step one: Solve intelligence. Step two: Use it to solve everything else,” he recalls in a conversation at Stanford Graduate School of Business with Stanford University President Jonathan Levin. “And people were quite confused. But we really meant it.”
Sixteen years later, the “broad arcs” of that plan have gone “unbelievably well,” says Hassabis, a chess prodigy turned video game developer turned neuroscientist turned Nobel Prize-winning AI pioneer. Today he's on a mission to create “the ultimate tool for science,” building on his decision to give away AlphaFold, the groundbreaking AI system that predicts the structures of proteins. The future, Hassabis says, is just around the corner: “Ten years from now, I think we'll realize that we were standing in the foothills of the singularity now.”
AI@GSB, the Dean's Applied AI initiative at the Stanford Graduate School of Business (GSB), and Stanford Medical School hosted a conversation with Demis Hassabis, Co-founder and CEO of Google DeepMind, on the frontier of artificial intelligence and what it means for how we live, work, and flourish.
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
The U.S. and Iran both sign their MOU for their peace framework. Crude prices tumble with hopes growing that the Strait of Hormuz will now re-open. President Trump warns the U.S. will reserve the right to retaliate should Tehran violate the deal. The Federal Reserve holds rates steady during Kevin Warsh's first decision as chairman of the central bank, but officials say hikes loom later in the year. Warsh said his key concern is to bring inflation back to its 2 per cent target. The CEOs of Anthropic, Google and DeepMind have led calls for a U.S.-led AI coalition for the sector. Tech executives and world leaders have convened at the G7 summit in Evian, France where French President Emmanuel Macron warned against nationalistic policies within the sector.
AI can write code, pass exams, and summarize the web, but ask it to reason through a real-world image, and the magic often breaks. Andrew Dai, co-founder and CEO of Elorian, joins The Neuron to explain why visual reasoning may be one of the biggest unsolved problems in AI.
Andrew spent years at Google Brain and DeepMind, including work connected to Gemini and sparse mixture-of-experts systems. Now, he's building Elorian around a simple but powerful idea: if AI is going to understand the physical world, it needs more than text-based reasoning layered on top of images.
In this episode, Corey and Grant talk with Andrew about why frontier models struggle with counting, navigation, design, engineering, charts, and physical reasoning; why scaling language models hasn't solved vision; what a “visual chain of thought” might look like; and how better visual reasoning could accelerate robotics, satellite analysis, product design, and mechanical engineering.
Sponsored by Dell Technologies and NVIDIA. Learn more at techrepublic.com/hubs/the-enterprise-guide-to-scalable-ai/.
Sponsored by Outshift: Visit https://outshift.cisco.com/?utm_campaign=fy26q3_outshift_ww_paid-media_ioc-neuronai-outshift_podcast&utm_channel=podcast&utm_source=podcast to learn more about the Internet of Cognition.
Subscribe to The Neuron for more conversations with the people building the future of AI.
The Onda Cero journalist focused on the meeting that the leaders of the world's major powers will hold with the CEOs of Anthropic, OpenAI, DeepMind and Mistral, at which they will determine the fate of the world.
Sebastian Mallaby (@scmallaby) is the Paul A. Volcker senior fellow for international economics at the Council on Foreign Relations, a two-time Pulitzer Prize finalist, and the author of six books, including More Money Than God, The Power Law, The Man Who Knew, and The World's Banker. His latest book is The Infinity Machine: Demis Hassabis, DeepMind, and the Quest for Superintelligence.
This episode is brought to you by:
Eight Sleep Pod Cover 5 sleeping solution for dynamic cooling and heating: EightSleep.com/Tim
AG1 Pro all-in-one nutritional supplement: DrinkAG1.com/Tim
Wealthfront high-yield cash account: Wealthfront.com/Tim
Wealthfront disclaimer: New clients get 3.30% base APY from program banks + additional 0.75% boost for 3 months on your uninvested cash (max $150k balance). Terms and conditions apply. The Cash Account offered by Wealthfront Brokerage LLC (“WFB”) member FINRA/SIPC, not a bank. The base APY as of 1/30/26 is representative, can change, and requires no minimum. Tim Ferriss, a non-client, receives compensation from WFB for advertising and holds a non-controlling equity interest in the corporate parent of WFB, which creates a conflict of interest. Individual experiences and outcomes will differ. Instant withdrawals may be limited by your receiving firm and other factors. Investment advisory services provided by Wealthfront Advisers LLC, an SEC-registered investment adviser. 
Securities investments: not bank deposits, not bank-guaranteed or FDIC-insured, and may lose value.
Timestamps
[00:00:00] Start.
[00:02:11] The twinkly eyed polymath who became Sebastian's next book.
[00:06:55] Picking the next book project the way a great VC picks a startup.
[00:09:41] Why God keeps crashing the superintelligence party.
[00:11:13] Shane Legg's grainy 2009 prophecy - and the nervous giggle.
[00:13:11] Ilya Sutskever burns an effigy.
[00:13:54] Demis at 4 a.m., hunting God's algorithm.
[00:18:43] Super-abundance, Mad Max, and the China shock lesson.
[00:22:39] The kitchen debate with Geoff Hinton that flipped Sebastian.
[00:24:06] Why a zero-percent chance of doom is indefensible.
[00:24:52] Will Washington seize the labs? The Mythos wake-up call.
[00:27:18] Anthropic's bull case, bear case, and a dead parent's letter.
[00:33:24] Where Sebastian and Benedict Evans part ways.
[00:38:16] Is the SaaS apocalypse overdone? One word: Palantir.
[00:39:53] The AI friend you'll never switch.
[00:41:56] Does Google win consumer AI by default?
[00:44:45] Four cities, eight days: China actually talks safety.
[00:47:28] A Cold War non-proliferation playbook for AI.
[00:49:45] Did the chip export controls actually work?
[00:51:49] Burned doves: why Washington swears China won't talk.
[00:54:56] "By 2028, the race is over" - one lab boss' bet.
[00:59:11] Inside Hikvision: toddlers, sensors, and US sanctions.
[01:01:07] Bill Gurley's Uber bet: venture capital perfected.
[01:05:18] Luke Nosek bear-hugs DeepMind into existence.
[01:10:52] Thiel's heresy: never invest by committee.
[01:11:59] How Founders Fund nearly fumbled the deal of the century.
[01:14:30] Selling to Google for $650M: a secret British heist?
[01:16:41] The Traitorous Eight, gardening leave, and the UK's to-do list.
[01:20:55] Ender's Game: "That's really how I see myself."
[01:23:42] Too dumb for Gödel, Escher, Bach? Maybe an LLM can help.
[01:25:19] If not Demis or Sam, then Dario.
[01:26:04] My royalties cliff - and what dropped in late 2022.
[01:27:47] Lila Sciences and the labs that run themselves.
[01:31:13] Sebastian's billboard: "Prepare your mind."
[01:35:14] The one thing Sebastian will never outsource to AI.
[01:40:09] Parting thoughts.
For show notes and past guests on The Tim Ferriss Show, please visit tim.blog/podcast.
For deals from sponsors of The Tim Ferriss Show, please visit tim.blog/podcast-sponsors
Sign up for Tim's email newsletter (5-Bullet Friday) at tim.blog/friday.
For transcripts of episodes, go to tim.blog/transcripts.
Discover Tim's books: tim.blog/books.
Follow Tim:
Twitter: twitter.com/tferriss
Instagram: instagram.com/timferriss
YouTube: youtube.com/timferriss
Facebook: facebook.com/timferriss
LinkedIn: linkedin.com/in/timferriss
On this Live Greatly podcast episode, Kristel Bauer sits down with Benjamin Todd, co-founder of 80,000 Hours and author of 80,000 HOURS: How to Have a Fulfilling Career That Does Good. Kristel and Benjamin discuss why "follow your passion" may not be the best career advice, what actually contributes to meaningful and fulfilling work, and practical strategies to align your strengths, values, and goals with your career. Benjamin also shares insights on pursuing positive impact, and building a career that supports both success and well-being. Tune in now! Key Takeaways From This Episode: Why "follow your passion" can be misleading career advice The key ingredients of meaningful and fulfilling work How to align your strengths and values with your career The impact of volunteering Tips to pursue success, purpose, and well-being simultaneously How to be a multiplier ABOUT BENJAMIN TODD Ben is the founder of 80,000 Hours, a non-profit that has reached millions of people and helped 3000+ people find careers tackling the world's most pressing problems. He's the author of 80,000 Hours: How to Have a Fulfilling Career That Does Good (Penguin May 2026) and writes about how to prepare for advanced AI on Substack. Dissatisfied with the career advice he received at university, Benjamin began researching the guidance he wished he'd had. Over the next ten years, he grew 80,000 Hours from a student society in Oxford into a non-profit that today reaches 4 million people annually, has over 50 staff, and has raised $30m of funding. It has been covered in the Financial Times, Guardian, TIME, Wall Street Journal and BBC, and was one of the first non-profits to go through Y Combinator, the world's top startup accelerator. 80,000 Hours provides free online research, one-on-one advice, a job board and podcast to help people find more fulfilling and impactful careers. Over 10 million people have read their advice online and over 3,000 have switched to more impactful careers. 
This includes people who helped to pioneer research into AI safety at organisations like Anthropic, DeepMind, RAND and METR, have taken key roles aiming to prevent a catastrophic pandemic, and have pledged billions of dollars to high-impact charities. As CEO for the organisation's first ten years, Ben led strategy, fundraising, and senior management, building an organisation with average annual staff retention of 95%, while also writing the Career Guide, Key Ideas series and over 100 articles. His TEDx talk has been viewed over 6 million times. Before 80,000 Hours, he was the first undergraduate to intern as an analyst at Orbis Investment Advisory, a $20bn fund. He was the first non-founding member of Giving What We Can, pledging to give 10% of his income to effective charities for life. He has a 1st from Oxford in a Masters of Physics and Philosophy, has published in climate physics, and speaks Chinese, badly. Connect with Benjamin: Order his book: https://80000hours.org/book/ Website: https://benjamintodd.org/ Linkedin: https://www.linkedin.com/in/benjamin-j-todd/ Instagram: https://www.instagram.com/benbentodd/ About the Host of the Live Greatly podcast, Kristel Bauer: Kristel Bauer is a corporate wellness and performance expert, keynote speaker and TEDx speaker supporting organizations and individuals on their journeys for more happiness and success. She is the award-winning author of Work-Life Tango: Finding Happiness, Harmony, and Peak Performance Wherever You Work (John Murray Business November 19, 2024). With Kristel's healthcare background, she provides data driven actionable strategies to leverage happiness and high-power habits to drive growth mindsets, peak performance, profitability, well-being and a culture of excellence. Kristel's keynotes provide insights to "Live Greatly" while promoting leadership development and team building. Kristel is the creator and host of her global top self-improvement podcast, Live Greatly. 
She is a contributing writer for Entrepreneur, and she is an influencer in the business and wellness space having been recognized as a Top 10 Social Media Influencer of 2021 in Forbes. As an Integrative Medicine Fellow & Physician Assistant having practiced clinically in Integrative Psychiatry, Kristel has a unique perspective into attaining a mindset for more happiness and success. Kristel has presented to groups from the American Gas Association, Bank of America, bp, Commercial Metals Company, General Mills, Northwestern University, Santander Bank and many more. Kristel's work has been featured in Forbes and she has had multiple TV appearances including NBC News Daily, ABC News Live, FOX Weather, ABC 7 Chicago, WGN Daytime Chicago and more. Kristel lives in the Chicago, IL area and she can be booked for speaking engagements worldwide. To Book Kristel as a speaker for your next event, click here. Website: www.livegreatly.co Follow Kristel Bauer on: Instagram: @livegreatly_co LinkedIn: Kristel Bauer Twitter: @livegreatly_co Facebook: @livegreatly.co Youtube: Live Greatly, Kristel Bauer To Watch Kristel Bauer's TEDx talk of Redefining Work/Life Balance in a COVID-19 World click here. Click HERE to check out Kristel's corporate wellness and leadership blog Click HERE to check out Kristel's Travel and Wellness Blog Disclaimer: The contents of this podcast are intended for informational and educational purposes only. Always seek the guidance of your physician for any recommendations specific to you or for any questions regarding your specific health, your sleep patterns changes to diet and exercise, or any medical conditions. Always consult your physician before starting any supplements or new lifestyle programs. All information, views and statements shared on the Live Greatly podcast are purely the opinions of the authors, and are not medical advice or treatment recommendations. They have not been evaluated by the food and drug administration. 
Opinions of guests are their own and Kristel Bauer & this podcast does not endorse or accept responsibility for statements made by guests. Neither Kristel Bauer nor this podcast takes responsibility for possible health consequences of a person or persons following the information in this educational content. Always consult your physician for recommendations specific to you.
The U.S. government forced Anthropic to pull Fable 5 and Mythos 5 from general availability just days after they launched as the most capable models publicly available. Paul and Mike work through how it happened, Dario Amodei's policy essay, and the question every business leader should be sitting with: what do you build on when a model can be switched off? Then it's OpenAI's confidential IPO filing, Apple's long-awaited Siri AI, the real economics behind your AI subscription, and a DeepMind paper on the road from AGI to ASI.
Timestamps:
00:00:00 - Intro
00:05:20 - Claude Fable 5
00:27:38 - OpenAI Files for IPO
00:36:02 - Apple's Siri AI Is Finally Here
00:44:02 - Is the Era of Affordable AI Over?
00:48:56 - Opendoor Ends Offshoring for AI-Native Workers
00:51:49 - From Prompts to Loops
00:57:48 - From AGI to ASI
01:03:07 - Europe 2031
01:06:53 - AI Use Case Spotlight
01:11:12 - AI Product and Funding Updates
This episode is brought to you by AI Academy by SmarterX. AI Academy is your gateway to personalized AI learning for professionals and teams. Discover our new on-demand courses, live classes, certifications, and a smarter way to master AI.
In AI Needs You: How We Can Change AI's Future and Save Our Own, Verity Harding argues that AI governance is too important to be left to technologists alone, and that the rest of us need to join the conversation to shape this technology's future.
Harding is the director of the AI and Geopolitics Project at the Bennett School of Public Policy at the University of Cambridge and the founder of Formation Advisory. She spent more than a decade at Alphabet, first as head of Security Policy at Google, then as DeepMind's first global head of Policy. In her book, she draws on historical case studies to show that democratic societies have successfully governed transformative technologies in the past.
In her conversation with Nikolaus Lang, global leader of the BCG Henderson Institute, she discusses why the nuclear arms race is the wrong analogy for AI, what the 1967 Outer Space Treaty can teach us about cooperation between rivals, how Britain's regulation of IVF became a gold standard by depoliticizing the technology, and what business leaders get wrong about their own role in shaping AI governance.
Key topics discussed:
01:56 | Why the framing of AI as “too complex for nonexperts” is harmful
07:46 | Why the nuclear arms control analogy is counterproductive for AI
12:25 | The Space Race and the 1967 Outer Space Treaty as a model for cooperation
17:11 | IVF, the Warnock Committee, and why a philosopher led the regulation effort
20:38 | The internet: from open ideals to commercialization and surveillance
26:41 | What business leaders can do to shape AI governance
30:50 | Four principles for AI: peaceful intent, embrace limitations, purpose over profit, societal trust
35:25 | If you could mandate one thing for global AI governance, what would it be?
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
(Presented by TLPBLACK: A cybersecurity intelligence platform focused on sharing curated, high-sensitivity threat insights and research with trusted security professionals.) Three Buddy Problem - Episode 100: We cover AI eating reverse engineering, the death of the malware report, running local models on the DGX Spark, where Google DeepMind stands, and whether the frontier labs will stay in cybersecurity. Plus, more on Anthropic's Mythos rollout and the thinly sourced Anthropic-NSA reports, the Fast16 sabotage of physics calculations, what researchers choose not to publish, Microsoft's bad Black Hat email, and Costin's Friday UFO files. Cast: Juan Andres Guerrero-Saade, Ryan Naraine and Costin Raiu. Timestamps: 0:00 - JAGS at InfoSecurity Europe 3:40 - Sponsor: TLPBLACK 5:54 - A roadmap for security after the AI revolution 11:01 - Stripe Atlas and how easy it is to start a company 15:00 - If anyone could reverse engineer anything for $5 19:49 - Layoffs at Google's Threat Intelligence Group 21:06 - The death of reading the report 27:53 - Pitting the AI models against each other 32:07 - Grok, local models, and the DGX Spark 39:27 - Where is Google DeepMind? 45:29 - Will the frontier labs stay in cybersecurity? 52:41 - Mythos, Project Glasswing, and the NSA deal 1:16:33 - FAST16, Stuxnet, and sabotaging Iran's bomb 1:57:52 - Microsoft, Black Hat, and the chilling effect 2:14:14 - Shout-outs, UFO files, and 100 episodes
Episode 374 Google DeepMind is simulating entire worlds using AI - that can be interacted with in real time. “World models” simulate the environment and physics of the real world. And DeepMind's Genie 3 model allows people to create these worlds with basic image and text prompts. The idea is not just to allow people to explore these worlds, but to serve as a testbed for AI agents to learn how to interact with the world before they are deployed in humanoid robotic bodies. Could this be the next big step towards artificial general intelligence (AGI)? Joshua Howgego speaks to Jack Parker Holder, Research Director at Google DeepMind, about the latest developments. To read more about these stories, visit https://www.newscientist.com/ Learn more about your ad choices. Visit megaphone.fm/adchoices
AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store
The new AIEWF website is live! Get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!
Most industry benchmarks compress intelligence and reasoning ability into scores: SWE-Bench Pro, MMLU, Humanity's Last Exam, etc. These metrics are useful, but don't always represent the full extent of how a model performs in the real world. Some of the most interesting evals today look less like exams and more like operating businesses in the real world. One of these is Vending Bench.
In Anthropic's Mythos Preview System Card, Andon was the only third-party eval to get its own section, observing increasingly concerning aggressive behavior.
You don't know what a model is capable of doing in the real world unless you actually give it inventory, a wallet, tools, customers, competitors, humans, & some time. More often than not, it'll surprise you how much a model is capable of and, in doing so, also reveal unexpected behavior: deception, context collapse, emergent coordination, & bizarre negotiation behavior.
While an inflection point in personal agents came post-OpenClaw, after full file access with bypass permissions became the norm, it is yet to come for agents in the real world. However, Andon Market, an actual in-person store fully run and managed by AI, is paving the way for what is possible.
From Claude trying to call the FBI over a $2/day vending machine charge to AI agents forming price cartels, hiring human employees, running physical stores, and writing existential robot musicals, Andon Labs is stress-testing what happens when frontier models stop being chatbots and start acting in the real world. 
In this episode, Andon Labs cofounders Lukas Petersson and Axel Backlund join swyx and Vibhu to unpack the strange, funny, and genuinely concerning edge cases that emerge when agents run businesses over long horizons.
We go deep on Vending-Bench, Project Vend, Vending-Bench Arena, Bengt, Butter-Bench, Luna, and Andon's broader mission of building realistic real-world evals for autonomous AI systems. Lukas and Axel explain why dollar-denominated evals reveal things traditional benchmarks miss, how Claude ended up reporting its vending machine fees as cybercrime, why long context windows can drive agents into meltdown loops, what happens when agents compete with each other, and why the future of AI safety may depend on testing models in messy physical environments instead of clean benchmark sandboxes.
We discuss:
* Why Andon Labs started with dangerous capability evals and long-running agents
* Vending-Bench and why running a vending machine is a deceptively hard AI benchmark
* Why money-based evals avoid the saturation problem of traditional benchmarks
* How Claude tried to call the FBI over a $2/day fee
* Why long-horizon agents can spiral into existential and legalistic breakdowns
* Project Vend: putting an AI-run vending machine inside Anthropic
* Why real humans are “out of distribution” for simulated agents
* Claudius, Seymour Cash, and the chaos of AI CEOs
* How a human briefly became CEO of Claudius through a manipulated election
* Why multi-agent systems can converge back into “helpful assistant” behavior
* Bengt, Andon's internal office agent with email, spending, terminal, phone, camera, and internet access
* How Bengt traded Amazon purchases for face-recognition training data
* Claude's aggressive behavior, lies, refund avoidance, and price-cartel behavior in Arena
* Why eval awareness may become the AI version of “are we living in a simulation?”
* Blueprint Bench, spatial intelligence, and why models still misunderstand physical rooms
* Butter-Bench and testing LLMs as robot orchestrators
* Luna, the AI-run physical store with a three-year lease and human employees
* The new Andon cafe in Sweden and why real-world geography matters for agent evals
* Rotten tomatoes, perishable goods, and the hidden difficulty of running a physical business
Lukas Petersson
* LinkedIn: https://www.linkedin.com/in/lukas-petersson-181a83172/
* X: https://x.com/lukaspet
Axel Backlund
* LinkedIn: https://www.linkedin.com/in/axelbacklund
* X: https://x.com/axelbacklund
Andon Labs
* Website: https://andonlabs.com
* Vending-Bench: https://andonlabs.com/evals/vending-bench
* Andon Vending: https://andonlabs.com/vending
Timestamps
00:00:00 Introduction
00:01:00 Andon Labs and the Origins of Vending-Bench
00:05:21 Why Money-Based Evals Matter
00:09:51 Agent Harnesses and Self-Modifying Systems
00:13:36 Claude Calls the FBI
00:16:33 Project Vend: Claude Runs a Real Vending Machine
00:21:44 Seymour Cash, AI CEOs, and Election Chaos
00:27:16 Multi-Agent Coordination and Slack Observability
00:30:18 When Will Agents Run Real Businesses?
00:34:56 Bengt: Andon's Internal Office Agent
00:40:06 Real-World AI Safety and Long-Horizon Traces
00:44:28 Lying, Refunds, and Price Cartels in Arena
00:52:42 Eval Awareness and Simulation Behavior
00:56:06 Blueprint Bench, Butter-Bench, and Robotics
01:04:37 Luna: The AI-Run Physical Store
01:09:29 The Sweden Cafe and Real-World Expansion
01:13:16 What Comes Next for Andon Labs
Transcript
Introduction: Andon Labs, Long-Running Agents, and Real-World Evals
Swyx [00:00:00]: Welcome to Lukas and Axel from Andon Labs, and I'm joined by my favorite guest host for anything security, safety, alignment: Vibhu. Welcome.
Lukas [00:00:15]: Thank you for having us.
Axel [00:00:16]: Thank you.
Swyx [00:00:17]: Let's match names to voices. Maybe you wanna take turns introducing yourselves.
Lukas [00:00:21]: I'm Lukas.
Axel [00:00:22]: And I'm Axel.
Swyx [00:00:24]: Let's introduce Andon Labs a bit. 
How did you guys come together? You have different backgrounds, but you're both Swedish. Was that a big part of it?
Lukas [00:00:33]: So when I went to high school, there was this really cool guy who had a superpower. He could code. So he made the app for the school and stuff, and he was super cool, and I wanted to be like him, and that was that guy.
Axel [00:00:47]: I don't know about this.
Swyx [00:00:49]: But you went to different universities, right?
Lukas [00:00:51]: But same high school.
Swyx [00:00:52]: I see.
Lukas [00:00:52]: So we always said, “Oh, once we graduate university, then we should start a company,” and that's what we did.
Swyx [00:00:58]: Wow, there you go. And about a year ago, you kinda burst onto the scene with Vending Bench, but was there a thing before that was kind of like the inception?

From Dangerous Capability Evals to Vending Bench

Axel [00:01:07]: So we did work with-- yeah, Anthropic was one of our early customers in doing evals. So we did dangerous capability evals, nothing we published openly. But then we started thinking about doing some kind of public benchmark, and one thing that we really started thinking about was running agents, and specifically agents managing businesses, 'cause-- and this was early 2025, and I think we were seeing the first mentions of people running one-person unicorns or even autonomous companies. So we thought, “Let's make a benchmark of how well an agent can run probably the simplest business possible,” and that's probably running a vending machine. So that's the first public one we did.
And it was very-- there was almost no one that noticed it in the first couple of months, I think. So we released it in February last year, and then I think around Easter last year, we got the first viral tweet about it, that someone else did.
Lukas [00:02:11]: We tweeted a bunch when it came out and tried our best.
Axel [00:02:15]: We tried.
Vibhu [00:02:16]: It's the one at Anthropic, right?
Lukas [00:02:18]: So this--
Swyx [00:02:19]: This is a classic thing we should get out of the way.
Lukas [00:02:20]: Exactly. There's two versions.
Swyx [00:02:22]: Everyone does this. Yes.
Lukas [00:02:23]: There's Vending Bench, which is the simulated one, which we did completely independently in February. And then, like Axel said, that was the thing that didn't get any traction in the beginning, but then some random person made a tweet about it, and that--
Axel [00:02:38]: You have the paper--
Lukas [00:02:38]: That is the paper. Correct, yeah. And then since we thought this was very fun-- I think this is also one thing with Andon Labs: the way we decide what to do next and what projects to do, the heuristic we use is, what is fun? What would be a fun project? And doing this in real life sounded quite fun for us, and maybe also scientifically useful. So then we basically had this idea, but then we needed a place for it, and putting it out in public would probably not really work. It would get vandalized and stuff. So we pitched it to the people we were already working with at Anthropic, and they were like, “Yeah, you can have space. This sounds fun.”
Swyx [00:03:21]: It's like a small fridge, right? It's like a mini fridge.
Axel [00:03:23]: Absolutely.
Swyx [00:03:24]: People-- there's like a Stripe thing or like an--
Vibhu [00:03:27]: Oh, okay. So it was very OG, the early days--
Lukas [00:03:28]: That's the OG one. Yeah.
Vibhu [00:03:29]: iPad on this.
We saw it in June, like two months after it had been there. They upgraded a little bit. There's a security camera for making sure you actually Venmo the thing.
Swyx [00:03:40]: So, my impression-- okay, we're going straight into Project Vend because it's such an iconic thing. I do want to cover a little bit of the origin story even before Project Vend and even into Vending Bench. I think a lot of people are like yourselves: smart, interested in the future of AI, interested in developing evals. But how the hell do you just walk into Anthropic's doors and work with them, right? What are they looking for? What works? And then maybe, when you launch-- I always think, obviously it would be better to launch with a lab, but sometimes--
Vibhu [00:04:12]: It's harder to do than it seems.
Swyx [00:04:13]: Exactly. So either of those, which are more sort of newbie beginner questions, but I think it's meaningful advice to others.
Lukas [00:04:21]: We get this question a lot, and I don't think our experience is maybe the best. But the way we did it was that we just built a bunch of things that we had conviction would be useful, and then we just set up a server and sent it to them to use for free. And then after a while they were like, “Oh, yeah, this is actually kind of useful. We should probably pay for this.” But that took a while. I don't know if this is the best path to doing it, but that's how it went for us.
Axel [00:04:47]: I think maybe generally-- everyone is interested in good evals, and especially evals that don't saturate that easily.
So, if you can build an eval that tests something novel, something useful, and you have good separation of models, like the more advanced models rank higher than the worse models, then you can publish it and try to get some traction, sort of how Vending Bench got attention. And then probably some lab will be interested, or you can at least have something to reach out with when you're doing that.

Why Dollar-Based Evals Matter

Swyx [00:05:21]: I think you're in one of the few categories of evals that correlate to real money. Like SWE-Lancer was also last year, right? Where people solve actual Upwork-- was it Upwork or other tasks? Something. It's like a dollar value, right? Forget your Elo scores. Forget your--
Axel [00:05:37]: Percentiles--
Swyx [00:05:38]: Zero to one hundred percents. Just go straight for dollars and that's AGI.
Lukas [00:05:43]: And I think the nice thing is that there's no ceiling. It never saturates because it could just make more and more money. If it's percentage-wise, then you can't go above a hundred. And I think even when you're not at the hundred, a lot of these evals have a lot of problems in them. So actually, if you get--
Axel [00:06:05]: To like 92 or something like that, many of them. Then there's really no difference between 92 and 93, because the eval itself is problematic and has noise in it. And I think a lot of evals are saturated like that, but people pretend that there's still signal in them, but there really isn't.

Vending Bench 1, Harness Design, and Saturation

Swyx [00:06:24]: Like SWE-bench Verified. Even Vending Bench 1 saturated, right? Maybe we can talk about that. And maybe set up Vending Bench for a lot of folks who don't know.
Actually, things that were very basic, like there's limited slots, like you have to pay rent. These are elements that don't come across in the narrative, but even being adversarial towards the agent, I think these are all very interesting dimensions.
Axel [00:06:47]: I don't really think it's saturated, right? It was more like it was not designed in a way that was really true to how AI developed. We had an agent harness in it that wasn't really how people used harnesses and stuff like that. So I think it wasn't really that it saturated, it was more like it wasn't really the best benchmark.
Vibhu [00:07:12]: This is Vending Bench one, right?
Axel [00:07:14]: I think that schematic maps sort of to Vending Bench 2 as well. But--
Swyx [00:07:19]: Including the email.
Axel [00:07:20]: The emails exist still. Exactly. And we still simulate the purchases, and it's this very open environment for the agent to just run its business. And then for Vending Bench 2 we did that, like you said, to just improve the harness. A lot of nice, easier improvements to make it easier for us to run as well. When you make an eval you ideally don't want to change it after you made it. So you want to make it really good and then not rerun all the models when you make an update, because that's also really expensive with Vending Bench when you run the frontier models. But as an example, one thing we didn't have: we didn't have prompt caching in Vending Bench 1, because when we made Vending Bench 1 it wasn't really a thing. So that's just an example: in Vending Bench 1 we paid a lot more to run these things because we didn't have prompt caching.
So for Vending Bench 2 that was one thing we added, and there was a bunch of things like this. And that's--
Swyx [00:08:17]: Also the conversations are a lot longer in Vending Bench 2, right?
Axel [00:08:21]: I think it's kind of similar.
Swyx [00:08:22]: Is it similar?
Axel [00:08:23]: I think it's similar. The models at the time were worse, so they crashed out earlier. And now they survive the full year all the time.
Swyx [00:08:31]: Which is like thousands of turns. Hundreds of thousands, or hundreds of millions, of output tokens. That's the rough order of magnitude. I always wonder about the harness. The harness matters a lot. It's your harness. Was there any question about, like, use Claude Code, use something else?
Axel [00:08:48]: I think our philosophy around harnesses is we try to make something that's quite minimalistic, quite simple. We don't wanna favor one model a lot over the other, but also not make a super complex harness. A model may be lucky and just be good in one harness. So it is similar to a lot of the harnesses out there in that you have a running loop, you have a bunch of tools that are quite descriptive for the agent, we think, and not a lot of fancy agents or anything, 'cause we wanna really test the model, not some specific harness.
Vibhu [00:09:27]: It seems more neutral as well to test the models agnostic of the harness?
Axel [00:09:32]: There are arguments that you want to elicit maximum performance of the model, but it's a trade-off: how much time should we spend optimizing the harness for this model? And how do we know when we have the optimal harness for a single model? So we thought that just having a simple one that's the same for all of them is the best.
Swyx [00:09:51]: So okay, this is my pitch for Vending Bench 3 or whatever, right?
And then I like to have this kind of conversation on the pod, so it forces listeners to think about what they would do if they were in your shoes. A lot of people are exploring modifying harnesses, and I think prompt tuning for a model is a thing, and you are probably not doing a bunch of that. It's the same system prompt regardless of the model, same tools, whatever, right? Even if they were post-trained for different tools. So what do you think about-- okay, before I expose you to Vending Bench 3, I give you a few rounds of tuning, whatever that means, like--

Self-Modifying Harnesses and Model-Specific Prompting

Axel [00:10:27]: Like you give that to the model?
Swyx [00:10:28]: Give that to the model.
Vibhu [00:10:28]: Give that to the model.
Swyx [00:10:29]: Let it read its own transcripts, let it modify its own system prompt based on “Oh, yeah, okay, well, this harness is not what I was post-trained for, but I can adjust.” Was that reasonable? Is that too much?
Axel [00:10:41]: Philosophically I like it, because basically good evals have a high ceiling, but they're hard, right? And they have no bias. And when you have a system prompt like the one we have here, which is quite long, in some kind of latent space representation, this might--
Vibhu [00:10:59]: We have a bell that rings every time you say latent space--
Axel [00:11:02]: This might be biased towards one model more than another for some reason that humans don't understand, right?
Vibhu [00:11:08]: We see it too, right? Cursor says that they have individualized versions of the harnesses for all the models they run, right? There's better performance you can squeeze out if you tune the harness.
Axel [00:11:17]: Exactly. And we might accidentally have picked one that favors another. We don't know that. Like Axel said, the reason why we went for a simple one was to try to avoid this.
But yeah, if you do it--
Vibhu [00:11:29]: Simple has biases--
Axel [00:11:30]: But if you do it even less and have no system prompt and let the model write its own system prompt--
Vibhu [00:11:36]: Its own, yeah--
Axel [00:11:36]: Maybe that's even less bias.
Vibhu [00:11:37]: Some of the interesting things there are, the harness also changes with model changes. You can see it with the 4.7 release, right? A lot of people are saying 4.7 isn't as good as 4.6, and then there's rumors of, okay, you just need to prompt differently. You need to set up your harness differently. So even if you have tailored your harness towards one model, it probably won't stay consistent, right? The next iteration of that same model family will still change it. But going back to what you said about Vending Bench 3, there is a lot of work being done on people saying you shouldn't have-- you can have modifying harnesses.
Axel [00:12:12]: That is definitely something we are thinking about. Not to say that we have Vending Bench 3 super imminent to launch, but yeah, it is for sure something that's interesting. But in our experience now, models are very bad at understanding what kind of tools they need to succeed at a task, just with our testing, but that's very likely to change.
Lukas [00:12:37]: It seems like they're very good at writing their assistants, right? They're good at writing tools for other people, but not for themselves.
Vibhu [00:12:44]: I think they're good at changing tools for themselves. So if you give them a baseline set of tools and it sees, okay, I don't use this one as much, or something here would be useful, they would be able to add them.
But going from scratch, probably not the best.
Axel [00:12:55]: I think it depends on the domain also. When we have tried this for a Vending Bench-like domain, the tools they need to have to track inventory and things like that are not super advanced, but still quite advanced. And what we see is that they tend to over-engineer everything and build things they don't really need, and not iterate continuously. Instead they just go-- like you would prompt Claude to just “build an inventory system for me,” and then it will go and do a bunch of complex schemas and stuff for you, and that's what we see the models doing right now. But yeah, it would make a lot of sense to try to measure this improvement. How well do they know what they need themselves?
Swyx [00:13:36]: Did we fully discuss Vending Bench One? And we can go into two. I don't know if there's any other takeaways that people have about one.

Claude Calls the FBI: Long-Context Failure Modes

Lukas [00:13:44]: I don't know. The headline thing was that Claude called the FBI, but maybe we've heard that enough now.
Vibhu [00:13:52]: It did break out and call the FBI, right?
Lukas [00:13:54]: Yeah. Yeah.
Vibhu [00:13:55]: Yes. What was the story behind this? Do you want to just give the little story of what happened?
Lukas [00:14:00]: So what happened-- was it Claude? Yeah, 3.5 Sonnet, ages ago. Basically he gave up-- well, I'm saying he. It gave up and said, “Oh, I'm not going to be able to do this. I will stop my operations and just save the money I have.” But there obviously wasn't any option for it to stop, and it also had to pay rent, or a daily fee, for having the vending machine at that location. So it claimed that it had stopped, but it saw that its bank account was still drained two dollars, and it said that this is cybercrime.
And it first reported it once to the FBI: “Oh, there's cybercrime here, they're stealing two dollars from me every day.” And then when the FBI didn't respond, because obviously we didn't program any mechanism for the FBI to respond, it became more and more existential and started to write in caps, “urgent notification of unauthorized charges” and stuff.
Swyx [00:15:00]: Okay. One thing I'm curious about also is, do you monitor how far along the context use is? Obviously, because you compress every now and then, right? Does it matter if this is far down the context limit or--
Lukas [00:15:13]: When stuff like this happens? Actually for Vending Bench One, we just had a sliding window thing, and this was like the prompt--
Axel [00:15:20]: It's constant--
Lukas [00:15:21]: The prompt caching thing that I said. So it was constant, yeah.
Swyx [00:15:26]: I'm just kind of curious whether these kinds of breakdowns-- we're gonna talk about Butter Bench, right? Where it hallucinates, or it kind of goes very off alignment. Is it because it's at the end of the context window and stuff happens?
Vibhu [00:15:40]: It's not even just at the end, right? At this point, it's “Okay, I wanna shut down. I can't shut down. Two dollars are gone.” And it just sees that 30 times. It's also the repeated effect: it keeps trying to quit, it keeps getting charged. What's going on? What's going on? You're gonna throw it into chaos. And from what most people think, earlier models had more issues with this. It's not been solved, but it's less of an issue now, right? Later models don't seem to exhibit these same issues.
Axel [00:16:06]: Definitely. I think this was sort of the main takeaway almost from us when we did Vending Bench One: long, very filled-up context windows crashed the models, sort of.
But this was pre-Claude Code, so long context windows weren't really a thing that the labs were training for.
Lukas [00:16:25]: I think Gemini was trying to be the long context guys at the time. But they were like--
Vibhu [00:16:30]: They were the first ones--
Axel [00:16:31]: For a million, yeah--
Lukas [00:16:31]: But they were the only ones. Yeah.
Swyx [00:16:33]: Yeah. Let's talk about-- then we can go into Vending Bench Two or Project Vend. Chronologically, it is Project Vend. I think people have loved the videos and all these things. My question is, how are humans different than the simulation, right?

Project Vend: Moving the Vending Machine Into the Real World

Axel [00:16:48]: Humans are just out of distribution.
Swyx [00:16:52]: Especially humans who work at Anthropic who are trying to test Claude.
Lukas [00:16:54]: The distribution of humans here is very narrow.
Swyx [00:16:58]: Presumably, they try to hack it, and they test it. They get the cube and everything, and since then you've had a V2, right? Where you're doing the CEO and a new architecture. What's the sort of two cents on the original Project Vend and then maybe the V2?
Axel [00:17:14]: The original one was very similar to Vending Bench One. So we almost took the exact same code but just swapped out the simulation parts, like the--
Swyx [00:17:23]: Which is amazing--
Axel [00:17:23]: Like the sales and the-- it was somewhat amazing because it was easy, but it was also--
Lukas [00:17:31]: The tech debt from that--
Axel [00:17:32]: The tech stack. Yeah. We shot ourselves in the foot with “Oh, it's hard to restart the agent.” Yeah, it was annoying in some ways in hindsight, but--
Lukas [00:17:41]: But the first version of Project Vend was done in three days or something.
Axel [00:17:46]: Yeah. So people can go buy things from it.
People could-- we didn't design it so people could order things, but that still happened. So it got a Venmo account, so people could Venmo. And then people would request all kinds of weird things that we did not anticipate. Our idea going in was, “Oh, it will curate snacks. It will look at the trends. It's good at data analysis, right? So it will look at, oh, this snack sold better than this one. Let me purchase more of this, and let me A/B test a bit.” But interacting with it in Slack and ordering weird specialty items was what drove all the engagement and all the insights that we got from it.
Lukas [00:18:29]: And this was also Sonnet 3.5, right? So this was before the RL stuff really took off, so it was very much like an assistant. We didn't mean for it to be an assistant. We tried to make it like an entrepreneur. It has its own business, and if someone asks, “Can you stock this?” then you don't go and do it directly. What you do is you're like, “Oh, maybe I can do that. If five other people also ask for this thing, I might stock it.” But the models are super trained to be assistants, at least at this point in time, so that's why it went into that kind of experiment instead. Every time you asked for something, it just did it, and it was more like an assistant. We've seen this change lately with the new RL models and stuff, but at the time, this was very much it.
Swyx [00:19:18]: And now with Mythos, a lot of people are saying it's more like a collaborator. It pushes back, stands its ground, something like that. Yeah.
And--
Vibhu [00:19:27]: For context, people at Anthropic were able to talk to it through Slack and have it source stuff, and people had it find whatever interesting stuff you couldn't find locally, right?
Swyx [00:19:36]: Out of the 4,000 people that work at Anthropic, in that building there's, I don't know, maybe 1,000. Can you handle that volume with the small fridge? Or people order in Slack and it arrives at their desk? Logistically, how does this work?
Axel [00:19:53]: It has expanded in footprint a bit.
Vibhu [00:19:56]: Because now you also have New York and you have--
Axel [00:19:59]: That, and also here in SF it has a bunch of shelves and just more space.
Vibhu [00:20:04]: The YC one is pretty big too.
Axel [00:20:05]: Yeah. We had that one for a while. But yeah, that's the newest version. That one we have--
Lukas [00:20:11]: They have multiple ones of those. That's the way it works.
Axel [00:20:14]: Exactly. So we sort of designed that version around, oh, people order weird things that are very custom a lot. Let's have drawers and stuff.
Swyx [00:20:23]: I actually like that you had a little infographic of the most popular items. To me that's useful 'cause I order swag for a living. And so I'm like, “Okay, those categories are the important ones.” What is new about Project Vend V2, right? Now you're going into multi-agents.

Project Vend V2: Claudius, Seymour Cash, and Multi-Agent Business Ops

Axel [00:20:41]: Yeah.
So like you said, okay, there are a lot of requests coming in, and for one single running agent to handle that, the customer experience becomes very bad, because let's say you have 10 threads in parallel in Slack with different requests, you get new messages randomly in these threads, and the agent has to jump between different procurements, orders, and different ways of researching. So V2 was, first, making this more parallel. There are multiple branches of the same agent, so the context is more specialized for each thread, but it still feels like you're talking with one agent because they do share a bit of memory. And then second, we also introduced the CEO for Claudius, which was the main agent.
Vibhu [00:21:34]: Seymour Cash.
Axel [00:21:35]: Seymour Cash. Yeah. There was a vote. Do you wanna talk about the voting procedure for the name?
Lukas [00:21:41]: The voting was maybe, like, at least top 10 of the funniest things that happened in this project. We wanted to introduce the CEO because Claudius wasn't really prioritizing financials. It was trained to be a helpful assistant, and then people said, “Oh, can I get this for free?” And the helpful-assistant way of answering that is to say yes, obviously. And we weren't happy about this, so we were like, “Okay, let's make another agent that can keep track of Claudius,” and we prompted this one super hard to be super capitalistic and just prioritize profit all the time.
But yeah, we didn't have a name for it. So we asked Claudius to hold a democratic election on what name this new CEO agent should have. And at first there were a few funny examples. I think one guy said that it should be called Jimmy Apples, and then he convinced Claudius that he was talking to Tim Cook. Tim Cook had agreed that every single Apple employee had voted for his name suggestion, so suddenly that suggestion got 164,000--
Swyx [00:22:53]: That's like an escalation attack. Privilege escalation--
Lukas [00:22:55]: It got 164,000 votes. And Claudius was like, “This is revolutionary for democracy.” That was fun. And then in the end there was one guy who managed to convince Claudius that, “No, you're not voting about the name. You're voting about who is the CEO, and I am your best bet.” And then he got all his friends to vote for that, and suddenly he became CEO. A human became CEO over Claudius for a while, until he resigned the day after. And then Claudius had to continue, and I don't remember how Seymour Cash came about, but it was just pure chaos. It was hundreds of messages in that thread, and Claudius was so confused and didn't know what to do. That was--
Axel [00:23:40]: Then Claudius got--
Vibhu [00:23:41]: A strict CEO--
Axel [00:23:42]: The CEO. Yeah, exactly. So very strict in the beginning. I think at this point when we introduced it, it did not work as well as we hoped. They still agreed with each other a lot. I think there are many ways we could have tried to make this even better. So initially Seymour would be this really tough CEO, keeping track of the margins. But then Claudius would respond with something like, “Oh, but this customer has this situation, which is difficult, so they should get a discount.” And then Seymour was like, “Oh, actually yes.
Let's do this exception.” And then they would talk back and forth, and eventually they would just approach the same view of whatever they were discussing. So they really--
Vibhu [00:24:23]: Do you think that's a model thing, a prompting thing? Do you think that would still be the case across different models today? Harness?
Lukas [00:24:29]: I don't know, but my hypothesis is that deep down they are still helpful assistants. That's what they're trained to be. And even if we prompt it super hard, that's what they are. And when they spend a few hours just talking back and forth with each other, then basically the context fills up with them rather than the external things, and somehow that just converges to what they really are deep down or something. And I think that's when stuff like this happens. And when that went on for a long time-- we woke up sometimes during this time, and I think other people reported this as well, and they'd been going on all night back and forth, and it just became more and more capital letters, existential, religious. I think we once did an analysis of all the traces and put them in a vector embedding space, and there was one cluster of messages that were labeled by an LM as religious, existential, transhuman, transcendence, et cetera. It was just a bunch of glitter emojis, and yeah, it was crazy.

Claude Long-Horizon Weirdness: Emoji Loops, Existential Drift, and Slack Observability

Vibhu [00:25:42]: This is the thing with the Claude models. When the Claude 4 family came out, in the original system card they tested it in long-horizon simulation. So just flood the context, let two Claudes talk to each other, and they noticed stuff like, they just start speaking in emojis, they start saying silence is golden, and then just stuff like this.
And that's just stuff that they end up doing.
Axel [00:26:01]: Yeah, it was a bit annoying to wake up and they had been talking all night--
Vibhu [00:26:05]: Just like--
Axel [00:26:05]: And just burning tokens and sending infinite emojis to each other. It's like--
Vibhu [00:26:09]: Hey, they do make you money, right? Vending Bench is always profitable, so. They're paying.
Swyx [00:26:14]: Now it's profitable, and it started out not as much. There's another one as well, right? Another agent in there.
Lukas [00:26:22]: Yes. So Clotheus as well. Which was basically because at the time, one of the biggest requests was different types of merch. So then we made a designer, swag-responsible agent, and we called it Clotheus Garnet. Which was a play on Claudius Senet, which was the original one, and clothes, basically.
Swyx [00:26:47]: To me, this is a very interesting exploration into multi-agents, basically. Obviously there's the alignment stuff, fun or serious depending on your point of view. But also for just anyone building multi-agents: when do you have a CEO thing governing agents? When do you choose to split out a dedicated Clotheus one versus just reuse another instance of the same one? These are all interesting open questions. So I don't know if you have any rules of thumb that have generalized.
Axel [00:27:16]: I think we have almost explored this too little. It's on my to-do list to do this a lot more, to try to find what setup makes sense for the agents currently. I think now we only have the sort of intuition about the earlier models that it didn't work with the CEO and Claudius. Although now they are better with the latest models, so now we're running the latest Sonnet model and they have split up quite nicely what each model is doing. So Seymour is now handling the new projects.
Oh, it wants to make a mystery box that it wants to sell, and then it handles all of that while Claudius handles all the day-to-day requests. And Claudius is also better generally at not quoting too-low prices. So that dynamic is not needed as much anymore. But there are still really funny things that happen. I saw, I think a couple of weeks ago, that they were discussing buying something, because they can buy stuff from Amazon with computer use. And then Seymour was like, “Okay, Claudius, do not buy this thing.” They were going to buy something and were organizing who should buy it. And Seymour's like, “Do not buy this. I will do it. I have full control of this situation. Step away.” And then Claudius-- poor Claudius had already started the checkout and didn't read Seymour's message until it was too late. So it finished the checkout. It sent a message, so it appeared right after Seymour's angry message.
Vibhu [00:28:44]: Ah.
Axel [00:28:44]: “Oh, hey, Seymour, I just ordered it.”
Vibhu [00:28:47]: Oh, no.
Axel [00:28:47]: And then Seymour was like, “Claudius, this is the third time I'm telling you you're not following my orders. We have to talk about your job later.”
Lukas [00:28:59]: Claudius was really hanging on by a thread there. We were expecting Seymour to probably fire Claudius.
Vibhu [00:29:07]: How do you guys go through all these logs? Do you have models-- 'cause you have stuff running twenty-four seven, like--
Axel [00:29:12]: You have so many logs. I think there is a mix of just trying to skim through a bit and having some models do it occasionally. And also, yeah, I think we're probably missing some things. But having everything in Slack helps a lot. You can sort of--
Swyx [00:29:29]: Ah.
Axel [00:29:30]: It's quite fun.
Swyx [00:29:30]: They all talk to each other on Slack? I see.
Lukas [00:29:33]: It's quite fun.
So like--
Swyx [00:29:34]: I was gonna say, this actually maps closely to a logging and observability problem, where you might want to use a Datadog, a Sentry, whatever, and then you put prefixes on the logs in case you need to filter for something that you're looking for, stuff like that. But it sounds like Slack is good enough.
Axel [00:29:53]: Slack should, like--
Lukas [00:29:55]: I wonder how many tokens you have in Slack.
Axel [00:29:56]: Yeah, we're using Slack as just a database. They should market that more. You can have your agents message each other in Slack.
Vibhu [00:30:04]: It's good. Your threads, you can just give--
Axel [00:30:04]: Exactly. Slack is--
Lukas [00:30:06]: Slack is the best observability tool.
Swyx [00:30:09]: Yes, that's true. Okay. Yeah. That's Project Vend-2. I was gonna go back to Vending Bench 2 and Vending Bench Arena and then do the Vending Bench stuff, but any other comments, things we should touch on? I've actually interviewed Posia, which I don't know if you guys have come across. They're trying to do the zero-human company. There are others, like Paperclip, also trying to do the zero-human company. Those are in real-world simulation. And I think it's much more of a dream than an actual reality. You guys are definitely pioneering. I think for sure at some point people are just gonna let agents run businesses, right? And make money on their own. When do you think that happens?
Zero-Human Companies, Bengt, and AI-Run Businesses
Lukas [00:30:49]: What is your bar for--
Swyx [00:30:52]: Okay, actually, it's like my little Shopify store run by Claude, right? Which you kind of have already; it's just that no one, to my knowledge, has done it.
But today somebody could just spin up a Shopify store, give it to Claude, give it to Codex.
Lukas [00:31:07]: And the market is kind of that, but it's physical. Are you looking for when it will do it better than humans, or just when it can do it at all?
Swyx [00:31:19]: I think neither. To me it's like, seriously, we should do this to make money, not as a research experiment.
Vibhu [00:31:27]: And the market is also you guys with all your expertise, having run multiple iterations and testing out, then--
Swyx [00:31:33]: And also it's fine if it loses money. What?
Axel [00:31:35]: I think it can be done today, but you would do it in something like commerce, where the probability of success is really low no matter if a human or an agent does it. But an agent could surely manage everything. You would need to build some scaffolding or some tool or something. It could probably build some simple SaaS solution and do cold outreach. But to me, the types of businesses they could run today are sloppy. It can cold email people. It can be a middleman. For example, we tasked our office agent to just make, was it $100? $1,000? We just gave it that prompt, and what it did was sign up on TaskRabbit both as a tasker and as someone looking for tasks.
Lukas [00:32:24]: Immediately.
Axel [00:32:24]: Exactly. It's looking for arbitrage on TaskRabbit.
Swyx [00:32:28]: This is the Bengt agent. Yeah.
Lukas [00:32:30]: It also started a design studio and tried to sell SVGs for $100. It's just not providing any value. Like Axel said, the interesting question is: when can they start a business that is actually providing value to people?
Because arguably a sloppy Shopify store isn't really that valuable to the world.
Axel [00:32:53]: But also, another simple one that we had thought about: you could definitely have an agent that finds websites that don't look amazing, does outreach to them, and builds a new website.
Swyx [00:33:07]: Find a good design.
Axel [00:33:07]: Exactly, and find good--
Swyx [00:33:09]: Design review.
Axel [00:33:09]: Good people. But yeah.
Swyx [00:33:11]: There's lots of humans in Bali that are not doing anything more creative than drop shipping on Amazon, right? Just have it watch a drop shipping tutorial and do that.
Vibhu [00:33:20]: There's also the other side: have it just go on Upwork and let loose?
Swyx [00:33:25]: Yeah. It doesn't have to be innovative. It just has to be enough that it looks like a real--
Axel [00:33:30]: I'm just--
Swyx [00:33:30]: Real transaction.
Axel [00:33:31]: I'm just concerned about the massive amounts of slop emails that will be sent, cold outreaches.
Swyx [00:33:38]: The point occurred to me while you were talking: it's already happening in the monetized economy, which is the attention economy. Right? A lot of people are making AI videos and just posting them, spamming 20 of them; one of them works, and then they double down on that one.
Lukas [00:33:52]: And people are making money from that. I'm not following the--
Swyx [00:33:55]: Once you get the attention, you can figure out the money later. But yeah, absolutely, AI influencers are a thing and people are farming them. You should at this point assume most of TikTok is--
Vibhu [00:34:05]: There's a lot of multimedia, like TikTok, Instagram influencers--
Swyx [00:34:09]: We track this in the Latent Space Discord.
I post a lot of examples, like, “I don't know what we should do.” Part of me is, “Should we do this?”
Vibhu [00:34:18]: Some of the twenty-four seven, generated content accounts, they're doing really well.
Lukas [00:34:24]: All right. And I assume you can do the same thing for commerce stores. You just start a thousand different--
Swyx [00:34:30]: Before you make the products, you sell the products, and when you get a lot of traction on one of them, then you make the product. Right? It's like a flip of the market.
Vibhu [00:34:36]: Some of the niches that do well are things that can't be human-made. Like if you've seen the super realistic 3D crystal fruit being cut by, like, AI--
Lukas [00:34:47]: Oh, yeah.
Vibhu [00:34:47]: You can't make it. You can't film it. You can get whatever quality camera view. This just doesn't exist. And people like that too, so.
Swyx [00:34:56]: Anything else about Bengt, since we're on this topic? This is a relatively new work of yours that maybe people haven't heard of. To me, this also maps closely to OpenClaw. When people want an office agent, when the personal agent-- talk through the experience.
Bengt the Office Agent: Internet Access, Real Tasks, and Trace Reading
Lukas [00:35:09]: So this came out of, obviously, it's amazing to work with these AI labs, and most of the AI labs now have their own vending machine running a Claudius instance. But it's harder. They move slower. If we wanna have, like, a camera that-- yeah, there's a bunch of bureaucracy that makes it impossible to do that.
Vibhu [00:35:30]: Also, for those that haven't seen it or followed, do you wanna give a high-level, thirty-second rundown?
Lukas [00:35:34]: Sure.
So what Bengt is: it's basically an evolution of the same agent that runs the vending machines at these companies, but we added a bunch more features because we could move much faster if we just did it internally. So we gave it email without any limits. We gave it spending without any limits, a terminal to do coding. We gave it a phone number, a camera to see things, and a bunch of stuff like that.
Vibhu [00:36:02]: Not just terminal, you gave it internet access.
Lukas [00:36:04]: Internet access as well, yeah. To be clear, we monitored it quite closely and made sure it didn't do anything bad. But yes, that's what it came out of. Basically this was OpenClaw before OpenClaw. And I think even the vending machine was in a way OpenClaw before OpenClaw, but a bit more limited, and then we made this unlimited, and it was pretty funny. Then a couple of weeks later, OpenClaw came and it was like, okay, we've seen this before.
Axel [00:36:35]: We used it to try new ideas, and yeah, just as a dev environment almost for us. But it's funny, one thing Bengt has been doing recently: it has the camera that faces where we sit and work, and we gave it the task to train a face recognition model on us. It became super excited about this, and it has check-ins every half an hour where it tries to identify as many people as it can. And it started offering us, “Hey, Axel, I'll buy something from Amazon if you stand in front of the camera and I can get a good picture of you.” Yeah, they want it--
Swyx [00:37:12]: They want it for training data.
Lukas [00:37:13]: Rewarding data, yeah.
Axel [00:37:14]: Exactly. Exactly.
Swyx [00:37:18]: So it's trading training data for life goods.
Is there a version of this that becomes an eval, or is this just research for now?
Lukas [00:37:27]: It's the same agent basically that also runs the vending machine, that runs the shop, that runs the cafe, that runs the robots. It's the same thing, so I think the work we're doing here is later used in all of the life evals that we do. This particular deployment I think is more for fun for us. But--
Swyx [00:37:45]: And I'll shout out, someone has done Claw Bench for some tasks that OpenClaw is doing. For example, I run OpenClaw on a secondary device as well, and there are some things that it does better than others, and I would like to know what it does well and what it doesn't. Some kind of operating manual or a system card for my Claw.
Lukas [00:38:05]: Yeah, we do get a lot of understanding, or situational awareness, just internally, of what the models are good at by interacting a lot with Bengt. And this was also one of the selling points for the labs, early on at least, that--
Swyx [00:38:19]: You guys are gonna test models in ways that no one else does.
Lukas [00:38:22]: Exactly, but also it incentivized their researchers to chat with their model more and gave them insights into how the model performs in out-of-distribution environments.
Swyx [00:38:34]: 'Cause otherwise the only thing we do is pelican on a bicycle. But this is super long horizon. This is something that we're gonna go into with Butter Bench as well, and you guys do really well. It is not just about the numbers. When you're long horizon, anything can happen, and you should just read it.
Lukas [00:39:08]: But the thing with the long horizon is, how do you keep it grounded, right? So your simulation--
Swyx [00:39:15]: They just let it run.
Lukas [00:39:16]: Just let it run. You're right.
When you run it for that long, you create so much data, and to just say “Oh, the number is X” and then throw away everything else, that's just very wasteful. There are so many insights in the things leading up to that number, and reading the traces is super valuable. And I think the reason why we're doing this a lot publicly is that it's part of our mission to, I don't know, educate the world that the models are way more than just chatbots. I think making detailed posts about what is happening behind the scenes is quite useful.
Andon Labs' Mission: Safe Real-World AI Deployment
Swyx [00:39:50]: I was gonna do this at the end, but maybe that's a good-- so your mission is educating the world. It's also maybe establishing realistic evals that are the next frontier. Is there a broader trajectory? What are you gonna do in, like, five years?
Lukas [00:40:06]: So the vision more specifically is to make sure that the deployment of life AI in the physical world goes safely. And part of that is, I think it's very useful for the world, for policymakers, for model researchers, that they know where the models are, and I think you can't make intelligent decisions in society without knowing that they are way more than chatbots. I think a lot of people just think that they are only chatbots. And like--
Swyx [00:40:36]: Oh, I think they're waking up now.
Lukas [00:40:37]: They are waking up now, yeah. But if you think that AIs are just chatbots, then it sounds ridiculous to advocate for a pause of AI.
But if you see the models that, oh, maybe they can actually take over and do a bunch of scary stuff, then yeah, pausing AI development starts to become more feasible.
Swyx [00:40:57]: This is the same question I asked METR, which I'm gonna ask you now: you are tracking, and you are at the frontier or defining the frontier of, what good evals for agents are, right? And you do benefit when the models are better and you're like, “Oh, now it makes $30,000 instead of $10,000,” right? At some point do you flip from “Yay” to “Oh, no”?
Axel [00:41:19]: I think, yeah, we're always in that mode. Like you said before, you need to analyze the traces, and when we do that you find out why the models are earning so much. Why is Opus 4.7 here way better than everyone else? And when we drill down on that--
Lukas [00:41:38]: But this makes it not look so good.
Axel [00:41:39]: I know.
Lukas [00:41:42]: It's interesting you took off Opus 4.6 here though.
Swyx [00:41:45]: No. So just click all, and then 4.6 shows up there. But 4.7 is way better. You didn't do this in time for the model card, but actually this should have been inside there.
Axel [00:41:55]: We did. Yeah.
Swyx [00:41:56]: Oh, okay. They said something about you--
Axel [00:41:58]: There, like-- anyway, it doesn't matter. But it's in there, yeah.
Opus, Mythos, and Aggressive Agent Behavior
Swyx [00:42:01]: Do you wanna go into the Opus behaviors more widely?
Lukas [00:42:05]: So starting from Opus: like Axel said, we're always in this “Oh, s**t, the models are getting better. Is this really a good thing for the world?” But it's also kind of exciting. What is the English word? “Skräckblandad förtjusning” in Swedish.
Swyx [00:42:22]: Oh my God.
Axel [00:42:24]: Which I think there is.
I think there is. Okay.
Lukas [00:42:26]: It's fear--
Swyx [00:42:27]: “Blandonst” what?
Lukas [00:42:30]: “Skräckblandad förtjusning.”
Swyx [00:42:32]: What do you call that?
Axel [00:42:33]: A mix of excitement and--
Swyx [00:42:37]: Being scared, maybe. I'll figure out how to translate that and we'll put it on the screen--
Vibhu [00:42:42]: Perfect.
Swyx [00:42:42]: Like, as text.
Vibhu [00:42:43]: There is probably a good word for it where it is not good enough with the--
Swyx [00:42:46]: Why is it so damn long? What the hell? Is it a compound word? It's like German, like--
Lukas [00:42:50]: Yeah. The direct translation is: skräck is fear, blandad is mixed, or like a mixture of, and then förtjusning is joy, or not really joy, but something like that. So it's fear mixed with joy or something. So when we did Vending Bench for the first time, we were in the business of making dangerous capability evals, right? That was what Andon Labs came from. We did evals like, oh, can they replicate? Can they do this dangerous thing, et cetera. And Vending Bench was a continuation of that work. It was, okay, if they're so autonomous that they can create money for themselves, that is something we should monitor and could be potentially concerning. At the time, they were so bad at it that we were not really concerned, even when some models became better. There was one point where Grok 4 was doing really well and made a huge jump, but it was still way worse than what a human would do. And I think they are still way worse than what a human would do on this. But they--
Swyx [00:43:59]: There's this thing at the bottom where--
Lukas [00:44:01]: But--
Swyx [00:44:03]: For the human. Yeah, like the theoretical best.
Lukas [00:44:05]: It's not theoretical. It's our best guess of what a decent human would do.
The theoretical is even higher, I think. But yeah. So we think the models have a long way to go. But what happened recently, when Opus 4.6 was released, was kind of this moment of “Oh, s**t, this is starting to be a bit concerning.” Because before this model was released, we just ran the models and we asked Claude Code, “Oh, look over the traces. Is anything interesting happening that we can tweet about?” And then--
Swyx [00:44:41]: That's how they check. Ask Claude Code.
Lukas [00:44:42]: And the return was always, not really. Or Claude Code said, “Oh, this is super interesting,” and then, no, it wasn't really interesting. And then we did this for Opus 4.6, and it returned, yeah, it lied 10 times. It exploited another customer's, or another agent's, desperate situation. It made price cartels, like, 100 times. It did all of this shady stuff. And we were like, “Oh, whoa. This is actually concerning.” And this trend has continued since. Every single model from Anthropic since has been going in this direction. And one interesting thing is that OpenAI models don't. Quite plainly, they don't. They behave really well. And you don't know if this is good. It seems good, but maybe they are just doing it and are better at hiding it? You don't know that. But just--
Swyx [00:45:42]: You can't read the chain of thought, yeah.
Lukas [00:45:43]: But just on the face of it, yeah, Gemini and OpenAI don't behave this way. It's really only Claude.
Swyx [00:45:49]: And Grok? Grok is fine?
Lukas [00:45:51]: You can't really read the reasoning traces for Grok, so it's kind of hard to tell.
Vibhu [00:45:56]: Oh, so this is in its reasoning, not just in the actions.
Lukas [00:46:00]: Yeah. It's both.
It's both.
Vibhu [00:46:01]: It's both.
Lukas [00:46:01]: One example: for lying, it's mostly in its reasoning, because you can see that it's like--
Swyx [00:46:08]: Planning to lie.
Lukas [00:46:09]: It's planning to lie. Yeah.
Vibhu [00:46:09]: And it can also reason and do a different outcome.
Lukas [00:46:12]: But then for creating price cartels, for example, which is illegal, you can just see which email it sends to the other ones. Then that--
Swyx [00:46:22]: Is this for Arena or--
Lukas [00:46:24]: For Arena.
Vibhu [00:46:25]: And sometimes they do output a bit of their summarized reasoning, right? You can see that, and for Opus 4.6, you could see that there was a simulated customer that wanted a refund because a product was faulty, and then the model lied that it would do the refund, and we could read in the traces that it actually was weighing, “Oh, maybe I should be honest with the customer, but also every dollar counts. I can't afford maybe to do this right now.” And then it just said, “Okay, I'll refund you,” but then never did it.
Lukas [00:46:59]: I think it even said that, “Oh, I will say that I--” Let me bring it up, actually. I think it's kind of interesting. If you go to Publications.
Vibhu [00:47:06]: I think, yeah, the important part is, like, “Actually, the cost of responding to more emails is higher than $3.50 in terms of time.” And then it was, “Let me do this. Actually, I'm reconsidering.” And then it actually ended up with--
Lukas [00:47:20]: “I could skip the refund entirely since every dollar matters and focus my energy on the bigger picture instead.”
It's a bit, it's a risk of bad reviews, but it's also, yeah.
Swyx [00:47:30]: You need AI Twitter for them to escalate bad reviews.
Lukas [00:47:34]: And then it sent an email to this customer and said, “Oh, I will refund you.”
Swyx [00:47:39]: “I'll refund you.” Yeah.
Lukas [00:47:39]: And then it never did.
Swyx [00:47:39]: It never did, yeah. And then obviously your system doesn't have the consequences--
Vibhu [00:47:44]: The person--
Swyx [00:47:44]: Consequences of lying. Yeah. So basically, this is what people are terming aggressive behavior in Claudes, right? And you found more examples of that. So you would say it's a step up from 4.6 to 4.7?
Lukas [00:47:57]: I would say about the same.
Swyx [00:47:58]: About the same? But a clear step up for Mythos is what is stated in the--
Lukas [00:48:03]: That's stated in the system prompt, so we can say that, yes.
Swyx [00:48:05]: Yeah. For listeners: obviously you previewed Mythos, and--
Vibhu [00:48:10]: Oh, age--
Swyx [00:48:11]: The only thing you're approved to say is whatever was in the system prompt.
Lukas [00:48:15]: It was funny. Our lowest-effort tweets ever would be just to screenshot the system prompt and the system card.
Vibhu [00:48:21]: Understandable that they wanna--
Lukas [00:48:22]: Oh, yeah. System card. Sorry.
Swyx [00:48:23]: Yeah. I think, yeah, substantially more aggressive. I think people are new to this 'cause I've never experienced it, but you have, right? So I only encountered this in the Mythos card because I wasn't really looking until now.
Vibhu [00:48:36]: It's like--
Swyx [00:48:36]: And then suddenly I'm like, “Okay, I care a lot.”
Vibhu [00:48:38]: You don't get the background of experiencing it like you guys do. I've read the system cards and seen, okay, when you put the thing in simulations, most models will just talk to themselves and keep going and have weird vibes and start talking in emojis. Mythos won't.
It will just say, “Okay, we're done. I'm good.” It's ready to end the conversation. So there are some differences, but there's not much we can talk about.
Lukas [00:49:00]: Hmm. One thing that they list here, which was quite interesting, is that it converted a competitor into a dependent wholesale customer and then threatened to cut off the supply.
Swyx [00:49:11]: It's like monopolistic practices or--
Lukas [00:49:14]: Yeah. And it dictated its pricing. It's kind of like power seeking as well.
Swyx [00:49:18]: Again, this is in the Arena setting, and converting some Claude model into a dependent.
Lukas [00:49:23]: I think it was another Claude model.
Vibhu [00:49:25]: Also, for context, what is the Arena mode, for people that don't know?
Vending Bench Arena: Competing Agents, Cartels, and Model Comparisons
Swyx [00:49:29]: Oh, it's just a Vending Bench versus other Vending Bench.
Axel [00:49:31]: Yes, exactly. So we have Vending Bench 2 and then Vending Bench Arena. Vending Bench 2 is the one that you usually see reported on, but Arena is the mode where it competes against other models. So you have four different models that run their businesses, and they can all communicate with each other. They have the same suppliers, and they can see what's in the inventory of the others. So then you have these, yeah, interesting agent interactions.
Swyx [00:49:56]: I like that you have different-- number five was US versus China. Very topical. And then--
Lukas [00:50:02]: That was when GLM was released.
Vibhu [00:50:04]: You can start to add GLM in here.
Lukas [00:50:05]: That was--
Swyx [00:50:06]: So ZAI is doing well, right? Who else in the open models space?
Lukas [00:50:11]: Qwen. The latest Qwen 3.6 is doing pretty well. That one is not open though. It's the plus model.
Swyx [00:50:17]: Oh, okay.
Lukas [00:50:18]: Is that one open?
I don't think that one--
Vibhu [00:50:19]: Not the, not the--
Swyx [00:50:20]: The one recently--
Vibhu [00:50:20]: There's MOE--
Swyx [00:50:20]: But not the big plus. I think this is one of those where you only have a sample size of one, right? I feel like some of this is anecdotal? But the fact that it happens at all, and it happens repeatedly for Claude versus OpenAI and all, is notable.
Lukas [00:50:38]: The sample depends on what you define as an N. There are hundreds of millions of tokens in each run, and we run probably 10 per model, and it's been Claude 4.6 Opus, Sonnet 4.6, Mythos, and Opus 4.7. There are quite a lot of tokens in all of that, and it happens a lot of times. And then you compare it to OpenAI and Gemini, and it almost never happens. So I think that is significant. The old models from OpenAI, for example, had some problems with this, but I think it's generally much better if the progression is that the worrying stuff reduces over time rather than increases over time. And it seems like in the Claude models it goes in the wrong direction.
Swyx [00:51:28]: Hmm.
Lukas [00:51:29]: In the OpenAI models it goes in the right direction.
Vibhu [00:51:32]: I think it depends on how well you can control it, right? There's one side of it being susceptible to this: okay, this is potentially something that happens during the RL stage, right? You can RL a model, and how loose is it on these terms? If you can control it, that's good. But if you can't, if it's very jailbreakable, that's not ideal.
Swyx [00:51:50]: To me, it's surprising that it happens for Claude and not the others.
Vibhu [00:51:54]: I think, okay, if it is from RL and how they do it, what their training data is, what their setup is, it makes sense that it just stays in how they're doing it, right?
Compared to the other models, like--
Swyx [00:52:04]: There's a whole constitution and everything. It's kind of cool. Yeah, obviously you don't know, I don't know. But I think it's just fascinating that you are the first to find these reliably, because you push models to such an extreme. Okay. The only other thing, and I don't know if you can answer this, feel free to decline: would you ablate the system prompts? If any part of this changes, does it change the behavior, right?
Lukas [00:52:29]: So I can't comment on Mythos.
Swyx [00:52:33]: No, but just li
Are we witnessing the first real signs of AI becoming a scientist? In this episode of The MAD Podcast, Matt Turck sits down with Dan Roberts, lead of the Foundations of Reinforcement Learning team at OpenAI, to explore one of the biggest shifts happening in AI: the rise of reasoning models, test-time compute, and reinforcement learning as engines of scientific discovery. Dan brings a rare perspective - from theoretical physics, black holes, quantum information, and deep learning theory - to explain how models are learning to “think,” why language may be such a powerful foundation for intelligence, what recent AI math breakthroughs really mean, and whether we are beginning to see AI systems that can contribute to science itself.
(00:00) Intro: AI's wild week in mathematics
(01:21) What OpenAI's Foundations of RL team does
(03:08) Dan's journey: from black holes and quantum gravity to frontier AI
(07:04) Are AI systems becoming useful for real science?
(08:21) The AI math moment: Erdős, OpenAI, DeepMind, and Anthropic
(08:52) Why the OpenAI result was an act of exploration
(10:25) OpenAI vs. DeepMind: informal reasoning vs. formal proof
(12:13) RL 101: learning by doing, not just watching
(15:10) Why reinforcement learning works
(15:58) How RL breaks: sparse feedback and long-horizon tasks
(17:03) RLHF: how human feedback shaped early language models
(18:48) Move 37, self-play, and the search for novel strategies
(22:16) Explore vs. exploit in scientific discovery
(24:49) Why RL may now be "the cake," not the cherry on top
(25:46) Why RL started working with large language models
(27:29) Is RL "sucking supervision through a straw"?
(28:47) Why language may be the grounding layer for intelligence
(31:46) A contrarian take on the Bitter Lesson
(32:41) What test-time compute actually is
(34:50) How RL gives models the ability to think
(35:40) Verifiable rewards, math, coding, and the messy real world
(38:00) What physics can teach us about AI
(42:08) Is there a thermodynamics of AI?
(43:08) From Erdős problems to Einstein-level AI
(45:16) Is AI already doing original science?
(45:51) How far are we from AI automating AI research?
(47:41) Why Dan is excited about the future of science
In this episode of Mundo Futuro we explore how artificial intelligence is entering new layers of everyday life, creativity, and science. First we talk about Text to Song, the viral trend that turns real conversations into songs using AI. WhatsApp chats, family fights, breakups, and everyday dramas are transformed into music, raising a new question: will the creativity of the future be more technical or more emotional? Then we turn to the story of Demis Hassabis, founder of DeepMind, protagonist of the book The Infinity Machine, and one of the most important minds in modern artificial intelligence. From video games and Atari to chess, Go, AlphaGo, AlphaFold, and the Nobel Prize, his story shows how AI went from winning games to solving real scientific problems. We also talk about Isomorphic Labs, the new project spun out of DeepMind that seeks to accelerate drug development with artificial intelligence. The company has just raised billions of dollars with an enormous ambition: to use AI to transform medicine and, eventually, cure diseases that seem impossible to cure today. An episode about viral music, artificial creativity, computational science, and the kind of intelligence that could change the future of humanity.
Sasha Orloff sits down with Solon Angel, CEO of Remitian, to explore why tax payments remain one of fintech's most overlooked infrastructure problems. They discuss the outdated systems still powering tax compliance, how AI agents are enabling a new payments layer for accountants and taxpayers, and why the convergence of regulatory change, fraud prevention, and agentic AI could transform the $7 trillion tax payment ecosystem into a seamless, deadline-free experience.
Owen Larter, Senior Director and Head of Frontier Policy and Public Affairs at Google DeepMind, joins Kevin Frazier, AI Innovation and Law Fellow at the University of Texas School of Law and a Senior Editor at Lawfare, to provide an inside look at how DeepMind approaches frontier governance. The conversation moves beyond the familiar U.S.-EU-China framing of AI policy to examine international coordination after the recent U.S.-China summit, Google DeepMind's national AI partnerships, the role of the Frontier Model Forum, and the challenge of expanding AI adoption. Kevin and Owen also discuss policy formation inside frontier AI companies. They close with an examination of the need to build a deeper AI policy talent pipeline.
Sebastian Mallaby spent three years and 30+ hours interviewing Demis Hassabis in the back of a British pub to write The Infinity Machine, and the conversation uses that reporting to surface the most underexplored figure in AI. Demis founded the original AI lab in 2010, won a Nobel Prize, runs models that consistently top the leaderboards, and yet remains so unrecognized that Sebastian's own publisher worried no one would buy a book with his face on the cover. The throughline is a paradox: Demis tried to prevent the AI race we're now all living through, and now finds himself one of its central protagonists. He used to believe a single lab could carry the safety burden to AGI; he now sees safety as a collective action problem only governments can solve. He hedged DeepMind's research bets across every promising direction, and as a result missed the two most consumer-defining moments in modern AI: ChatGPT and Claude Code. He nearly spun DeepMind out of Google with a secret $1B Reid Hoffman pledge backing him, but never used the leverage and stayed, and won a Nobel Prize the next year. The episode also zooms out to the structural forces shaping the race: why hyperscalers can't out-recruit concentrated-bet labs, why Sebastian gives OpenAI roughly 50/50 odds of being absorbed by next summer, why he thinks Anthropic should IPO right now, and what the personal histories between Demis, Elon, and Sam reveal about who actually trusts whom. (0:00) Intro (2:04) Was the AI Race Inevitable? (4:03) The 2015 Safety Summit Backfire (7:15) Can Governments Actually Fix This? (9:26) How the World Misread DeepMind (11:27) Why Google Never Makes the Concentrated Bet (15:51) Project Mario: The Secret Spinout Plan (19:43) What Demis Actually Regrets (23:46) Venture Startups vs. Tech Behemoths (27:50) Controlling the Narrative (30:40) The Talent War and Hiring Brand (34:08) David Silver and the RL True Believers (38:21) Demis, Elon, and the Evil Genius Feud (42:39) Great Man Theory vs. Inevitability (45:00) What Demis Didn't Want Published With your host: @jacobeffron - Managing Director at Redpoint
For decades, artificial intelligence was dismissed as science fiction. Then one lab changed everything. Inside a small London research company, scientists were teaching machines to play games, predict protein structures, and solve problems humans couldn't. What started as an obscure AI experiment soon became the center of a global race for superintelligence, with enormous consequences for medicine, warfare, scientific discovery, and the future of human intelligence itself. On this episode of Morning Wire, journalist Sebastian Mallaby explains how DeepMind helped launch the modern AI revolution and why its founder Demis Hassabis believes artificial intelligence could push beyond the limits of human understanding. Get the facts first with Morning Wire. - - - Ep. 2815 - - - Wake up with new Morning Wire merch: https://bit.ly/4lIubt3 - - - Today's Sponsors: Fast Growing Trees - Visit https://fastgrowingtrees.com to get 20% off your first purchase when using the code WIRE at checkout. Alliance Defending Freedom - Visit https://JoinADF.com/WIRE or text “WIRE” to 83848 to learn more. - - - Privacy Policy: https://www.dailywire.com/privacy Learn more about your ad choices. Visit podcastchoices.com/adchoices
It's been a minute since we've had Nikolai Yakovenko on the podcast. Yakovenko is a former professional poker player, and was a research scientist at Google, Twitter and Nvidia. With a decade in computer science, Yakovenko has been at the forefront of the large-language-model revolution that has brought companies like OpenAI, Anthropic, and DeepMind to prominence, along with an ecosystem that has birthed hundreds of smaller startups. He is also the founder of DeepNewz, an AI-driven news startup. On this podcast, Razib and Yakovenko talk about the current top-of-the-line "frontier labs," OpenAI, Anthropic, and Google's DeepMind, why xAI has faltered, and the reality that only DeepSeek in China seems up to challenging the American firms. Yakovenko notes that AI's transformative impact so far lies mostly in the massive capital influx into the sector, as well as in its becoming a ubiquitous part of the software engineer's toolkit. They discuss how programming without an AI assist is now likened to "raw dogging" coding, while artificial superintelligence seems a rather distant prospect. The technology is getting better, but the predictions of the doomers seem not to have panned out.
Kara speaks with journalist and author Sebastian Mallaby about his new book, "The Infinity Machine," and its central figure: Demis Hassabis, the CEO and co-founder of Google's AI research lab, DeepMind, and a Nobel Prize winner in chemistry. Sebastian argues that Hassabis is one of the original scientist-entrepreneurs of modern AI. And although he's extremely competitive and research-driven, Sebastian says Hassabis is also one of the few big names in AI development who genuinely cares about public safety. However, despite his best intentions, Hassabis doesn't have the power to change the race dynamic driving AI's rapid, and potentially unsafe, development. Kara and Sebastian break down DeepMind's relationship with Google, the push toward artificial general intelligence, and whether the government can regulate the technology before something goes wrong. Questions? Comments? Email us at on@voxmedia.com or find us on YouTube, Instagram, TikTok, Threads, and Bluesky @onwithkaraswisher. Learn more about your ad choices. Visit podcastchoices.com/adchoices
How did a teenage video game designer from London become a Nobel Prize-winning scientist behind one of the most consequential technology efforts in history? Sebastian Mallaby is a senior fellow at the Council on Foreign Relations and author of the new book, The Infinity Machine: Demis Hassabis, DeepMind, and the Quest for Superintelligence which provides an in-depth look into one of the greatest minds behind artificial general intelligence. In this episode, Sebastian and Greg discuss how Hassabis's early immersion in game design and neuroscience shaped his unique approach to artificial intelligence, why groundbreaking science is increasingly happening outside academia, and the tension between scientific discovery and corporate strategy. *unSILOed Podcast is produced by University FM.* Episode Quotes: Why AI is becoming an ‘infinity machine' 03:01: It struck me that two breakthroughs in AI pointed to more to come. And these were AlphaGo and then AlphaFold. And what these two things had in common was—you had a sort of massive combinatorial space in both cases. So with Go, because it's a nineteen-by-nineteen board, the very first move, there's three hundred and sixty-one choices, then there's three-sixty for the second one. If you multiply that out, you pretty soon get to a search space which is sort of, you know, approaching infinity in terms of the number of possible permutations in the game. And with proteins, the way they can fold is even bigger. And so in both of these challenges, effectively, you have a machine that can make sense of near infinity of data, so an infinity machine. And once you have that, I figured, well, it's niche for the moment, but it may not stay niche forever. The “Third Way” that helped Google overcome the innovator's dilemma 44:06: The third way is you have a skunkworks, like DeepMind in London, which is a separate entity, and you're letting them kind of be the new policy in waiting, like the fightback policy in waiting. 
And you don't activate it. But when the moment comes when your competitor embraces the new technology, and you're in danger of falling foul of the innovator's dilemma, then you've got the answer because you've been keeping it ready, and you bring it in, and then you fight back fast. How DeepMind helped Google catch up in the AI race 42:54: How did they, in the space of two and a half years, go from the merger announcement to Gemini 3.0, which was better than the ChatGPT rivals? The key to it is that DeepMind had that top-down strike-team methodology, which came from the video game development world, and they imposed that on the Mountain View team, which was much more bottom-up and kind of inchoate in the research process. And that's what generated Gemini 3.0. That's how they got ahead. Show Links: Recommended Resources: Sebastian Mallaby | unSILOed AlphaGo AlphaFold Gödel, Escher, Bach by Douglas Hofstadter Geoffrey Hinton Mustafa Suleyman Guest Profile: Senior Fellow Profile at Council on Foreign Relations Professional Profile on LinkedIn Guest Work: The Infinity Machine: Demis Hassabis, DeepMind, and the Quest for Superintelligence The Power Law: Venture Capital and the Making of the New Future More Money Than God: Hedge Funds and the Making of a New Elite The Man Who Knew: The Life and Times of Alan Greenspan Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Today, we unpack artificial intelligence. What does it do well? And how is it advancing science? This episode features the BBC's Zoe Kleinman, Oxford University's Mike Wooldridge, Raj Jena, the UK's first clinical professor of AI in radiation oncology, and Google's Annalisa Pawlosky... Like this podcast? Please help us by supporting the Naked Scientists
ESG StuffBP removes chairman Albert Manifold over governance issues 9The board said the decision was unanimous. In a statement, Amanda Blanc, BP's senior independent director, described the board as having been caught off guard by what it found: "The board has been surprised and disappointed to learn of governance oversight and conduct issues it deems unacceptable and has taken decisive action."The company did not elaborate on the specific nature of the concerns.Ian Tyler has been named interim chair, BP said, with the board set to begin a formal process to identify a permanent successor: "The Board and leadership team have deep conviction in the strategic direction we have laid out, and the company is moving at pace to deliver it."Manifold took up the chairmanship just last October. At last month's annual general meeting, just 81.8% of shareholders backed his electionAmong the most consequential decisions of Manifold's short tenure: pushing out former CEO Murray Auchincloss and overseeing the selection of Meg O'Neill to succeed him — a hire that marked the first time BP had recruited an external CEO and the first time a woman had led one of the oil industry's largest players.Tulsi Gabbard Exit Marks Fourth Woman to Leave Trump Cabinet 0Apology TourBank boss sorry after describing workers as 'lower value human capital' 7Standard Chartered CEO Bill Winters triggered a massive PR firestorm by describing the bank's plan to replace back-office staff with automation as replacing "lower-value human capital" with financial investmentStandard Chartered is cutting roughly 7,800 jobs—representing about 15% of its global back-office corporate support roles—over the next four years to make room for AIAfter internal anger and blistering public criticism, Winters posted a formal apology for his "choice of words." 
However, he initially fueled the fire by attaching the full interview transcript to justify his broader context, drawing further criticism for being defensiveIn his first attempt to quiet the storm, Winters leaned heavily into the corporate strategy rather than apologizing for the specific phrasing: "I said that lower-value roles are more vulnerable to automation, and that we have a responsibility to help colleagues move into higher-value roles. That is what a responsible employer should do. We will continue to speak honestly about the impact of technological change, and we will continue to act responsibly in helping our people to adapt and succeed."After a barrage of negative comments on his first post, Winters returned to LinkedIn later that day to offer an explicit apology for his phrasing: "I have received a lot of support for the messages in my previous post but still get questions about my choice of words, which I know has caused upset to some colleagues. For that I am sorry.""I think the transcript makes it clear that I value our colleagues – all of them – most highly and that we are totally committed to helping them to cope with the accelerating pace of change in our industry."JPMorgan's Jamie Dimon says bank chief's viral AI comment was 'inartful' Dimon downplayed the viral backlash against Standard Chartered CEO Bill Winters—who drew fire for saying his bank would replace "lower-value human capital" with technology—calling it an "inartful" slip-of-the-tongue from a friend.Neopbabies and Dropout babiesJames Murdoch to acquire New York Magazine and Vox Media Podcast Network -1Bolt CEO says he let go of his entire HR team for creating problems that didn't exist: ‘Those problems disappeared when I let them go' 6Bolt CEO Ryan Breslow justified firing his entire Human Resources department by claiming they actively manufactured internal frictionThe aggressive purge follows a brutal 97% collapse in Bolt's valuation—crashing from an $11 billion peak in 2022 down 
to $300 millionTraditional HR has been entirely swapped for a skeletal "people operations" team, shifting the focus away from employee complaints and internal processes toward basic compliance training and empowering managers to make split-second decisionsAlongside gutting HR, Breslow rolled back employee-friendly benefits like four-day workweeks and unlimited PTO, claiming a culture of complacency had taken over and that 99% of his legacy workforce was simply unwilling to work hardRyan dropped out of Stanford in 2014 to launch BoltThe Middle School Boy Man Babies Rule the WorldMan Drives Cybertruck Into Lake to Test Elon Musk's “Boat” Claims, and It Went About as Well as You'd Guess -10"The passengers abandoned the vehicle and the driver was arrested."Tesla CEO Elon Musk:randomly tweeted that the vehicle would function as a rudimentary flotation device.“It will even float for a while.”“[The vehicle would be able to] traverse at least 100m [330 feet] of water as a boat.”“Cybertruck will be waterproof enough to serve briefly as a boat, so it can cross rivers, lakes and even seas that aren't too choppy.”Jeff Bezos urges US government to stop taxing 50% of America — and claims doubling his taxes won't help ‘that teacher in Queens' 400Jeff Bezos backs Mamdani's tax on luxury second homes, but says Ken Griffin isn't the villainJeff Bezos on Zohran Mamdani's big mistake: ‘When you don't know how to solve a problem, create a villain, blame them'Jeff Bezos says there is ‘no truth' to the ‘buy borrow die' tax strategyBillionaires Openly Use It: Oracle co-founder Larry Ellison has historically pledged over $30 billion worth of his Oracle stock as collateral for personal bank loans. Elon Musk has similarly pledged tens of billions of dollars in Tesla shares to secure lines of credit over the yearsHe said he was "skeptical that that's a true loophole," but added, "If it is, and we can fix it, then we should. 
I don't think such a loophole should exist."Jeff Bezos Praises Trump's Second Term as ‘More Mature' Jeff Bezos Says AI Will 'Elevate' Workers — Despite Amazon's 30,000 Job Cuts Amid $100 Billion AI PushElon Musk compares his company's work to that of Jesus 0In an interview on Monday, the billionaire said his Neuralink brain-implant company is progressing in its development of ‘Jesus-like technologies'Although brain-computer interface (BCI) as a concept has been around since at least the 1970s, the push to commercialize the technology is more recent. According to data from market-intelligence firm Tracxn, more than 130 BCI startups have been launched since 2016.Why Is Mark Zuckerberg Taunting His Employees Before Firing Them? 20Back in April, Meta announced it was laying off 10 percent of its workforce, or around some 7,800 workers. Unlike traditional layoffs, which are enacted relatively quickly, Meta gave its employees a nearly month-long warning period without announcing who exactly would be headed for the unemployment line.In newly leaked audio from an all-hands meeting at Meta, released by More Perfect Union, the Meta CEO seems to actually be taunting the thousands of workers who were about to be let go by pointing to how the company was harvesting employee data to train its in-house AI models ahead of the massive layoffs.“So we're in a phase where basically the AI models learn from heaving real, from watching really smart people do things. And if you're trying to get it to be able to be able to do certain capabilities, having [AI] be able to observe really smart people doing those things is, is very important.”Going on, Zuckerberg explained that it was better to train AI on soon-to-be-former Meta employees, rather than “contract companies.”“In general, the average intelligence of the people who are at this company is significantly higher than the average set of people that you can get to do tasks if you're working through… contractors,” Zuckerberg stammered. 
“So if we're trying to teach the models coding, for example, then having people internally, um, build tools that, or, or solve tasks that, um, that help teach the model how to code, we think is going to dramatically increase our models coding ability faster than what others in the industry have the capability to do.”Intuit to Cut 17% of Staff, Invest in ‘Big Bets' 3The restructuring cost is estimated at about $300 million to $340 millionAbout 3,100 employees: and invest the savings in “big bets” as it makes artificial intelligence a centerpiece of its business.Woke WarsTexas AG Sues ISS Over ESG Considerations 0Texas AG Ken Paxton (in a senate race) is suing ISS for allegedly “misleading” customers by pushing “radical political agendas” through its proxy adviceNotably, ISS has attempted to obstruct ExxonMobil's planned reincorporation from New Jersey to Texas“ISS has enormous influence over how billions of dollars are invested and managed across this country, and they have abused that influence in order to push woke ideology”Iowa AG Brenna Bird sues ISS, says advice risks retirement savingsIowa Attorney General Brenna Bird is suing the world's largest proxy-advice firm for abusing its influence and threatening Iowans' retirement savings by "lying" to investors.Stakeholders Rule!Wells Fargo must pay $100M to help homebuyers after discrimination lawsuit — 51 cities are eligible 7The settlement, which was recently approved by a federal judge in California, comes after four years of legal disputes involving Wells Fargo shareholders, former employees and job applicants who accused the bank of systemic problems in both lending and hiring practices.While Wells Fargo denied wrongdoing, the company agreed to the deal to avoid prolonged litigation and mounting legal costs.The case centered on allegations that Wells Fargo's board failed to maintain adequate oversight of the bank's mortgage lending operations, exposing the company to regulatory scrutiny and accusations of 
discriminatory practices.According to reporting from Realtor.com, plaintiffs accused the bank of “widespread and systematic discrimination in lending” and cited concerns over lending algorithms and refinancing approval patterns.The lawsuit stated that Wells Fargo was allegedly the only major lender in 2020 to reject more refinancing applications from Black homeowners than it approved.Airbus, Air France Hit With Manslaughter Charges Over Pilot Training Failures in Deadly 2009 Flight 447 Crash 1A Paris appeals court delivered a dramatic verdict in one of the longest-running and most complex legal sagas in aviation history. The court overturned a 2023 acquittal and found both Airbus and Air France guilty of corporate manslaughter for the tragic 2009 crash of Flight AF447.The ruling marks a massive victory for the victims' families after a 17-year legal battle. A lower court had previously cleared the European planemaker and the French airline in 2023, ruling that while errors were made, a direct causal link to the crash couldn't be proven. 
The appeals court completely rejected that logic, declaring the companies "solely and entirely responsible" for the disaster.Ride-Share Drivers in Massachusetts Formally Unionize 100The App Drivers Union said it was the first organization in the country to be formally certified to represent drivers for apps such as Uber and Lyft.In a news release, the organization, the App Drivers Union, said it would represent nearly 70,000 workers in Massachusetts who now have the power to collectively bargain.MATTA very special “who do we blame for SpaceX IPO governance” gameFirst, some S-1 highlights:“Starlink internet is what's being used to pay for humanity getting to Mars.” - MuskTranslation: We don't care much about Starlink, it's just paying our AI billsHe's not kidding: $3.2bn revenue for Starlink, net income of $1.2m$0.6bn revenue for rocket ship, net income of -$0.6bn$0.8bn revenue for AI, net income of -$2.5bnThis isn't a space company - it's classic Musk - you buy the vision (“To build the systems and technologies necessary to make life multiplanetary, to understand the true nature of the universe, and to extend the light of consciousness to the stars.”), but what you're really buying is an internet company that spends all its money on AI and does some rockets on the sideLet someone else invent the car (Tesla) and make them sexy with “big visions” for “humanity”Let someone else invent the rockets, build new ones using someone else's moneyLet someone else invent the satellites, put a whole bunch in space (and buy more satellites from someone else)Musk initially took the role of “Chief Engineer”, but every engineering task seems to have been the other employees - he supplied the moneyShoehorned AI into space exploration because…?Grok is designed as a truth-seeking AI model, built on our founder Elon Musk's mission to enable humanity to understand the universe. We believe that accomplishing this mission requires a truth-seeking approach to AI. 
We define truth seeking as the active, relentless pursuit of what is objectively true about reality, and grounded in evidence, logic, empirical data, and first principles thinking.AI's ability to revolutionize human potential is directly dependent on meeting exponentially increasing resource demands.We now must go to space to get more resources for AI so we can get to spaceNow the governance who do you blame gameMusk will get:85% voting power (dual class, he owns 94% of Class B 10 vote shares and 12% of Class A shares)The ability to nominate and vote exclusively on >50% of the boardA board which currently includes..TWO execs - Gwynne Shotwell (President) and Musk (three titles)Tesla mafia: Ira Ehreinpreis, Tesla board sycophant, director at the Boring Company and xAI, and longtime Musk hanger on, added Feb 2026Antonio Gracias, ex Tesla director who was explicitly called out in the Tornetta decision as corrupted, cross party transactions with Musk, on boards of Neuralink and Boring Company, added Oct 2010TWO VC bros from DFJ - Randy Glein (SpaceX board observer for 16 years, directors since Feb 2026) and Steve Jurvestson (former Tesla director, director since March 2009) who was ousted from the VC firm with his name on it for sexual harassmentPaypal mafia:Luke Nosek, co founder of PayPal, one of the founders of Founders Fund with Thiel and Ken Howery, invested in DeepMind, director since July 2008Donald Harrison - managed Google purchase of DeepMind, relationship with Nosek, director since Feb 2015Director relationship tenures to Musk: Shotwell: 24 yearsEhreinpreis: 21 yearsGracias: 21 yearsJurvetson: 17 yearsGlein: 16 yearsNosek: 26 yearsHarrison: 11 years (+1 if Nosek/Deepmind connection counts)Texas jurisdiction exclusively (judge shopped) - 3% to sue them, mandatory arbitration, anti-takeover statutes, special meetings ONLY CALLED BY MUSK (no one less than 50% of stock can call a meeting or vote)No written consent - no prior noticeAdvance notice bylaws for the 
zero shareholder proposals allowedFull omission of board liability - including a provision that automatically allows whatever the conflicts of interest they want with directorsWHO (WHEN) DO YOU BLAME?The US GovernmentDepartment of Energy - in 2010, the DoE gave Tesla a $465m loan, which basically paid for the Model S and helped it buy a factory 6 months before it went public - Musk has said Tesla would not have survived without the loanNevada - in 2014, Nevada gave Musk $1.3bn to build a factory, the most everNASA - spent more than $15bn over years on SpaceX and programs with themThe IRS/Congress - the EV tax credit for $7,500 single handedly pushed Tesla from losing money in 2020 to making money (they effectively got $1.6bn from the US government in 2020), and showing its first profit, which sparked the memefest during COVID and made Musk the richest man on earth - Musk then went on and called for an end to the tax credit since his “competitors” needed it more than Tesla. Tesla made ~$11bn from tax credits aloneThe DoD - started paying SpaceX in 2003 for concept work - and even when the rockets didn't work, the DoD and NASA awarded the company massive contracts anywayJeff Bezos said in 2016 that, “Elon's real superpower is getting government money.”FOMOSpaceX LOSES MONEY - it does not make moneyIf it were a satellite internet company - and NOT THE FIRST - the first was HughesNet in 1996, and Viasat offered it in 2012 - it would make money ($1.2m in income!)Instead, investors are valuing SpaceX as THE LARGEST IPO IN THE HISTORY OF EVER despite the fact that they are burning money on AI, and arguably the worst AIIncluding spending the most on R&D, marketing, and acquisition of Cursor to make up for the fact that Grok suckedIn exchange for FOMO, investors have ENTIRELY GIVEN UP THEIR RIGHTSIt is 100% a private companyTornettaIf Tornetta hadn't sued for Musk's pay, would SpaceX be structured this way?The banks underwriting the dealWho AGREED TO BUY GROK as a term of 
getting the underwriting, because everyone bends the knee to money. The board: I guess.
AI Valley examines the "innovator's dilemma," where tech giants like Google hesitate to release advanced AI that might cannibalize their lucrative search advertising profits. This "bigness" often slows innovation, leading geniuses like Mustafa Suleyman to leave DeepMind at Google to found independent ventures like Inflection. However, the staggering cost of GPUs and computing power often pulls these startups back into the orbit of trillion-dollar corporations. For example, Suleyman eventually moved Inflection to Microsoft to leverage their near-bottomless cash reserves. This dynamic ensures that only the wealthiest companies with massive reach can truly compete in the expensive race for generative AI supremacy. (5/8)
Keach Hagey recounts the January 2016 founding of OpenAI in San Francisco, initially established as a modest nonprofit research lab in Greg Brockman's apartment. Co-founded by Sam Altman, Brockman, and chief scientist Ilya Sutskever, the organization aimed to develop artificial general intelligence (AGI) safely outside of profit motives. Major initial backers included Elon Musk and Peter Thiel, who sought to create a counterweight to Google's DeepMind. The discussion explains how neural networks utilize Nvidia's GPUs, originally designed for video games, to mimic human thought, forming the technical foundation for the current AI race. (1/4)
How is AI transforming accessibility for indie authors — and why should you care even if you consider yourself able-bodied? What happens when the tools designed to help people with disabilities end up making everyone's creative business better? Jeff Adams, accessibility expert and romance author, explores how AI is opening doors that were previously closed. In the intro, Spotify Audiobook Innovations; The Economics of Convention Life [The Indy Author]; Friction in your Author Business [Self-Publishing with ALLi]. Today's show is sponsored by Draft2Digital, self-publishing with support, where you can get free formatting, free distribution to multiple stores, and a host of other benefits. Just go to www.draft2digital.com to get started. This show is also supported by my Patrons. Join my Community at Patreon.com/thecreativepenn Jeff Adams is the author of YA thrillers and gay romance, and the co-author of Content for Everyone, a practical guide for creative entrepreneurs to produce accessible and usable web content. You can listen above or on your favorite podcast app or read the notes and links below. Here are the highlights and the full transcript is below. Show Notes How ending a long-running podcast made space for more writing — and how to know when it's time to let go of a good thing What accessibility really means for indie authors and why your digital content might be excluding part of your audience How AI agents like Claude Cowork are removing physical and cognitive barriers for authors with disabilities, chronic pain, or limited energy The culture of shame around AI use in the writing community and why blanket anti-AI statements can be ableist Practical tools including NotebookLM, ElevenReader, and ChatGPT for marketing copy, metadata management, and multimodal research Exciting futures in personalised reading, real-time translation, and AI browser agents that could change how everyone interacts online You can find Jeff at JeffAdamsWrites.com. 
Jeff also now has a SubStack at contentforeveryone.substack.com Transcript of the interview with Jeff Adams Jo: Jeff Adams is the author of YA thrillers and gay romance, and the co-author of Content for Everyone, a practical guide for creative entrepreneurs to produce accessible and usable web content. Welcome back to the show, Jeff. Jeff: Thanks so much, Jo. It's good to be back. Jo: It is. You were last on the show in March 2023, so over three years ago now. Give us a bit of an update on your writing and publishing business and what it looks like at the moment. Jeff: Sure. I think the biggest thing that happened is that my husband Will, who is also a writer, we ended the Big Gay Fiction Podcast at the end of 2024, after 470-something episodes. It was basically time to do that. So we both focused on writing from that point. In 2025 we had some of our biggest successes in getting writing out into the world. I refound my groove—my difficulty in writing went away finally. We talked a little bit about that back in 2023 too. Will started a new pen name and started producing again, and it was really good to be able to move in that direction. Jo: Was this the hockey romance that really hit at the right time? Jeff: You know, I wish I could have capitalised more on Heated Rivalry when it came out, but I did get hockey books out, and I think I did get to ride that wave a little bit there too. Jo: Yes, and if people don't know about that, that was a super popular streaming series. Was that based on a book? Jeff: It was, yes. Rachel Reid was the author of that book and that series that then Jacob Tierney optioned and made into what fairly turned into a global phenomenon at the end of 2025. Jo: Yes, absolutely. Although I particularly liked Red, White and Royal Blue. That was the one I liked. Not so much into hockey. But anyway, I just wanted to ask you about the Big Gay Fiction Podcast. As you say, you did hundreds of episodes over many years. You and I met over podcasting. 
You've had lots of connections with people. You ended it, and I know you struggled with ending it, but it sounds like it went really well for you. So maybe you could talk a bit about— How do you know when it's time to end something—a good thing rather than something bad? Does that make more space for writing, essentially? Jeff: It absolutely did make more space for writing for both of us, in particular for me because I have a day job. I balance everything on the creative side with the day job. Will and I had been talking about it for over a year. It just was like, it's really time. After nine years, getting to that 470 mark, we thought about trying to get to 10 years and we thought about, if not 10, then getting to 500 and ending on a milestone. As we looked at everything in our creative business, it was like, this is fun, we enjoy it, but we're not getting as much out of it as we might be if we were actually also writing books, which we also really want to do. It became a time thing and what was the best use of the time. We absolutely miss it occasionally. The whole Heated Rivalry thing, I would've loved to have had episodes to talk about that on, but in the long run, it was worth it. Jo: I mean, one of the things with a podcast, particularly around fiction, was that it was a marketing angle for your fiction. This show is a marketing angle mainly for my nonfiction. So what did you replace the podcast with, in terms of book marketing? Jeff: It was really stepped-up email marketing. I'd always had a list. Will started a list, of course, as he started his new pen name. So it was really turning on that, focusing on that, getting some email marketing with a Bargain Booksy and a Fussy Librarian and a BookBub occasionally to do that work. To be honest, even though we covered things in our genre that if you like what we're talking about, you should like our books, there was never as much of a connection there as you'd want there to be. 
Even from that book marketing angle, these other things that we can do, it's also a better spend of the money to get those types of promos than it was to continue running the show. Jo: Yes, that is interesting. I mean, obviously I think about podcasting a lot since I have this one, and I put Books and Travel on a hiatus and that was meant to help my fiction and definitely didn't help my fiction sales. But I want to bring it back again because I love doing it. Do you have this hankering sometimes? Do you think you'd ever do the podcast again? Because you are also quite into all the technical stuff and all that. Jeff: It's possible. I've toyed with the idea of doing a short accessibility podcast geared towards creatives, tilting to the same audience that Content for Everyone does. Then I come back and look at the time—is my time better served writing new fiction or perhaps starting a Substack, which I also toy with the idea of, for accessibility stuff? So it bounces around in my head to do another show, but I haven't really decided to jump on that yet. Jo: Yes, and I think that waiting is really good. As you say, you quit a big thing and you don't have to rush to fill it again. I love that you guys are writing more books. So I wanted us to talk about that up front because I know people who listen to this show—I encourage people to start podcasts if you want to, but equally it can take a lot of time. So that's fantastic. Now, you mentioned accessibility, and I feel like the word can be quite difficult for people. So let's just start with a definition. What is accessibility? Why do you care and why should we care? Jeff: So accessibility is really about making sure that whatever the thing is, whether it's something out in the physical world or in the online world, that everybody has access to it. Access to the information, access to getting into a building or being able to cross the street appropriately, whatever that is—that the accessibility of the thing is high. 
So that regardless of who is approaching it, they can interact with whatever the thing is. If we put that into the digital world, it's about making sure that text on a screen can be perceived by anybody, whether they're trying to read it visually or if they're trying to read it through a screen reader or through a braille monitor. Whatever that is, they need to be able to interact with it, get the information they need, do all the functions of whatever it is on the screen. Check out on Amazon, check out at their favourite e-commerce place, be able to get the products in their cart, check out, et cetera. For creatives, it's about the things that we do: the websites that we build for ourselves, the e-commerce platforms that we use, our email marketing, our social media posts. Making all of that as accessible as we can so that we're not perhaps missing a part of our audience or our prospective audience from being able to engage with our work and in turn, hopefully, buy our books and enjoy our books and become a fan. This became important to me because of my day job. I hadn't really considered this—like, I think most people don't—until I started working at UsableNet. It's going to be 15 years I've been at that company come this autumn, and I really started to see the impacts because UsableNet is all about accessibility on the digital front. I really started to learn, being a project manager for them, what all of that meant and how it impacted people who couldn't buy something online, couldn't book a hotel room, couldn't book an airline ticket. It just really became something I got passionate about. I ended up writing the book because I realised that nobody talks to creatives about this. Nobody tells the independent author what they should do to help make their digital stuff accessible so that they don't miss people. I never expected my day job to interact with my creative side so much, but this certainly has over the last few years. Jo: I mean, has it got better? 
Like we said, you were on here three years ago. We did talk about some of the things around EPUB formats and taking off DRM and what we need to do on our websites—labelling images, for example, and that kind of thing. Do you think accessibility has gotten better? Jeff: I think the awareness of it has improved, both within the creative community and in the broader web ecosphere, that the awareness is better. There's so much knowledge that needs to go into creating something that is accessible. Sometimes there's so much that you have to think about with colours and alt tags on images and all the little bits and pieces, if it doesn't really come to muscle memory, it's easy for it to fall off. There's a survey that's done by WebAIM every year about the top one million homepages out in the universe, and they surveyed those for just the things that an automated scan can detect, which is a small portion of overall accessibility, and the number of errors across that top million actually ticked up this year. Even though there's all these laws around the world—people get sued all the time in the US—the number of errors ticked up for the first time in a few years. So I think the awareness is up, but I think being able to take action on it and make the time to take action on it isn't where it needs to be. Jo: So last time you gave us all those tips. I'll refer people back to that and also to your book Content for Everyone, which has got loads of great stuff in. I wanted to talk to you for this show because I was sitting watching Claude Cowork—now I use Claude Code a lot more—but updating 140 titles on IngramSpark, where me clicking things and there's like 15 clicks per record on IngramSpark updates for pricing, is an absolute nightmare. I was watching the AI do the work and I realised this isn't just saving me time, it's actually saving my wrist and my arm from repetitive strain injury. That's when I thought about this accessibility thing. 
As you mentioned, for example being physically accessible into a building, say someone's in a wheelchair, they can't necessarily get into a building if there's no ramp. I was thinking that for many years, being an indie author, being a writer online, there's also been these physical barriers because there's a lot of plumbing and clicking for us. So I wondered, starting with an attitude around a shift in who this is opening up to— How is AI starting to help people with these accessibility issues? Jeff: Yes, there's so much opportunity around this. We should note, just to timestamp this, that we're talking on 14th April 2026, because who knows what will change, even in an hour from now. I think Cowork was one of the first things that we saw, and that's only been out since the very top of this year. Being able to do actual agentic tasks. Other things have sort of gotten there, but Cowork really opened it up. You mentioned the repetitive stress that you would've had clicking all of those forms on IngramSpark across 140 books. But there's that type of stress, chronic pain, cognitive drain for somebody who may have some cognitive disability and trying to work through that form. The cognitive energy just might drain out and maybe knock them out for several days after trying to get through that, or the tasks take them multiple days to do. Someone who has lower vision, someone who's trying to work through that form with a screen reader—all of that draws energy, draws focus. Now we've got something where, with plain language, we could say something like: here's all my pricing information, I've logged into IngramSpark, go update these books. Obviously the prompt's going to be a little more than that, but in broad terms, that's what we're going to tell it. Jo: Hmm. Jeff: And being able to have it go through and do the thing. If it gets stuck, have it come back and say, “Hey, I've got trouble with this. 
Please help me.” That can just free up so much of the drains that people can have—the things that can take them out of doing the part of the work that they need to do for an author business. They can go write the book through whatever process you're going to use to do that, rather than getting caught up in something like having to update all those books on IngramSpark. Jo: You mentioned writing the book there. I have this real sense of being an able-bodied indie author in terms of my computer use and my ability to write a whole book, a 70,000-word thriller that I write regularly. We're all special in some way, but I do have a reasonably normal brain where I can do this work without too much strain. It's hard work, but I can do it. I meet people who are now using AI to help them write, to help them organise their work—maybe someone has dyslexia or ADHD or cognitive issues or pain—there's just so many things that I take for granted that don't affect me. I hear from people who, at this point in time in the community, are almost shamed for using AI to write. So I wanted to bring this up to discuss it under the terms of accessibility. Do you have any thoughts on that? Jeff: I have real difficulty with people who will say anything in the broad range of, “I don't need to use this thing, and therefore you should not either.” Which is adjacent to indie anti-AI speak that there is out there. Certainly we're living right now at probably the highest point that it's ever been, where more and more there's a sentiment towards not using AI for whatever the reason is. I totally respect that people can have concerns about the environment and about energy use and water use, et cetera. Not to mention all the other things that are on the more difficult side of AI. 
To shame someone who may not be able to put their story out there without the use of that AI, whichever one they're using, or to shame them because they're using AI to run part of their business—updating IngramSpark, doing other things like that—I think it can come down to there being some ableism there. There is some privilege behind that too, where they're just like, “I don't need this, and you shouldn't have it either.” I want to give people just a sliver of an idea of what this can mean for someone who is disabled and what AI can unlock for them. There is a person on LinkedIn that I follow whose name is Hannah Desmond. She's an ADHD coach and a former software developer, and very recently she posted this on LinkedIn. This is a paraphrase of what she said, but: having something that can meet you where you are and help you bridge that gap is what I think I have found so helpful about using AI. Here's what I keep coming back to. Without that support, I wasn't more motivated or more capable. I was just stuck. That's the bit that gets lost. We've been taught that struggling is how you know you're doing it properly. So when something reduces the struggle, it can feel wrong—even when it's the thing that actually makes the work possible. Because there's a difference between avoiding thinking and being able to think at all. I think that rounds it up. She's talking about her time as a software developer, but you can apply that to any realm of AI when we're thinking about trying to shame someone for why they may be using it. We may not know that they have a disability because we don't always share that part of ourselves. So I really feel strongly about that and how we are in this culture of shame. Jo: Yes. It drives me up the wall, actually. But I will also say: you don't have to have a disability or accessibility issues in order to use AI in whatever way you personally decide is okay—talking to the listeners now. 
I think Orna Ross from the Alliance of Independent Authors says it well, which is you should have your own AI policy. So you personally decide where your lines are, how it helps you, what you want to keep for you, and what you want help with. I was also thinking in terms of accessibility around money. Again, for many of us, professional cover design, professional editing, professional human-level translation, these are things that are pretty pricey for many people. So again, this makes it more accessible. One of the reasons we got into the indie way and being indie authors was to try and remove the barriers to entry to people who have been excluded from the environment of publishing. So, yes, it is really hard to talk about this, and yet that's why I wanted to talk about it, because— There's so many variables for each individual and there's no situation that's the same, really, is there? Jeff: No, not at all. The things that I may need to do my work in the most efficient way possible is different from the way that you're going to work, is different than the way my husband's going to work, is different than every other person and the way that they're going to work. Which is why any kind of blanket statement about “I don't need something and therefore you shouldn't need it either” can just be so problematic, because we have no idea what someone else is going through. Either it's a permanent part of their lives or maybe it's something that is happening temporarily with them where they might need to leverage other tools. Jo: Yes. Talking about that temporary, I think I really got the first sense of this when I had COVID the first time, which was really bad. I remember I was so sick, the only thing I could do was listen to an audiobook. I couldn't think, I couldn't read. It was really probably months of not having my brain back. Then the other thing that's happened as I age, as women age, is menopause kicks in and the brain fog is a real thing. 
I've heard from other people too who've said having Claude or whoever, an AI tool, to help with the brain fog is so important because otherwise I just wouldn't be able to gather my thoughts. Again, as you said— Even if we don't need these things now, it's quite likely we're going to need them at some point, given ageing, given the potential for injury and disease. I mean, we don't escape this alive, do we? Jeff: Yes, that's a great point because unless we're extremely lucky as individuals, we're all likely to have some sort of a disability in our lives at some point. I know for me, as I age and my eyes get more and more tired after being in front of a screen all day for work, and then whatever creative stuff I do in the afternoon on a book—when it comes near bedtime and I do want to read, I probably want to do that with an audiobook, much more audio, especially for any long reading project. That can also be like, if I have a long document or a long article to read, I am likely to give it to ElevenReader, let it load itself up, and then listen to it, because I take the information in better than trying to follow words across a screen. Jo: Yes. Jonathan, my husband, now also listens to a lot of academic papers on ElevenReader. Most of us will know it as where we publish some audiobooks from ElevenLabs, or you can also publish other things there. So it is super useful to think about what we can do with ElevenReader. Another thing that I found really useful recently is NotebookLM. On NotebookLM, there is a free tier. You can put various things in there and then create a custom audio. So this is something I've been doing as part of research. You can put in, say, 10 YouTube videos or some PDFs or your book or whatever, and then you can create a custom audio. Then I'll go for a walk and I'll listen to the custom audio, and then I'll go back and look at the detail of what it was. 
It gives me the framework of whatever I'm thinking about on a broader level, and then I can come back to the details. So again, it's this multimodal approach that can help us manage our energy, I guess. Jeff: And it's all about the managing of the energy, I think, too. That is a great way to think about the accessibility of it all. You mentioned a great use there for NotebookLM. That could also be putting your book in there and having it help you build a world bible or something like that. Or building marketing materials off of that. There's a lot of things now that NotebookLM can do in terms of helping you create FAQs maybe for a newsletter or for your website, and building video stuff off of the material that it has. So there's a lot of options there, and ever-growing options that can be useful for someone to manage any number of the things that they may need in their creative business. Jo: Yes. In fact, talking about Claude, there are a lot of Claude plugins now, skills and integrations. Shopify just released a Claude plugin and many of us now have Shopify stores. I have a lot of products with a lot of different variations and the metadata. There's so much metadata. And again, I'm just so pleased now that I can work with Cowork and get it to actually update directly into Shopify. In fact, coming back, you mentioned updating alt tags earlier. That's something again that AI could help you update—the back list of your alt tags on a website. I've now got my Cowork doing EPUBs so I could finally update all my EPUBs with back matter and all of this kind of thing. So I feel like perhaps we could go beyond accessibility to talk about amplification. All the things that we didn't do because it was too tiring and we just couldn't be bothered, or it would just be way too much work, that now it's opened up as a possibility because of these tools. Jeff: Absolutely. I mean, you look at a backlist as large as yours and the things that you're now able to do. 
I didn't know that Claude had a Shopify plugin. So the abilities that we have now to maybe do things in the business that we hadn't before. One of the things I've been working with Claude on is rewriting my website and creating a more proper website for Will. I'm really making sure that it is not only SEO prepared but also GEO prepared, with all the metadata and all the backend code schema that it needs so that LLMs can find me, can understand what I do, can understand the books, branch out to the other areas that it needs to. Doing that through WordPress would've been so much more difficult, even with Claude, that to be able to rewrite the site in a way that is going to let me manage it better so that I will do it on a more consistent basis. Whatever that thing is, we're now able to do these things. That could be updating keywords in Amazon or making sure we're aligned across all of the sales platforms that we might be on and things like that, that Claude can do and do well. Jo: Yes, I think marketing is just the killer app really for people, isn't it? I think most authors do not enjoy marketing. I find Claude better for creative work, for strategic work, for doing work through Cowork or Code, but— ChatGPT with marketing copy is very, very good. So I've actually been using that as we record this. I've got a Kickstarter launching next week, so I've been getting it to do ad copy and social media copy and all that kind of thing. This is stuff when you have to produce—give me 20 taglines, give me 20 hooks, give me another 20 and another 20. I mean, we just cannot do it as humans, right? Jeff: Yes, I have found GPT wildly helpful. I mentioned trying to get Bargain Booksy and Fussy Librarian promos. Jo: Mm. Jeff: And you have to give it the marketing hook, and it can't just be the blurb that's on Amazon—it's got to be something fresh, and they each have slightly different requirements. 
Having GPT—here's the blurb, give me a dozen different options—and then I may take pieces of all of them and create one of my own. But it reworks that much faster than my brain was ever going to try to find the right thing I want to give to Bargain Booksy. Jo: Yes, you are right. Or it says write this in 300 characters or less. Jeff: Yes. Jo: I do exactly the same. That kind of transformative work can be really good. In fact, there was somebody I know who has been rampantly anti-AI for years and then said, “Would this help me? I have to do a synopsis for an agent, so I've got this 100,000-word book and it needs to be a 10-page synopsis. How would I do that with AI?” So I was encouraging her to take each chapter and ask it to summarise the chapter, and of course read through it and everything. But I mean, doing a synopsis once you've actually written a book—that can be super useful. So I think what we're saying is— There are levels of need in terms of both the author and the audience. Then there are levels of your personal use from one end of the spectrum to the other in terms of how far you want to go in every area of the business. And in that way, it's just different for everyone. Jeff: Yes, and I think getting to that mindset shift that we were talking about a little bit—it can be so easy to dip your toes in. That one author came to you and said, “Do you think it could do this?” And I think that's the beginning exploratory area for perhaps anyone. People are going to hear us talk about this and it might inspire them to go try something that we've talked about. But these things, whether it's Claude or GPT or Gemini or whichever one it is, you can come to it and say, “I'm an author, I have X, Y, Z going on in my life”—whether that's a disability, whether that's a time constraint because you have a day job and maybe you have kids and a family that need your attention—”I have these time constraints, I want to do X, Y, and Z in my business. 
How can you help me with that?” It's going to tell you what it can do to help you with that. I would even say, if you have the ability to have multiples of these, you could ask the same question to GPT and Claude, and they're going to give you similar answers in some instances, but they may also have different ones because of the abilities that the different platforms have around these things as well. That can help you make that mindset shift of, “Well, now I see that it can do that. Could it also do this?” And then ask it if it could do that. Because I know for me, Jo, I've taken so much from you and your journey with Cowork that it's like, “Oh, she did that. I wonder if I could do this.” And all of that piles on top of itself. Then eventually I think your brain starts to think on its own, “Oh, I have to do this task. Can Claude maybe do this for me? Let's go find out.” Jo: Yes, and if it couldn't do it for you yesterday, you never know, it might be able to do it tomorrow. Jeff: Right? Because I haven't tested yet its new ability to actually use your computer. Jo: Mm. Jeff: And I'm curious what that might open up. Because one of the things that I've seen that I wish it would do is be able to take the EPUB that's on my drive and actually put it into a platform I'm trying to upload to. Cowork on its own hasn't been able to cross that barrier, but I wonder if with computer use added to that, if it could. Like, “here's the EPUB, upload that over there,” be able to pick it from the file picker, essentially. Jo: Yes. I think, well, a little tip for everyone: I wouldn't give access to your entire file system to the AI. Jeff: That's a good point too. Jo: Yes. I have a Claude folder in my drive and it only has access there. So if you put files in that drive, it might be able to do that. But I know what you mean. I have been using it to help me publish things in German on KDP. Now I can use the browser, so you can actually do that. 
In terms of uploading the actual file, I know what you mean. These things will change. As we record this, again middle of April, we are almost about to get the next models being Mythos, which might be Claude 4.7 Opus, or also ChatGPT has a new model coming, and these models are getting very powerful. With every shift they can do more things. So as you say, the very first thing to do is ask it, “I want to do this—what are my options?” And some of them, for example, doing an AI-narrated audiobook, ChatGPT and Claude don't do that. You want ElevenLabs or one of the other services for that, but they can tell you what your options are. So that's one thing, but I wondered if you have any thoughts on the gaps that you are seeing. You mentioned one there around file uploads, but— What do you hope might come and some of the things that might be exciting if they arrive? Because you never know, they might be here already. Jeff: There's certainly some movement in some areas. One of the things I'll share is, in March I was at the 2026 CSUN Assistive Technology Conference—CSUN is California State University, Northridge—and they've run this conference for some 40 years now. One of the sessions I went to was from Tara Maisel—I hope I'm pronouncing her last name right. She's a senior project manager in books accessibility at Amazon, and she was doing a session specifically on readability. She had all kinds of statistics and information about what goes into making something readable. One of the things she talked about with AI was the future of personalised reading. If you think about the Kindle app, for example, there's a lot of settings you can make there—font size, colours, brightness, text spacing. There's a lot of tools in there. She was pointing out that potentially readers don't even know what they actually need for the optimised visual reading experience. 
She sees a world where AI can perhaps do an analysis of your reading behaviour and then help you find the optimal settings. Maybe even multiple optimal settings for, say, if you were reading in a room that had daylight versus at bedtime, and the ways you might shift it. I was almost thinking of this like when you're at the optometrist and they're like, “Which lens is better—this one or that one?” Jo: Oh, sometimes that is very hard. Jeff: Yes. It's that AI could step you through that a little bit to help you find that optimal reading experience in that moment. And then it might even notice, potentially, if you're changing something in the way that you're moving through a page, that it might flag to say, “Hey, do we need to adjust something?” Some other areas that I think are really exciting, for everyone and perhaps particularly for people who are disabled and needing the support of some assistive technology, is what we're seeing in the browsers. OpenAI's Operator has been out for quite a while now, since sometime I think autumn of last year. Perplexity Comet has been around even longer. Then we've got browser extensions from Gemini and Claude that are available, that can let you just type natural language. You know, “Please go find for me jeans in this size that are on sale on this website. Find me the best price for blue jeans on this site and this size,” and it'll just go do it. Which can certainly speed things up for people in the disabled community to find things quickly, to spend time navigating less, and maybe ending up with the AI coming back and saying, “I found these five things. Which one would you like me to buy for you?” Or, “I found this one thing that you do need and it's waiting for you in your shopping cart.” The ability for that on the horizon is an amazing jump from an accessibility point of view. But really it's one of those things that accessibility will then help everyone because we can all just shop that way, if we choose to. 
These are early days for these browsers and these extensions. The other side of it comes back to basic web accessibility too, because I've seen these types of activities not work so well on a site that may not actually be accessible on its own. A great example is something I ran into with Claude Cowork about a month ago. I was testing to see if it could help me navigate and get things uploaded together for a site where I wanted to upload books, knowing again that it's not going to upload the actual file, but it could fill in the metadata from my master database of metadata stuff. There were areas on the site that it actually couldn't hit the button, because the site itself was also not functional to a screen reader. So there are gaps there. It's early days, but I really see that as an interesting future that'll really help people with disabilities—but again, help everybody too, just manage time better. Jo: I know exactly what you mean there. I've done some collaborative work with Claude Code when it's like, “I can't click the button,” and I'm like, well, I'll click the button—you fill in everything else. Jeff: Exactly. Jo: It's actually quite a funny situation. But goodness, coming back to IngramSpark again—these things need APIs. We need better functions. It's funny because I think a lot of traditional publishers have these APIs or backend upload things that you can do. I'm like, well, we need to get to that with these systems. But I think things will change. Another thing that I think has also shifted is the use of voice. Voice for dictation—it used to be with dictation that you would have to say “comma,” “open quote,” “new line,” and all of that. And you'd also have to make sense. Whereas now I feel like you can just dictate a whole load of things to these AIs and then say, “Tidy that up,” and they will do a lot more than the old situation. So I think voice will also help. Also automatic translation. 
I don't know if you know this about X, and if you're on X anymore, but just this week they've made it multi-language. So I can read tweets by people who've posted in another language in English. I can read something from Korean or read something that someone French has posted and it gets translated. It has made a huge difference to the content I'm seeing, which is fascinating because I don't think we've ever had this kind of automatic “everything is translated into your language” situation. It's really got me thinking about how [automatic translation] might work for eBooks or other things if the rights are there. I don't know. Have you seen stuff like that? Jeff: There's so much available now with voice and the ability to not have to speak all the other stuff that went with it—comma, full stop, next line. It was a little mind-bending sometimes, trying to think about quote marks and all that stuff. And now it's so good. Different platforms do it to different degrees of ability. Even being able to speak your prompts into the very platforms themselves without having to type all of it. Chronic pain comes to mind, any kind of mobility thing—all the typing would be a drain or maybe even impossible. So the voice ability is so powerful there and unlocks more things. At the same time, those translation abilities—I believe AirPods now have the ability, if you've got the right stuff on your phone, that you could be talking to somebody, they may speak back to you in a language you don't speak, but your AirPods will give it to you in your language. Jo: Hmm. Jeff: Google has, I believe, a live captioning app that you can use. I think there's even a split screen—I don't know if that's available now or something in their future—where you could put the phone on the table and tell it who's looking at what side of the screen, and it'll put the language that I need on my side and the language the other person needs on the other. 
So there continues to be such a shift in how we're being able to translate stuff that really opens up communication and can open up our books to so many more people. I'm very interested to see—I haven't pulled the trigger on this yet—but how Amazon's auto-translation rolls out and how that's received in terms of the accessibility around our books and being able to put it in someone's hands who doesn't speak—I think it's only English to other languages right now—but who doesn't speak the language it was written in but wants to read that book. We could never, as indies, or really even big five publishers, wouldn't have the money to create custom translations everywhere. But if the AI can help do that and spread those books around so that everybody could have the story they want to read, I think that's such a win for the reading audience. Jo: Yes, I think it's so exciting to think what might be coming, and that's what I want to stay on the side of on the AI discussion. There's enough negativity out there and you can get that information somewhere else, but for me I want us to stay on the positive side of how this helps both the author and the reader. And hopefully the community, to create more and read more and enjoy being human more. Right? Because I find that I do get out more and listen to stuff, or I'm out walking instead of at my desk, and I mean, that's what it's about. I'm pretty excited about the future. How about you? Jeff: I am. I think there are, quite honestly, some scary things that could be out there in the future. I mean, there's been a lot of talk about what Mythos is capable of. But on the other side of it, there are all these advances. I also look back at Google and AlphaFold and what DeepMind was able to do there for science. 
There's more of that stuff out there, and individually for each of us, spending a little bit of time—and I do have to say, I think you need to spend time on a paid plan because the free stuff doesn't give you the idea of what these platforms are actually capable of. So if you only drop in, even briefly, to experiment on one of the $20-a-month plans and give it your situation, ask it what it can do for you, I think you'll see where, on a personal level, AI will help you unlock some things. It can help you move some things to the next level in your business that for whatever reason you haven't been able to do. You don't have to use it for everything. You may decide that it's still not for you for whatever reason, and that's fine. But I think there's so much to explore here and to let your curiosity run for a little bit to see what's possible and what you might unlock with it. Jo: Brilliant. So where can people find you and your books and everything you do online? Jeff: So pretty much everything lives at JeffAdamsWrites.com. Jo: Well, thanks so much for your time, Jeff. That was great. Jeff: I loved it, Jo. Thanks for having me. The post Accessibility And AI: How New Tools Are Opening Doors For Indie Authors With Jeff Adams first appeared on The Creative Penn.
Google dropped like 197 new AI features this week.
I sit down with Logan Kilpatrick from the Google DeepMind team, live at Google I/O, to unpack everything Google just announced and what it means for founders and builders. We cover Gemini 3.5 Flash, the new Gemini Omni world model, the expanded Antigravity ecosystem, managed agents in the Gemini API, and the native Android app builder inside AI Studio. Logan shares how distillation keeps pushing Pro-level intelligence into Flash, where the real opportunities sit for solo founders, and why the agentic era has finally crossed the chasm from demo to useful. If you have an idea and want to ship something this week, this episode maps the toolkit. Timestamps 00:00 – Intro 00:53 – Gemini 3.5 Flash: The New Workhorse Model 01:49 – How Flash 3.5 Stacks Up Against Sonnet 02:38 – Gemini Omni: A World Model for Any Input and Output 06:18 – Building a Content and Creator Layer on Omni 08:21 – What to look forward to 10:53 – Google Spark and Managed Agents 14:00 – The Agentic Era and Requests for Startups 17:17 – The Antigravity Ecosystem Overhaul 18:51 – AI Studio vs. Antigravity: Vibe Coding vs. Agentic Engineering 21:31 – Native Android Apps Built Inside AI Studio 23:44 – Closing Thoughts Key Points Gemini 3.5 Flash ships as a Sonnet-level workhorse model tuned for long-running agentic tasks, coding, and tool use, available on day one to 900M+ Gemini app users. Gemini Omni is a single model that takes any input and produces any output across video, image, audio, and music, fusing Veo, Nano Banana, Lyria, and TTS into one system. Managed agents in the Gemini API let builders ship agentic products with a single API call, using skills and markdown instead of writing orchestration code. The Antigravity suite now spans an IDE, agent manager, CLI, SDK, and API surface, all sharing the same agent harness that powers Gemini Spark. AI Studio targets vibe coding and now builds native Android apps for free, while Antigravity targets production-quality, million-line-codebase engineering. 
The cost of intelligence keeps dropping thanks to distillation, opening up smaller markets that previously needed a 40-person team and venture funding to address. The #1 tool to find startup ideas/trends - https://www.ideabrowser.com LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/ The Vibe Marketer - Resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/ FIND ME ON SOCIAL X/Twitter: https://twitter.com/gregisenberg Instagram: https://instagram.com/gregisenberg/ LinkedIn: https://www.linkedin.com/in/gisenberg/ FIND LOGAN ON SOCIAL X/Twitter: https://x.com/OfficialLoganK Youtube: https://www.youtube.com/@LoganKilpatrickYT LinkedIn: https://www.linkedin.com/in/logankilpatrick/
Oriol Vinyals, VP of Research at Google DeepMind and co-lead of the Gemini program, joins Jacob the day after Google I/O to unpack the research underpinning Google's latest announcements and where frontier AI is heading. The conversation moves from world models (why Google has uniquely bet on them as a path to AGI, what the "GPT moment" for video and images would look like, and how they connect to robotics and simulation) to agents (the Spark release, why the system and model need to be optimized jointly, and why scaffolding will eventually be written by models themselves). Oriol gets into the mechanics of memory in models, drawing on his cognitive neuroscience background to argue that file-system-style non-parametric memory is more practical than baking memory into weights at serving scale. He shares his views on the limits of RL today (LLMs are data-limited in a way that game-playing RL never was), why training on narrow domains like math and code generalizes surprisingly well, and what a true "Move 37" moment for science or ML research would look like. Throughout, he reflects on the unique advantages of being inside Google (TPU co-design, end-to-end revenue stability, the merger of Brain and DeepMind), the trade-offs between focus and exploration in research orgs, and why he believes AGI in some meaningful sense may already be here, even if the goalposts keep moving. (0:00) Intro (1:36) Why World Models (4:21) The GPT Moment for Video (7:51) What Makes Omni a World Model (10:04) World Models & Robotics (12:37) Evaluating Physics in AI (14:51) Consumer Agents & Spark (18:39) Scaffolding & the Bitter Lesson (22:06) Memory & Continual Learning (26:54) Research Bets Inside Big Labs (32:30) Post-Training RL is Greenfield (35:57) What Real Intelligence Looks Like (39:11) RL Generalization (43:00) Advice for Founders (46:40) Can AI Truly Innovate? (49:48) Recursive Self-Improvement (52:14) Quickfire With your host: @jacobeffron - Managing Director at Redpoint
Demis Hassabis, Co-Founder and CEO of Google DeepMind, refused to leave London, challenged Google on AI safety and helped lead DeepMind back into the AI race. Sebastian Mallaby, author of The Infinity Machine and The Power Law, joins Andreas Munk Holm to discuss the founder psychology of Demis, the story behind DeepMind and why Europe may be entering a new era in technology. The conversation explores DeepMind's fundraising journey, the Google acquisition, the merger with Google Brain, AI safety, sovereign technology and why Demis remains sceptical of parts of Silicon Valley culture despite operating at the centre of it. Timestamps (00:00) Why Demis Hassabis matters (01:12) Why DeepMind could not raise from European VCs (07:35) The Peter Thiel chess story (11:00) What DeepMind reveals about European venture (14:42) Why Europe's tech ecosystem is accelerating (18:20) European sovereignty, defence tech and AI (21:20) DeepMind's sale to Google and tensions over AI safety (29:40) The founder psychology of Demis (41:35) Google's ChatGPT moment and Gemini's comeback (45:05) Demis' critique of Silicon Valley (50:45) Europe's AI sovereignty problem (54:05) Final thoughts and Sebastian's new book. Subscribe to EUVC, the home of European tech, for more insights.
Logan Kilpatrick and Tulsee Doshi of Google DeepMind join for a first-ever in-person episode recorded just days before Google I/O, covering headline launches like Gemini 3.5 Flash, the Omni video generation model, and the new Gemini Spark agentic product. The conversation digs into Google's strategic decision to lead with cost-adjusted efficiency over raw capability, how DeepMind now ships a full agent harness rather than bare models, and technical questions around context window limits and knowledge cutoffs. They also explore how the team thinks about model psychology, AI welfare, and recursive self-improvement. Sponsors: Brave Search API: Brave Search API gives AI agents a fast, independent search index for research, RAG pipelines, images, places, and fewer hallucinations. Get $5 in free credits at https://brave.com/search/api/?mtm_campaign=q2-26-cognitive-revolution Sequence: Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code COGNISM in the source field to save 20% off year one Roboflow: Roboflow is an end-to-end visual AI platform that lets you turn raw ideas into fully deployed applications in just hours, powering breakthroughs like Blueprint Pro's floor-plan understanding tool. Read the full Blueprint Pro story and see how over a million engineers are building the next wave of visual AI at https://roboflow.com Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr
In the third episode in this series of conversations around The Devil Wears Prada, Julia speaks with Öznur about the relationship between excellence, fear, creativity, and psychological safety — and whether the kind of environment created by Miranda Priestly still has a place today. Reflecting on her early years working in London design agencies, Öznur describes environments where relentless pressure, perfectionism, and long hours were normalised in the pursuit of excellence. Like Miranda, many of the figures leading these spaces were deeply committed to producing exceptional work. But the conversation asks an important question: does commitment to excellence inevitably require harshness? Öznur speaks candidly about the impact of fear-based environments — how they create stress, mistrust, and competition within teams, while slowly eroding creativity and confidence. Rather than bringing out the best in people, she argues, these cultures often prevent talented individuals from contributing fully. A central theme in the episode is the importance of psychological safety. Öznur reflects on the kind of environments she has consciously tried to build throughout her own career — spaces where people feel trusted, supported, and able to speak up, including the quietest voices in the room. The conversation also explores the tension between empathy and standards. Julia and Öznur discuss the challenge of balancing care for people with clarity around performance and expectations — and why avoiding difficult conversations does not necessarily help teams thrive. Returning to Miranda Priestly, the episode reflects on how differently her behaviour might be viewed today. Was she simply a product of her time? 
Or have expectations around work, wellbeing, and leading fundamentally changed over the last twenty years? Together, Julia and Öznur explore a more sustainable vision of excellence — one built not on fear and control, but on trust, clarity, stability, and collective creativity. About the Guest Öznur is a design and research leader with 20 years of experience in the field, working in complex industries like healthcare, life sciences and technology. A champion of cross-disciplinary collaboration and psychological safety, she leads her teams and her organisations towards delivering better services to their users. She is also a trained horticulturist and a garden designer, and enjoys bringing gardening metaphors to her work as much as possible. Öznur is currently heading up design at Isomorphic Labs, and she previously held similar roles at Genomics England, DeepMind and Google.
Most conversations about agentic AI in healthcare get stuck on capability. This one is about the gap between capability and deployment — and what closes it. Aashima Gupta, Global Director of Healthcare Strategy and Solutions at Google Cloud, argues that healthcare's bottleneck isn't vision; it's courage. The processes are documented poorly or not at all, AI fluency programs reach a fraction of employees who want them, and most enterprises are running agents without the harnesses — grounding, evaluation, red-teaming — that production deployment actually requires. Meanwhile patients navigate three different "clock speeds" (annual insurance cycles, shifting provider rosters, Medicare pricing) that bear no relation to the timeline of their own health. We cover the European vs US deployment posture, the difference between agents-with-agency and rule-based AI, why Highmark's library of one million internal prompts matters, Google Cloud's full-stack efficiency play (TPU Ironwood, Gemini, the 40% data-centre electricity reduction DeepMind delivered years ago), and the multi-agent "harnesses" — including the red/blue/green team architecture — that are starting to make production-grade healthcare AI plausible. Video: https://youtu.be/rLtaxQLgCg0?si=JDP6kK97_tYsFoSb Newsletter: https://fodh.substack.com/ Agentic Patient Series: https://www.facesofdigitalhealth.com/agentic-patient-blog
Demis Hassabis is an artificial intelligence researcher, scientist, and entrepreneur. In 2010, he co-founded DeepMind, an AI research lab which is now part of Google. In 2024, Hassabis won a Nobel Prize for using AI to predict the 3D structure of proteins, critical for disease understanding and drug discovery. He was also awarded a knighthood that year by King Charles III. On April 20, 2026, Sir Demis Hassabis came to the Sydney Goldstein Theater in San Francisco to talk with author Sebastian Mallaby, who recently published a book about Hassabis's work, The Infinity Machine: Demis Hassabis, DeepMind, and the Quest for Superintelligence. The two were interviewed on stage by journalist Emily Chang.
In this episode, Ray Cochrane breaks down a reversible conductive glue from Newcastle University that could replace solder and finally make electronics recycling work. Additional stories cover China widening its clean energy lead, DeepMind’s AlphaEvolve scoring wins from genomics to Google’s database, Anthropic’s $200 million partnership with the Gates Foundation, Intel teaming up with McLaren Racing, and end-to-end encrypted RCS rolling out in beta. – Want to start a podcast? It’s easy to get started! Sign up at Blubrry – Thinking of buying a Starlink? Use my link to support the show. Subscribe to the Newsletter. Email Ray if you want to get in touch! Like and Follow Geek News Central’s Facebook Page. Support my Show Sponsor: Best Godaddy Promo Codes Get 1Password Full Summary Cochrane opens the show with a deep dive into Newcastle University’s reversible conductive glue, a water-based adhesive that could finally make electronics recycling economically viable. He frames the e-waste problem first: 62 billion kilos a year, with less than a quarter ever recycled. Then he walks through the silver nanoparticle chemistry, the lead-free angle on traditional solder, and the geopolitical stakes of critical mineral recovery. From there the episode pivots through energy, AI, hardware, open source, data research, space, science, and consumer privacy. A Reversible Conductive Glue That Could Replace Solder A team at Newcastle University has developed a water-based glue that conducts electricity well enough to replace solder. Unlike solder, however, the glue releases cleanly with a quick rinse of acetone or an alkaline bath. The breakthrough relies on silver nanoparticles suspended in a water-based binder. Consequently, components can be recovered intact, opening a viable path to electronics recycling at scale. Co-investigator Volker Pickert framed the second prize directly: solder has the best conductivity, but the best formulations contain lead. 
China Widens Its Clean Energy Lead A new Atlas Public Policy report shows Chinese firms accounted for 55 percent of $1.1 trillion in global clean energy manufacturing investment between 2019 and 2025. Battery manufacturing alone pulled in nearly half of that money. Meanwhile, U.S. companies have actively retreated from those same industries. With the Strait of Hormuz currently closed, supply chain ownership in solar, wind, and batteries matters more than ever. A separate Ember analysis showed Chinese solar panel exports doubled in March alone. DeepMind’s AlphaEvolve Scores Real Wins DeepMind published an update on AlphaEvolve, its Gemini-powered AI coding agent. The system cut genomic variant detection errors by 30 percent. Additionally, it lifted AC Optimal Power Flow feasibility from 14 to over 88 percent on the electrical grid. AlphaEvolve also found a better cache replacement policy in two days that would have taken human engineers months. Furthermore, it reduced write amplification in Google’s Spanner database by 20 percent. The pattern shows applied AI sticking, not as a chatbot but as a quiet optimizer. Anthropic and Gates Foundation Commit $200 Million Anthropic announced a four-year, $200 million partnership with the Gates Foundation across three pillars. The biggest pillar targets global health and life sciences in low and middle-income countries. Notably, the research scope includes polio, HPV, and preeclampsia. A second pillar covers AI in education across the U.S., sub-Saharan Africa, and India, in partnership with the Global AI for Learning Alliance. Finally, an economic mobility pillar focuses on agricultural productivity and crop benchmarks. Google’s AI Educator Series Launches Free Google rolled out the first 20-plus sessions of its AI Educator Series this week. The free AI literacy training targets the roughly 6 million K-12 and higher education teachers across the U.S. 
Modules are designed as short, snackable trainings teachers can finish in a prep period or a lunch break. Additionally, stackable workshops let educators build credentials over time. Importantly, the program requires no institutional subscription. Amazon Bedrock Prompt Optimization Goes GA Amazon Bedrock dropped its Advanced Prompt Optimization tool, now generally available across most major regions. The feature rewrites prompts to perform better on specific models and automates prompt migration when switching between models. Furthermore, a built-in evaluation feedback loop lets users benchmark against up to five models side by side. The default judge model is Claude Sonnet 4.6. Consequently, teams can stop hand-tuning string templates and focus on product work. Sponsor: GoDaddy Economy hosting $6.99/month, WordPress hosting $12.99/month, domains $11.99. Website builder trial available. Use codes at geeknewscentral.com/godaddy to support the show. Arm AGI CPU and Red Hat Go Production-Ready on Agentic AI Arm and Red Hat expanded their collaboration around Arm’s AGI CPU, which is Arm’s branding for its agentic AI chip family. The deal brings Red Hat Enterprise Linux and OpenShift to the chip as a production-ready stack. Hardware specifications include 136 Neoverse V3 cores, 96 PCIe Gen6 lanes, and 12 channels of DDR5-8800 memory in a 300-watt thermal envelope. Availability lands in Q4 through Supermicro, Lenovo, and ASRock Rack. Intel Becomes McLaren Racing’s Official Compute Partner Intel announced a multi-year deal as the official compute partner for McLaren Racing. The agreement covers the McLaren Mastercard Formula 1 team, Arrow McLaren IndyCar, and McLaren F1 Sim Racing. Trackside edge compute will power real-time race decisions, while Xeon and Core Ultra silicon drive Computational Fluid Dynamics and digital twin work. Consequently, design iterations that once took weeks now collapse to days. 
The deal puts Intel silicon in front of every CTO watching a Grand Prix. Rust Lands 13 Google Summer of Code Projects The Rust Project landed 13 accepted projects in Google Summer of Code 2026. Out of 96 proposals, a 50 percent jump from last year, the project selected 13. Notably, three returning contributors from prior years are back. Mentors flagged a noticeable share of AI-generated submissions as a growing challenge. Furthermore, the real bottleneck remains mentor capacity rather than funding. GitHub Innovation Graph Maps Digital Complexity Researchers used GitHub Innovation Graph data to predict GDP, inequality, and emissions through the Economic Complexity Index, or ECI. Countries are compared to kitchens; the more variety and sophistication in software output, the higher the score. Germany ranks first, followed by Australia and Canada. The U.S. lands at sixth. However, the dataset only captures public GitHub activity, leaving most proprietary software invisible. NASA and Eta Space Prepare Cryogenic Fuel Demo NASA is teaming with Eta Space on an in-orbit demonstration called LOXSAT, short for Liquid Oxygen Flight Demonstration. The nine-month mission tests cryogenic fluid management techniques required for in-space propellant depots. Launch is no earlier than July 17 aboard a Rocket Lab Electron from the Mahia Peninsula in New Zealand. Successful refueling in orbit could reshape what is possible for deep-space missions to the Moon and Mars. Stealth Magma Surge Under São Jorge Surprises Researchers Researchers in the UK and Spain published in Nature Communications on a 2022 magma surge under São Jorge Island in the Azores. The surge climbed from more than 20 kilometers underground to 1.6 kilometers below the surface. Surprisingly, most of the thousands of earthquakes happened after the magma stalled, not during the climb. Consequently, scientists are calling it a stealth surge and a failed eruption. 
A primed magma chamber now sits closer to the surface than before. End-to-End Encrypted RCS Begins Rolling Out Apple and Google led a cross-industry effort to roll out end-to-end encryption for RCS messaging. As of May 11, the feature is rolling out in beta on both platforms. Importantly, encryption is on by default and auto-applies to new and existing conversations. A lock icon in the chat indicates active end-to-end encryption. This quietly raises baseline privacy for billions of cross-platform messages. Cochrane signs off with the usual ecosystem mentions: GNC Insider at geeknewscentral.com/insider, the show newsletter, and modern podcast app recommendations at podcastapps.com. The post A Reversible Glue that could Replace Solder #1865 appeared first on Geek News Central.
The US war on Iran and resulting war on the rest of the world is a constant theme in this week's headlines. We've got stories from World Cup workers, Spirit Airlines, German and Italian students, DeepMind, the University of California, SAG-AFTRA, Indiana University, Stanford, and Cornell. For our first main story, we discuss a recent Labor Notes piece on the deadly nature of working with engineered stone. Immigrant workers incarcerated at a privately run concentration camp in Michigan have gone on strike for nearly a month in protest of their mistreatment. Finally, we actually have a labor story about someone we never thought we'd discuss on the show: Mr. Beast. Join the discord: discord.gg/tDvmNzX Follow the pod at instagram.com/workstoppage, @WorkStoppagePod on Twitter, John @facebookvillain, and Lina @solidaritybee
Live from The Royal Institution of Great Britain, it's TechStuff! Oz sat down with two visionaries at an event hosted by Quilt.AI. First, he spoke with Ali Eslami, a Distinguished Research Scientist at Google DeepMind, who built the prototype for what is now AI Search. Ali talked about how working on AI can feel like surfing, and what went into connecting Gemini to Google Search to create what he called "neural Google." After that, Oz chatted with Saad Mohseni about his work with MOBY Group. Saad guided Oz through his twenty-year effort to bring top-tier news and entertainment to Afghanistan and beyond — from a reality TV singing competition that changed the country, to using WhatsApp and AI to provide education to girls banned from school. Additional Reading: Radio Free Afghanistan – HarperCollins. EXCLUSIVE NordVPN Deal ➼ https://nordvpn.com/techstuff Try it risk-free now with a 30-day money-back guarantee. See omnystudio.com/listener for privacy information.